Novel application of reinforcement learning for adaptive user clustering in LEO NTN systems: comparative analysis with traditional benchmarks


Ahmad B.

JOURNAL OF INFORMATION AND TELECOMMUNICATION, vol. 9, no. 4, pp. 1-25, 2025 (ESCI, Scopus)

Abstract

The demand for global connectivity is driving the development of low Earth orbit (LEO) Non-Terrestrial Network (NTN) systems, which offer low latency but face highly dynamic channel conditions. Efficient user scheduling plays a pivotal role in multi-user multiple-input multiple-output (MU-MIMO) downlink systems, as it maximizes spatial multiplexing gains while mitigating inter-user interference. This user selection problem is NP-hard, particularly when the number of users vastly exceeds the number of antennas available on the LEO satellite, making an exhaustive search for the optimal cluster computationally intractable. Conventional scheduling methods, such as graph-based or heuristic approaches, are hindered by high computational complexity and poor adaptability to network dynamics. This paper introduces a novel reinforcement learning (RL) framework for dynamic user scheduling. The approach uses proximal policy optimization (PPO) and soft actor-critic (SAC) to optimize cluster assignments and sizes, balancing throughput, fairness (modeled via SINR variance minimization), and cluster count. An SINR-based initialization further enhances learning efficiency. Simulations demonstrate that SAC achieves superior throughput, while PPO excels in fairness and operational efficiency. Both significantly outperform baseline methods (K-means with and without SINR-based initialization, and random user selection), providing a scalable and adaptive solution for next-generation non-terrestrial networks.
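To make the reward trade-off described above concrete, the following is a minimal sketch of how a scalar reward could combine sum throughput, an SINR-variance fairness penalty, and a cluster-count penalty. It is an illustrative assumption, not the paper's exact formulation: the weights `w_tp`, `w_fair`, `w_clu`, the Shannon-rate throughput proxy, and the bandwidth value are all hypothetical choices.

```python
# Hedged sketch of a clustering reward balancing throughput, fairness
# (SINR variance), and cluster count, as outlined in the abstract.
# Weights and the rate model are illustrative assumptions, not the
# authors' published implementation.
import numpy as np

def cluster_reward(sinr_db_per_cluster, bandwidth_hz=1e6,
                   w_tp=1.0, w_fair=0.5, w_clu=0.1):
    """Score a clustering decision from per-cluster user SINRs (dB).

    sinr_db_per_cluster: list of 1-D arrays, one array of user SINRs per cluster.
    Returns a scalar reward: higher sum rate is rewarded, while high SINR
    variance (unfairness) and a large number of clusters are penalized.
    """
    sinr_lin = [10.0 ** (np.asarray(s) / 10.0) for s in sinr_db_per_cluster]
    # Shannon-style sum rate over all scheduled users (bits/s).
    throughput = sum(bandwidth_hz * np.log2(1.0 + s).sum() for s in sinr_lin)
    # Fairness proxy: variance of SINR across all scheduled users.
    sinr_variance = float(np.var(np.concatenate(sinr_lin)))
    n_clusters = len(sinr_db_per_cluster)
    return w_tp * throughput - w_fair * sinr_variance - w_clu * n_clusters

# Example: a balanced clustering scores higher than a skewed one with
# comparable throughput but large SINR spread.
balanced = [np.array([12.0, 11.5, 12.3]), np.array([11.8, 12.1])]
skewed = [np.array([20.0, 2.0, 12.3]), np.array([25.0, 1.0])]
print(cluster_reward(balanced), cluster_reward(skewed))
```

A PPO or SAC agent, as used in the paper, would receive a reward of this general shape at each scheduling step; the specific weighting and normalization would need to be tuned for the LEO NTN scenario under study.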