Benchmarking control strategies for UAV swarms: Centralized, decentralized, or federated reinforcement learning


Ali M. A., Maqsood A., Athar U., Ali S.

AEROSPACE SCIENCE AND TECHNOLOGY, vol.170, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 170
  • Publication Date: 2026
  • Doi Number: 10.1016/j.ast.2025.111539
  • Journal Name: AEROSPACE SCIENCE AND TECHNOLOGY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Middle East Technical University Northern Cyprus Campus Affiliated: Yes

Abstract

Autonomous UAV swarms are becoming increasingly important in mission-critical domains where coordination depends on training approaches that enable effective cooperation among agents. This work evaluates three dominant reinforcement learning paradigms (centralized, decentralized, and federated) in the context of cooperative UAV swarm training. We develop a unified experimental approach and benchmark these strategies across standardized environments and tasks, including target search and coordinated navigation. The evaluation covers eight critical metrics: total training time, convergence rate, sample efficiency, training stability, final reward achieved, policy generalization, transferability, and scalability. The results show that centralized learning converges fastest (1,800 ± 82 epochs) and achieves the highest final reward (26.31) with the best sample efficiency (2.46 reward/1k interactions), but suffers from poor scalability (a 75.7% drop in reward from 8- to 36-agent swarms) and lower transferability (64.9%). Decentralized learning generalizes better (81.0% transferability) and scales more effectively (64.3% drop) but converges slowest (2,400 ± 96 epochs) with the lowest sample efficiency (1.78). Federated learning provides a strong middle ground, combining good convergence speed (1,950 ± 89 epochs) and high transferability (88.6%) with the best scalability (55.6% drop) while maintaining a comparable final reward (25.48). These findings provide practical guidelines for selecting appropriate learning architectures in real-world swarm deployments and lay the groundwork for future research on hybrid and adaptive reinforcement learning strategies.
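To illustrate the federated paradigm contrasted in the abstract, the sketch below shows FedAvg-style parameter averaging: each agent trains on its own experience, and only model parameters (not trajectories) are aggregated into a shared global policy. This is a minimal toy illustration under assumed simplifications, not the paper's implementation; the weight vectors, learning rate, and the noisy "gradient" stand-in are all hypothetical.

```python
import random

def local_update(weights, lr=0.1):
    """Simulate one local training round per agent: nudge each weight
    toward a noisy, agent-specific target (stand-in for a gradient step)."""
    return [w - lr * (w - random.gauss(1.0, 0.1)) for w in weights]

def fed_avg(agent_weights):
    """Federated averaging: element-wise mean of the agents' parameters."""
    n = len(agent_weights)
    return [sum(ws) / n for ws in zip(*agent_weights)]

random.seed(0)
global_w = [0.0, 0.0, 0.0]  # shared global policy parameters
for _ in range(5):
    # Each agent starts from the current global model and trains locally...
    local_models = [local_update(global_w) for _ in range(4)]
    # ...then only the parameters are communicated and averaged.
    global_w = fed_avg(local_models)
```

A centralized variant would instead pool all agents' experience into one learner, while a fully decentralized one would skip the averaging step entirely; the aggregation line above is the only communication the federated scheme requires.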