AEROSPACE SCIENCE AND TECHNOLOGY, vol. 170, 2026 (SCI-Expanded, Scopus)
Autonomous UAV swarms are becoming increasingly important in mission-critical domains, where effective coordination depends on how agents are trained to cooperate. This work presents an evaluation of three dominant reinforcement learning paradigms (centralized, decentralized, and federated) in the context of cooperative UAV swarm training. We develop a unified experimental framework and benchmark these strategies across standardized environments and tasks, including target search and coordinated navigation. The evaluation covers eight critical metrics: total training time, convergence rate, sample efficiency, training stability, final reward achieved, policy generalization, transferability, and scalability. The results show that centralized learning converges fastest (1,800 ± 82 epochs) and achieves the highest final reward (26.31) with the best sample efficiency (2.46 reward/1k interactions), but suffers from poor scalability (a 75.7% reward drop when scaling from an 8- to a 36-agent swarm) and lower transferability (64.9%). Decentralized learning generalizes better (81.0% transferability) and scales more effectively (64.3% drop) but converges slowest (2,400 ± 96 epochs) with the lowest sample efficiency (1.78 reward/1k interactions). Federated learning provides a strong middle ground, combining competitive convergence speed (1,950 ± 89 epochs) and high transferability (88.6%) with the best scalability (55.6% drop) while maintaining a comparable final reward (25.48). These findings provide practical guidelines for selecting learning architectures for real-world swarm deployments and lay the groundwork for future research on hybrid and adaptive reinforcement learning strategies.
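To make the federated paradigm concrete, the sketch below illustrates one communication round of FedAvg-style parameter averaging over per-UAV policy weights, assuming periodic synchronization between local training phases; the function local_update, the toy 16-parameter policy, and the round count are hypothetical stand-ins for illustration and do not reproduce the paper's implementation.

    # Minimal sketch of one federated training round for a UAV swarm,
    # assuming FedAvg-style averaging of per-agent policy parameters.
    # local_update, the parameter sizes, and the sync interval are
    # illustrative assumptions, not the paper's actual architecture.
    import numpy as np

    rng = np.random.default_rng(0)

    def local_update(weights: np.ndarray, steps: int = 10, lr: float = 0.01) -> np.ndarray:
        """Placeholder for on-board policy-gradient steps from local experience.
        A random perturbation stands in for the true gradient signal."""
        for _ in range(steps):
            weights = weights + lr * rng.normal(size=weights.shape)
        return weights

    def federated_round(agent_weights: list[np.ndarray]) -> list[np.ndarray]:
        """One communication round: local training, then server-side averaging."""
        updated = [local_update(w) for w in agent_weights]  # decentralized phase
        global_weights = np.mean(updated, axis=0)           # centralized aggregation
        return [global_weights.copy() for _ in updated]     # broadcast back to agents

    # Toy 8-agent swarm, each holding a 16-parameter policy vector.
    swarm = [rng.normal(size=16) for _ in range(8)]
    for _ in range(5):
        swarm = federated_round(swarm)
    # After each round every agent holds identical averaged parameters.

Under these assumptions, training remains local between rounds while the periodic averaging maintains a shared policy, which is consistent with the middle-ground convergence, transferability, and scalability behavior reported above.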