Benchmarking control strategies for UAV swarms: Centralized, decentralized, or federated reinforcement learning


Ali M. A., Maqsood A., Athar U., Ali S.

AEROSPACE SCIENCE AND TECHNOLOGY, vol.170, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 170
  • Publication Date: 2026
  • Doi Number: 10.1016/j.ast.2025.111539
  • Journal Name: AEROSPACE SCIENCE AND TECHNOLOGY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Middle East Technical University Northern Cyprus Campus Affiliated: Yes

Abstract

Autonomous UAV swarms are becoming increasingly important in mission-critical domains where coordination depends on training approaches that enable effective cooperation among agents. This work evaluates three dominant reinforcement learning paradigms (centralized, decentralized, and federated) in the context of cooperative UAV swarm training. We develop a unified experimental approach and benchmark these strategies across standardized environments and tasks, including target search and coordinated navigation. The evaluation covers eight critical metrics: total training time, convergence rate, sample efficiency, training stability, final reward achieved, policy generalization, transferability, and scalability. The results show that centralized learning converges fastest (1,800 ± 82 epochs) and achieves the highest final reward (26.31) with the best sample efficiency (2.46 reward/1k interactions), but suffers from poor scalability (a 75.7% drop in reward from 8- to 36-agent swarms) and lower transferability (64.9%). Decentralized learning generalizes better (81.0% transferability) and scales more effectively (64.3% drop) but converges slowest (2,400 ± 96 epochs) with the lowest sample efficiency (1.78). Federated learning provides a strong middle ground, combining good convergence speed (1,950 ± 89 epochs) and high transferability (88.6%) with the best scalability (55.6% drop) while maintaining a comparable final reward (25.48). These findings provide practical guidelines for selecting appropriate learning architectures in real-world swarm deployments and lay the groundwork for future research on hybrid and adaptive reinforcement learning strategies.
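To illustrate the federated paradigm contrasted in the abstract, the sketch below shows FedAvg-style parameter averaging: each agent trains on its own experience, and only model parameters (not trajectories) are aggregated into a shared global policy. This is a minimal toy illustration under assumed simplifications, not the paper's implementation; the weight vectors, learning rate, and the noisy "gradient" stand-in are all hypothetical.

```python
import random

def local_update(weights, lr=0.1):
    """Simulate one local training round per agent: nudge each weight
    toward a noisy, agent-specific target (stand-in for a gradient step)."""
    return [w - lr * (w - random.gauss(1.0, 0.1)) for w in weights]

def fed_avg(agent_weights):
    """Federated averaging: element-wise mean of the agents' parameters."""
    n = len(agent_weights)
    return [sum(ws) / n for ws in zip(*agent_weights)]

random.seed(0)
global_w = [0.0, 0.0, 0.0]  # shared global policy parameters
for _ in range(5):
    # Each agent starts from the current global model and trains locally...
    local_models = [local_update(global_w) for _ in range(4)]
    # ...then only the parameters are communicated and averaged.
    global_w = fed_avg(local_models)
```

A centralized variant would instead pool all agents' experience into one learner, while a fully decentralized one would skip the averaging step entirely; the aggregation line above is the only communication the federated scheme requires.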