Voxel-Grid Based Deep Learning for Robust People Counting and Tracking with Event-Based Vision Sensors


Alahmad R., Zhou Z., Albaroudi M., Alraee A., Alraie H., Yasukawa S.

31st International Conference on Artificial Life and Robotics, ICAROB 2026, Oita, Japan, 29 January - 01 February 2026, pp. 173-178, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • City of Publication: Oita
  • Country of Publication: Japan
  • Page Numbers: pp. 173-178
  • Keywords: Deep learning, Event-based vision, People counting, Real-time tracking, Voxel-grid
  • Affiliated with Middle East Technical University Northern Cyprus Campus: Yes

Abstract

Conventional frame-based vision systems for people counting often fail in environments with high-speed motion, extreme lighting conditions, or strict privacy requirements. Event-based vision sensors (EVS) offer a promising alternative by asynchronously capturing pixel-level brightness changes with microsecond latency and a high dynamic range. However, the sparse and asynchronous nature of event data necessitates specialized processing architectures. This study proposes an end-to-end, fully event-driven pipeline for robust people counting. A voxel-grid representation was utilized to convert raw event streams into structured tensors that preserve temporal dynamics. A lightweight sliding-window convolutional neural network (CNN) was then employed for real-time patch classification, coupled with a ByteTrack-style association method to ensure stable trajectory maintenance. To overcome the high cost of manual annotation in event-based vision, we introduced an automated ground-truth generation method based on center-point expansion and a specialized evaluation metric with temporal tolerance. Experimental results demonstrate that the proposed system achieves stable localization and tracking in real-world scenarios, making it suitable for privacy-preserving monitoring on edge-computing devices.
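The voxel-grid step described in the abstract can be illustrated with a minimal sketch. The paper does not specify its exact discretization, so this is one common formulation, not the authors' implementation: events `(x, y, t, p)` are binned along the time axis, with each event's polarity bilinearly split between the two nearest temporal bins to preserve sub-bin timing. The function name, argument layout, and polarity convention (`p` in {-1, +1}) are assumptions.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Convert an event stream into a (num_bins, height, width) tensor.

    events: (N, 4) float array of (x, y, t, p), with polarity p in
    {-1, +1}. Timestamps are normalized onto [0, num_bins - 1] and
    each event is bilinearly distributed over the two adjacent
    temporal bins, so sub-bin timing is not discarded.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2]
    p = events[:, 3].astype(np.float32)

    # Normalize timestamps onto the grid's temporal extent.
    span = max(t.max() - t.min(), 1e-9)
    t_norm = (num_bins - 1) * (t - t.min()) / span
    t0 = np.floor(t_norm).astype(np.int64)
    frac = (t_norm - t0).astype(np.float32)

    # Bilinear weighting between the two adjacent temporal bins;
    # np.add.at accumulates correctly even with repeated indices.
    np.add.at(grid, (t0, y, x), p * (1.0 - frac))
    valid = t0 + 1 < num_bins
    np.add.at(grid, (t0[valid] + 1, y[valid], x[valid]), p[valid] * frac[valid])
    return grid
```

The resulting dense tensor can be fed to a standard CNN, which is what makes this representation attractive for the sliding-window patch classifier the abstract describes.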