International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 73 | Issue 6 | Year 2025 | Article Id. IJCTT-V73I6P108 | DOI : https://doi.org/10.14445/22312803/IJCTT-V73I6P108

YOLO-APD: Enhancing YOLOv8 for Robust Pedestrian Detection on Complex Road Geometries


Aquino Joctum, John Kandiri

Received: 29 Apr 2025 | Revised: 31 May 2025 | Accepted: 17 Jun 2025 | Published: 29 Jun 2025

Citation:

Aquino Joctum, John Kandiri, "YOLO-APD: Enhancing YOLOv8 for Robust Pedestrian Detection on Complex Road Geometries," International Journal of Computer Trends and Technology (IJCTT), vol. 73, no. 6, pp. 58-74, 2025. Crossref, https://doi.org/10.14445/22312803/IJCTT-V73I6P108

Abstract

Autonomous vehicle perception systems require robust pedestrian detection, particularly on geometrically complex roadways like Type-S curved surfaces, where standard RGB camera-based methods face limitations. This paper introduces YOLO-APD, a novel deep learning architecture enhancing the YOLOv8 framework specifically for this challenge. YOLO-APD integrates several key architectural modifications: a parameter-free SimAM attention mechanism, computationally efficient C3Ghost modules, a novel SimSPPF module for enhanced multi-scale feature pooling, the Mish activation function for improved optimization, and an Intelligent Gather & Distribute (IGD) module for superior feature fusion in the network's neck. The concept of leveraging vehicle steering dynamics for adaptive region-of-interest processing is also presented. Comprehensive evaluations on a custom CARLA dataset simulating complex scenarios demonstrate that YOLO-APD achieves state-of-the-art detection accuracy, reaching 77.7% mAP@0.5:0.95 and exceptional pedestrian recall exceeding 96%, significantly outperforming baseline models, including YOLOv8. Furthermore, it maintains real-time processing capabilities at 100 FPS, showcasing a superior balance between accuracy and efficiency. Ablation studies validate the synergistic contribution of each integrated component. Evaluation on the KITTI dataset confirms the architecture's potential while highlighting the need for domain adaptation. This research advances the development of highly accurate, efficient, and adaptable perception systems based on cost-effective sensors, contributing to enhanced safety and reliability for autonomous navigation in challenging, less-structured driving environments.
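Two of the components named above, the Mish activation and the parameter-free SimAM attention mechanism, have compact published formulations. The following is a minimal NumPy sketch of both, following the formulations in the cited works ([44] for Mish, [38] for SimAM); the function names and the single-image (C, H, W) layout are illustrative and are not taken from the paper's code.

```python
import numpy as np

def mish(x):
    # Mish activation [44]: x * tanh(softplus(x)), smooth and non-monotonic.
    return x * np.tanh(np.log1p(np.exp(x)))

def simam(x, lam=1e-4):
    # SimAM attention [38]: weights every activation by a sigmoid of its
    # "energy", computed per channel over spatial positions. No learnable
    # parameters are introduced, which is why it is called parameter-free.
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)          # per-channel mean
    d = (x - mu) ** 2                                # squared deviation
    v = d.sum(axis=(1, 2), keepdims=True) / n        # per-channel variance
    e_inv = d / (4.0 * (v + lam)) + 0.5              # inverse energy term
    return x * (1.0 / (1.0 + np.exp(-e_inv)))        # sigmoid gating
```

Because SimAM adds no weights, it can be dropped into an existing backbone (as YOLO-APD does with YOLOv8) without increasing the parameter count; `lam` is the regularization coefficient from the SimAM energy function.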

Keywords

Autonomous vehicles, Computer vision, Deep learning, Object detection, Pedestrian.

References

[1] M. Hassaballah et al., “Vehicle Detection and Tracking in Adverse Weather Using a Deep Learning Framework,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 7, pp. 4230–4242, 2021.
[CrossRef] [Google Scholar] [Publisher Link]

[2] Sumit Ranjan, and S. Senthamilarasu, Applied Deep Learning and Computer Vision for Self-Driving Cars, Packt Publishing, 2020.
[Google Scholar] [Publisher Link]

[3] Alireza Razzaghi et al., “World Health Organization’s Estimates of Death Related to Road Traffic Crashes and Their Discrepancy with Other Countries’ National Report,” Journal of Injury and Violence Research, vol. 12, no. 3, pp. 39-44, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[4] Jin Qiu, Jian Liu, and Yunyi Shen, “Computer Vision Technology Based on Deep Learning,” 2021 IEEE 2nd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), 2021.
[CrossRef] [Google Scholar] [Publisher Link]

[5] Yi Cao, Yuning Wang, and Huijie Fan, “Improved YOLOv5s Network for Traffic Object Detection with Complex Road Scenes,” 2023 IEEE 13th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), 2023. [CrossRef] [Google Scholar] [Publisher Link]

[6] Johannes Deichmann, Autonomous Driving’s Future: Convenient and Connected, McKinsey & Company, 2023.
[Google Scholar] [Publisher Link]

[7] Gamze Akyol et al., “Deep Learning Based, Real-Time Object Detection for Autonomous Driving,” 2020 28th Signal Processing and Communications Applications Conference (SIU), 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[8] Ali Ziryawulawo et al., “An Integrated Deep Learning-based Lane Departure Warning and Blind Spot Detection System: A Case Study for the Kayoola Buses,” 2023 1st International Conference on the Advancements of Artificial Intelligence in African Context, 2023. [CrossRef] [Google Scholar] [Publisher Link]

[9] Ramin Sahba, Amin Sahba, and Farshid Sahba, “Using a Combination of LiDAR, RADAR, and Image Data for 3D Object Detection in Autonomous Vehicles,” 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[10] Xiangmo Zhao et al., “Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications,” IEEE Sensors Journal, vol. 20, no. 9, pp. 4901–4913, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[11] Fan Bu et al., “Pedestrian Planar LiDAR Pose (PPLP) Network for Oriented Pedestrian Detection Based on Planar LiDAR and Monocular Images,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1626–1633, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[12] Michal Uřičář et al., “VisibilityNet: Camera Visibility Detection and Image Restoration for Autonomous Driving,” Electronic Imaging, vol. 32, pp. 79-1–79-8, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[13] Richard Szeliski, Computer Vision: Algorithms and Applications, Second Edition, Springer, 2022.
[Google Scholar] [Publisher Link]

[14] Hao Zhang, and Shuaijie Zhang, “Focaler-IoU: More Focused Intersection over Union Loss,” arXiv Preprint, 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[15] Fred Hasselman, and Anna M.T. Bosman, “Studying Complex Adaptive Systems with Internal States: A Recurrence Network Approach to the Analysis of Multivariate Time-series Data Representing Self-reports of Human Experience,” Frontiers in Applied Mathematics and Statistics, vol. 6, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.
[CrossRef] [Google Scholar] [Publisher Link]

[17] Kaiming He et al., “Mask R-CNN,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 386–397, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[18] Ross Girshick, “Fast R-CNN,” 2015 IEEE International Conference on Computer Vision (ICCV), 2015.
[CrossRef] [Google Scholar] [Publisher Link]

[19] Shaoqing Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[CrossRef] [Google Scholar] [Publisher Link]

[20] Bilel Tarchoun et al., “Deep CNN-based Pedestrian Detection for Intelligent Infrastructure,” 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[21] Yunchuan Wu, Cheng Chen, and Bo Wang, “Pedestrian Detection Based on Improved SSD Object Detection Algorithm,” 2022 International Conference on Networking and Network Applications (NaNA), 2022.
[CrossRef] [Google Scholar] [Publisher Link]

[22] Wei Liu et al., “SSD: Single Shot MultiBox Detector,” Computer Vision-ECCV 2016, pp. 21–37, 2016.
[CrossRef] [Google Scholar] [Publisher Link]

[23] Joseph Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection,” 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[CrossRef] [Google Scholar] [Publisher Link]

[24] Juan Terven, Diana-Margarita Cordova-Esparza, and Julio-Alejandro Romero-Gonzalez, “A Comprehensive Review of YOLO: From YOLOv1 to YOLOv8 and Beyond,” Machine Learning and Knowledge Extraction, vol. 5, no. 4, pp. 1680-1716, 2023.
[CrossRef] [Google Scholar] [Publisher Link]

[25] Joseph Redmon, and Ali Farhadi, “YOLOv3: An Incremental Improvement,” arXiv Preprint, 2018.
[CrossRef] [Google Scholar] [Publisher Link]

[26] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv Preprint, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[27] G. Jocher et al., “YOLOv5 by Ultralytics,” 2020.
[Google Scholar]

[28] Rahima Khanam, and Muhammad Hussain, “What is YOLOv5: A Deep Look into the Internal Features of the Popular Object Detector,” arXiv Preprint, 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[29] Chuyi Li et al., “YOLOv6 v3.0: A Full-Scale Reloading,” arXiv Preprint, 2023.
[CrossRef] [Google Scholar] [Publisher Link]

[30] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao, “YOLOv7: Trainable Bag-of-freebies Sets New State-of-the-art for Real-time Object Detectors,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[CrossRef] [Google Scholar] [Publisher Link]

[31] Muhammad Yaseen, “What is YOLOv8: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector,” arXiv Preprint, 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[32] Fatma Betul Kara Ardaç, and Pakize Erdogmuş, “Car Object Detection: Comparative Analysis of YOLOv9 and YOLOv10 Models,” 2024 Innovations in Intelligent Systems and Applications Conference (ASYU), 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[33] Priyanto Hidayatullah et al., “YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review,” arXiv Preprint, 2025.
[CrossRef] [Google Scholar] [Publisher Link]

[34] Rejin Varghese, and M. Sambath, “YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness,” 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[35] Jie Hu et al., “Squeeze-and-Excitation Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 8, pp. 2011–2023, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[36] Qilong Wang et al., “ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[37] Qibin Hou, Daquan Zhou, and Jiashi Feng, “Coordinate Attention for Efficient Mobile Network Design,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[CrossRef] [Google Scholar] [Publisher Link]

[38] Lingxiao Yang et al., “SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks,” Proceedings of the 38th International Conference on Machine Learning, pp. 11863–11874, 2021.
[Google Scholar] [Publisher Link]

[39] Wei Li et al., “Object Detection based on an Adaptive Attention Mechanism,” Scientific Reports, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[40] Guoxin Shen, Xuerong Li, and Yi Wei, “Improved Algorithm for Pedestrian Detection of Lane Line based on YOLOv5s Model,” 2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2022.
[CrossRef] [Google Scholar] [Publisher Link]

[41] Kai Han et al., “GhostNet: More Features from Cheap Operations,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1577-1586, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[42] Shu Liu et al., “Path Aggregation Network for Instance Segmentation,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[CrossRef] [Google Scholar] [Publisher Link]

[43] Mingxing Tan, Ruoming Pang, and Quoc V. Le, “EfficientDet: Scalable and Efficient Object Detection,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[44] Diganta Misra, “Mish: A Self Regularized Non-Monotonic Activation Function,” arXiv Preprint, 2019.
[CrossRef] [Google Scholar] [Publisher Link]

[45] Yinpeng Chen et al., “Dynamic Convolution: Attention Over Convolution Kernels,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[46] Chao Li, Aojun Zhou, and Anbang Yao, “Omni-Dimensional Dynamic Convolution,” arXiv Preprint, 2022.
[CrossRef] [Google Scholar] [Publisher Link]

[47] Lile Huo et al., “Overview of Pedestrian Detection based on Infrared Image,” 2022 41st Chinese Control Conference (CCC), 2022.
[CrossRef] [Google Scholar] [Publisher Link]

[48] Yu Song et al., “Full-Time Infrared Feature Pedestrian Detection Based on CSP Network,” 2020 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[49] Timothe Verstraete, and Naveed Muhammad, “Pedestrian Collision Avoidance in Autonomous Vehicles: A Review,” Computers, vol. 13, no. 3, p. 78, 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[50] Joel Janai et al., “Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art,” Foundations and Trends® in Computer Graphics and Vision, vol. 12, no. 1–3, pp. 1–308, 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[51] Holger Caesar et al., “nuScenes: A Multimodal Dataset for Autonomous Driving,” Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621-11631, 2020.
[Google Scholar] [Publisher Link]

[52] Gemb Kaljavesi et al., “CARLA-Autoware-Bridge: Facilitating Autonomous Driving Research with a Unified Framework for Simulation and Module Development,” 2024 IEEE Intelligent Vehicles Symposium, 2024.
[CrossRef] [Google Scholar] [Publisher Link]

[53] Peiyu Yang et al., “A Part-Aware Multi-Scale Fully Convolutional Network for Pedestrian Detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 2, pp. 1125–1137, 2021.
[CrossRef] [Google Scholar] [Publisher Link] 

[54] Rupshali Dasgupta, Yuvraj Sinha Chowdhury, and Sarita Nanda, “Performance Comparison of Benchmark Activation Function ReLU, Swish and Mish for Facial Mask Detection Using Convolutional Neural Network,” Intelligent Systems, pp. 355–367, 2021.
[CrossRef] [Google Scholar] [Publisher Link]

[55] Kaiming He et al., “Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[CrossRef] [Google Scholar] [Publisher Link]

[56] Yichen Zhang et al., “A New Architecture of Feature Pyramid Network for Object Detection,” 2020 IEEE 6th International Conference on Computer and Communications (ICCC), 2020.
[CrossRef] [Google Scholar] [Publisher Link]

[57] Chengcheng Wang et al., “Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism,” Advances in Neural Information Processing Systems, 2023.
[Google Scholar] [Publisher Link]

[58] Ken Arioka, and Yuichi Sawada, “Improved Kalman Filter and Matching Strategy for Multi-Object Tracking System,” 2023 62nd Annual Conference of the Society of Instrument and Control Engineers (SICE), 2023.
[CrossRef] [Google Scholar] [Publisher Link]