Abstract:
To address the challenge of fast and accurate detection of small infrared pedestrian targets at inclined viewing angles, a lightweight real-time detection network, DRA-YOLO, was proposed. First, K-means++ anchor box clustering was used to fit the anchors to targets of different scales, thereby accelerating network convergence and improving detection accuracy. Second, several attention mechanisms were integrated into the redesigned feature extraction network to improve feature localization and computational efficiency, and were coupled with an improved feature pyramid structure to extract key features and enhance model stability. Finally, the neck was redesigned by eliminating down-sampling and incorporating SimAM to form a new feature fusion structure, and the detection head was redesigned to suit the dataset used in this study. Comparative experiments on both self-built and public datasets showed that, relative to the original YOLOv5s model, the proposed method improved detection speed by 20.8%, compressed the model size to 10.1 MB (a 30.3% reduction), and reduced GFLOPs by 29.1%, while reaching an mAP50 of 94.5%. These improvements enabled accurate and rapid target detection, effectively balancing model size, detection accuracy, and inference speed.
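
As an illustration of the anchor clustering step mentioned above, the following is a minimal sketch of K-means++ clustering over bounding-box widths and heights. The abstract does not specify the distance metric, the number of anchors, or the centre update rule; this sketch assumes a 1 - IoU distance (a common choice for YOLO-style anchor clustering) and a mean update. The function names `iou_wh`, `kmeanspp_init`, and `cluster_anchors` are hypothetical and are not taken from the DRA-YOLO implementation.

```python
import random


def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeanspp_init(boxes, k, rng):
    """K-means++ seeding: pick each new centre with probability
    proportional to its squared distance (assumed: 1 - IoU) from the
    nearest centre chosen so far."""
    centres = [rng.choice(boxes)]
    while len(centres) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centres) ** 2 for b in boxes]
        if sum(d2) == 0:
            # All boxes coincide with an existing centre; fall back to uniform.
            centres.append(rng.choice(boxes))
        else:
            centres.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centres


def cluster_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors, returned sorted by area."""
    rng = random.Random(seed)
    centres = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assignment step: each box joins the centre with the highest IoU.
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centres[i]))
            groups[best].append(b)
        # Update step: mean width and height per group (assumed rule);
        # an empty group keeps its previous centre.
        new_centres = []
        for i, g in enumerate(groups):
            if g:
                mean_w = sum(w for w, _ in g) / len(g)
                mean_h = sum(h for _, h in g) / len(g)
                new_centres.append((mean_w, mean_h))
            else:
                new_centres.append(centres[i])
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    # Synthetic (w, h) pairs in pixels, for illustration only.
    demo_boxes = [(8, 20), (9, 22), (10, 24), (16, 40),
                  (18, 44), (30, 70), (32, 76), (34, 80)]
    print(cluster_anchors(demo_boxes, k=3))
```

In practice the input would be the labelled box sizes from the training set, scaled to the network input resolution, and `k` would match the number of anchors the detection head expects.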