Abstract: Infrared and visible images are strongly complementary, and fusing the two modalities can help meet the high accuracy and robustness requirements of object detection in autonomous driving and other fields. Existing multimodal object detection algorithms often rely on large models with long inference times, which makes them difficult to deploy on edge devices, and direct fusion methods do not fully exploit the advantages of each modality. We therefore propose a fusion object detection algorithm based on a gradient operator and an attention mechanism. A gradient operator is used to design a customized convolution that captures image texture. Coordinate attention is introduced in the infrared branch to exploit its advantage in target localization. A weight generation network adaptively weights the features of the two modalities. The algorithm is modular and lightweight, which makes it suitable for deployment on edge devices. Experiments on the dataset show that mAP@0.5 and mAP@0.5:0.95 are 6.3% and 7.2% higher than single-modality detection on visible images, and 11.3% and 9.8% higher than single-modality detection on infrared images. The inference frame rate reaches 22.7 FPS, which meets the real-time requirement.
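
To illustrate how a gradient operator can be expressed as a convolution that responds to image texture, the following is a minimal sketch in plain Python. It is not the customized convolution proposed here: the Sobel kernels, the L1 gradient magnitude, and the names `conv2d_valid` and `gradient_magnitude` are assumptions made for illustration, since the abstract does not specify which operator is used or how it is combined with learnable weights.

```python
# Illustrative sketch only. The Sobel kernels are an ASSUMED choice of
# gradient operator; the abstract does not name the operator it uses.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def conv2d_valid(image, kernel):
    """Cross-correlate a 2D list with a kernel, without padding ('valid')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def gradient_magnitude(image):
    """Approximate gradient magnitude as |Gx| + |Gy| (L1 form, assumed)."""
    gx = conv2d_valid(image, SOBEL_X)  # horizontal intensity change
    gy = conv2d_valid(image, SOBEL_Y)  # vertical intensity change
    return [
        [abs(a) + abs(b) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


# A 4x4 image with a vertical edge between the second and third columns.
edge_image = [[0, 0, 1, 1] for _ in range(4)]
edge_response = gradient_magnitude(edge_image)
```

In this sketch a flat region yields a zero response while an intensity edge yields a strong one, which is the property a gradient-based convolution would rely on to emphasize texture. In a trainable network the fixed kernel would presumably be combined with learned weights, but that detail is not given in the abstract.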