A dual-modal feature alignment based object detection algorithm is proposed to fully fuse visible and infrared image features. First, we propose a two-stream detection model that accepts paired visible and infrared images as simultaneous inputs. Second, we design a gated fusion network consisting of a dual-modal feature alignment module and a feature fusion module; it adopts mid-level fusion and is embedded as an intermediate layer of the two-stream backbone network. Specifically, the dual-modal feature alignment module extracts detailed information from the aligned features of the two modalities by computing multi-scale dual-modal aligned feature vectors. The feature fusion module recalibrates the fused dual-modal features and then multiplies them with the aligned dual-modal features, achieving cross-modal fusion that jointly enhances low-level and high-level features. We validate the proposed algorithm on the publicly available KAIST pedestrian dataset and a self-built GIR dataset. On KAIST, the algorithm achieves an accuracy of 77.1%, which is 17.3% and 5.6% higher than the baseline YOLOv5-s detecting visible-only and infrared-only images, respectively; on the self-built GIR dataset, it achieves a detection accuracy of 91%, which is 1.2% and 14.2% higher than the baseline on visible-only and infrared-only images, respectively. The detection speed also meets real-time requirements.
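To make the recalibrate-then-multiply fusion step concrete, the following is a minimal PyTorch sketch under stated assumptions, not the paper's implementation. The module and layer names (GatedFusion, align, gate) are hypothetical; the abstract does not specify the exact layers, so a 1x1 convolution stands in for the alignment module, a squeeze-and-excitation style sigmoid channel gate stands in for the recalibration, and the multi-scale computation is omitted.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Hypothetical sketch of gated cross-modal fusion: recalibrate the
    fused dual-modal features, then multiply with the aligned features."""

    def __init__(self, channels: int):
        super().__init__()
        # Assumed alignment step: project the concatenated modalities
        # back into a shared feature space with a 1x1 convolution.
        self.align = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Assumed recalibration step: squeeze-and-excitation style
        # channel gate producing per-channel weights in (0, 1).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        paired = torch.cat([vis, ir], dim=1)  # stack the two modalities
        aligned = self.align(paired)          # dual-modal aligned features
        weights = self.gate(paired)           # recalibrated fusion weights
        return aligned * weights              # gated cross-modal fusion


if __name__ == "__main__":
    fuse = GatedFusion(channels=64)
    vis = torch.randn(1, 64, 32, 32)  # visible-branch feature map
    ir = torch.randn(1, 64, 32, 32)   # infrared-branch feature map
    print(fuse(vis, ir).shape)        # torch.Size([1, 64, 32, 32])
```

In this sketch the gate's output has shape (N, C, 1, 1) and broadcasts over the spatial dimensions, so the element-wise product reweights each channel of the aligned features; the output keeps the per-branch feature shape, consistent with inserting the fusion block as an intermediate layer of the two-stream backbone.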