Single-modality recognition remains a difficult problem for detecting and recognizing road vehicle targets in complex urban scenes. We therefore exploit the complementary feature information that infrared and visible images provide under different conditions, and propose a deep-learning-based target detection method that fuses infrared and visible images at the feature level. The method first obtains registered infrared and visible image pairs and extracts features from each image through two separate backbone feature extraction networks. The extracted features pass through a feature fusion layer and then into a feature pyramid network, which produces the effective feature layers on which classification and regression predictions are made. On the test set, the fusion method achieves an mAP of 0.89, higher than using only visible images (mAP 0.82) or only infrared images (mAP 0.79) on the same test set. In night-time scenes, the mAP of the fusion method is also considerably higher than that of other deep learning frameworks. These experimental results indicate that the proposed infrared and visible image fusion detection method offers advantages over traditional methods and has good application prospects.
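The abstract does not say which operator the feature fusion layer uses. The sketch below assumes a common choice: concatenate the visible and infrared feature maps along the channel axis, then mix channels with a 1x1 convolution and a ReLU. It is a minimal NumPy illustration, not the authors' implementation. The function name `fuse_features`, the channel counts, and the three spatial scales (52, 26, 13) are hypothetical. The backbones are replaced by random arrays of the right shape.

```python
import numpy as np

def fuse_features(feat_vis: np.ndarray, feat_ir: np.ndarray,
                  weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical feature-level fusion (assumed: concat + 1x1 conv + ReLU).

    feat_vis, feat_ir: (C, H, W) maps from the visible and infrared backbones
    weight: (C_out, 2*C) kernel of a 1x1 convolution
    bias:   (C_out,)
    Returns a fused (C_out, H, W) map.
    """
    # Registered image pairs should give spatially aligned maps of equal shape.
    if feat_vis.shape != feat_ir.shape:
        raise ValueError("visible and infrared feature maps must have equal shape")
    stacked = np.concatenate([feat_vis, feat_ir], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    fused = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    return np.maximum(fused, 0.0)  # ReLU

# Hypothetical usage: fuse one pair of backbone outputs per pyramid level,
# producing the inputs that a feature pyramid network would consume.
rng = np.random.default_rng(0)
C, C_out = 64, 64
scales = [(52, 52), (26, 26), (13, 13)]  # assumed backbone output sizes
fused_levels = []
for h, w in scales:
    vis = rng.standard_normal((C, h, w))  # stand-in for visible backbone output
    ir = rng.standard_normal((C, h, w))   # stand-in for infrared backbone output
    W = rng.standard_normal((C_out, 2 * C)) * 0.01
    b = np.zeros(C_out)
    fused_levels.append(fuse_features(vis, ir, W, b))
```

Fusing after separate backbones, rather than stacking raw pixels, lets each network learn modality-specific features before they are combined. This is the distinction between feature-level fusion and pixel-level fusion.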