Real-time detection of conveyor belt tearing is essential for industrial transportation in environments with multi-source interference. Here, a deep learning-based visual detection method named YOLOv4-BELT is proposed. First, a multi-condition belt tearing image dataset (MBTID) is constructed. The MBTID is then pre-processed with an improved CutMix algorithm for data augmentation, which enriches the image backgrounds and reduces over-fitting. Next, the deep convolutional neural network CSPDarknet53 is employed to extract and fuse multi-scale tear features, which effectively improves recognition of complex samples. Moreover, training performance is significantly enhanced by a properly designed multi-stage transfer training strategy. Finally, the extracted deep-level tear features are used for the classification and localization tasks. The results show that the precision, accuracy, recall and F1 score of YOLOv4-BELT are 96.6%, 99.1%, 98.1% and 97.4%, respectively, with a detection speed of 21.1 FPS. Compared with state-of-the-art methods, YOLOv4-BELT significantly improves detection accuracy and robustness.
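The four reported metrics can be related through their standard confusion-matrix definitions. The sketch below is illustrative only: it assumes the usual definitions in terms of true/false positives and negatives (the abstract does not state how true negatives are counted for detection), and the names `Counts`, `metrics` and `f1_from_pr` as well as the example counts are hypothetical, not the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical confusion-matrix counts for tear / no-tear decisions."""
    tp: int  # tears correctly detected
    fp: int  # false alarms
    fn: int  # missed tears
    tn: int  # correctly rejected non-tear samples


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return _safe_div(2 * precision * recall, precision + recall)


def metrics(c: Counts) -> dict:
    """Standard precision, recall, accuracy and F1 from raw counts."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1_from_pr(precision, recall),
    }


if __name__ == "__main__":
    # Plugging in the reported precision (96.6%) and recall (98.1%).
    print(round(f1_from_pr(0.966, 0.981), 4))
    # Made-up counts, for illustration only.
    print(metrics(Counts(tp=90, fp=10, fn=5, tn=895)))
```

With the rounded values 0.966 and 0.981, the harmonic mean comes to about 0.9734; the reported 97.4% is consistent with this once the rounding of the underlying precision and recall is taken into account. Note also that accuracy counts true negatives, which is why it can exceed both precision and recall.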