Optical Music Recognition (OMR) is an important means of digitizing score images and has broad applications in music document archiving, music education, and digital creation. DETR (DEtection TRansformer), a new paradigm for object detection, can model contextual relationships between objects, and this ability can be exploited for the OMR task. However, the original DETR is poorly suited to OMR because of its high computational complexity and large number of parameters. To address these shortcomings and improve OMR recognition accuracy, we propose a novel multi-scale DETR (M-DETR) with a multi-scale feature fusion mechanism and improved attention mechanisms. First, a new multi-scale feature fusion mechanism enables the M-DETR backbone network to capture rich multi-scale information. Then, a key-region attention mechanism is incorporated, exploiting the fact that the key information in a score image is concentrated in particular regions. Finally, a pre-context attention mechanism is introduced to better exploit the contextual associations between the notes being recognized in a music score. Experimental results show that M-DETR achieves a recognition accuracy of 90.6% on seven typical small-sized note types, outperforming Faster R-CNN and YOLO v5, and yields a 10.02% improvement over the original DETR algorithm. These results indicate that M-DETR is an effective approach to the OMR task and also offers a new solution for detecting small objects that exhibit contextual associations.