One of the most prevalent brain-computer interface (BCI) paradigms is electroencephalogram (EEG)-based motor imagery (MI), which has found extensive application in numerous fields. While significant strides have been made toward high MI classification performance, certain challenges persist: effective utilization of the time-varying spatial and temporal features of multi-channel brain signals remains elusive, and fully leveraging the interactive information embedded within finite-length MI-EEG samples is still an open question. In this study, we introduce the Deep-Shallow Attention-Based Multi-Frame Fusion Network (DSA-MFNet) for EEG-based motor imagery classification. DSA-MFNet comprises a Deep-Shallow Attention (DSA) module and a Multi-Frame Fusion (MF) module. Specifically, the DSA module integrates a deep-shallow convolution block, which extracts both intricate deep spatial-temporal features and surface-level shallow features; an attention block then emphasizes the most salient features in the MI-EEG data. Meanwhile, the MF module models the interactions among multiple frames of the MI-EEG data, capturing the characteristics of time-varying EEG signals. Our model outperforms state-of-the-art methods, achieving an accuracy of 86.6% on the BCI Competition IV-2a dataset in the subject-dependent setting. To foster further research, we will make our code and trained models available on GitHub.
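The multi-frame idea described above can be illustrated with a minimal sketch: a multi-channel MI-EEG trial is split into overlapping frames, a simple per-frame feature is extracted, and the frames are fused with a softmax weighting over frame scores. All function names, shapes, and parameters here are illustrative assumptions, not the paper's actual DSA-MFNet configuration.

```python
import numpy as np

# Hypothetical illustration only: frame segmentation, per-frame features,
# and attention-like fusion. Shapes and parameters are assumptions.

def split_frames(trial, frame_len=250, stride=125):
    """Slice a (channels, time) trial into overlapping frames."""
    c, t = trial.shape
    starts = range(0, t - frame_len + 1, stride)
    return np.stack([trial[:, s:s + frame_len] for s in starts])  # (frames, channels, frame_len)

def frame_features(frames):
    """Toy per-frame feature: log-variance per channel (a common MI-EEG feature)."""
    return np.log(frames.var(axis=2) + 1e-8)  # (frames, channels)

def fuse(features):
    """Toy fusion: softmax weights over frames, then a weighted sum."""
    scores = features.mean(axis=1)                 # one scalar score per frame
    w = np.exp(scores) / np.exp(scores).sum()      # softmax over frames
    return (w[:, None] * features).sum(axis=0)     # (channels,)

rng = np.random.default_rng(0)
trial = rng.standard_normal((22, 1000))  # 22 channels, 1000 samples (4 s at 250 Hz)
frames = split_frames(trial)
fused = fuse(frame_features(frames))
print(frames.shape, fused.shape)  # (7, 22, 250) (22,)
```

The fusion step stands in for the paper's attention block; in the actual model, deep and shallow convolutional features would feed the attention and fusion stages rather than a hand-crafted log-variance feature.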