Gait emotion recognition plays a crucial role in intelligent systems. Most existing approaches identify emotions by focusing on local actions over time. However, these methods set aside two valuable observations: the effective temporal ranges of different emotions differ, and local actions during walking are highly similar across emotions. Ignoring these facts often impairs emotion recognition performance. To address these issues, a novel model named MSA-GCN (Multiscale Adaptive Graph Convolution Network) is proposed to exploit this observational knowledge and improve recognition performance. In the proposed model, an adaptive spatio-temporal graph convolution is designed to dynamically select convolution kernels and thereby learn the spatio-temporal features of different emotions. Moreover, a Cross-Scale Mapping Interaction mechanism (CSMI) is proposed to construct an adaptive adjacency matrix for high-quality aggregation of multiscale information. Extensive experiments on public datasets indicate that, compared with state-of-the-art methods, the proposed approach achieves higher emotion recognition accuracy, demonstrating its promise.
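
To make the dynamic kernel-selection idea concrete, the following is a minimal sketch in PyTorch of a selective-kernel-style temporal convolution over skeleton features, where several temporal kernel sizes are fused by learned attention so the effective temporal range adapts per input. It assumes the usual (batch, channels, frames, joints) skeleton layout; the module and parameter names (`AdaptiveTemporalConv`, `kernel_sizes`, `reduction`) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AdaptiveTemporalConv(nn.Module):
    """Selective-kernel-style temporal convolution over skeleton features.

    Input x has shape (N, C, T, V): batch, channels, frames, joints.
    Parallel branches with different temporal kernel sizes are fused by
    softmax attention, so the effective temporal range is chosen per sample.
    Illustrative sketch only; not the paper's exact module.
    """

    def __init__(self, channels, kernel_sizes=(3, 5, 7), reduction=4):
        super().__init__()
        # One temporal-only convolution per candidate kernel size.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(k, 1), padding=(k // 2, 0))
            for k in kernel_sizes
        )
        hidden = max(channels // reduction, 8)
        # Small bottleneck that produces per-channel branch-attention logits.
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels * len(kernel_sizes)),
        )

    def forward(self, x):
        # Stack branch outputs: (N, B, C, T, V), B = number of kernel sizes.
        feats = torch.stack([branch(x) for branch in self.branches], dim=1)
        # Global descriptor from the summed branches: (N, C).
        gap = feats.sum(dim=1).mean(dim=(2, 3))
        # Per-channel attention over branches, softmax across B.
        attn = self.fc(gap).view(x.size(0), len(self.branches), -1)
        attn = attn.softmax(dim=1).unsqueeze(-1).unsqueeze(-1)  # (N, B, C, 1, 1)
        # Weighted sum of branches: the selected temporal receptive field.
        return (feats * attn).sum(dim=1)


if __name__ == "__main__":
    x = torch.randn(2, 64, 30, 16)  # 2 clips, 64 channels, 30 frames, 16 joints
    print(AdaptiveTemporalConv(64)(x).shape)  # torch.Size([2, 64, 30, 16])
```

In this sketch the softmax over branches plays the role of kernel selection: an emotion expressed over long gait cycles can weight the wider temporal kernels more heavily, while one carried by brief local actions can favor the narrow ones.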