In recent years, accompaniment generation has attracted considerable attention in deep learning research. However, most prior approaches are unstable and lack harmonic coherence, so the accompaniments they generate remain far from satisfying to human ears. To address this problem, we propose to combine accompaniment generation with music theory. First, since accompaniment is characterized by many fixed, recurring patterns, we adopt VQ-VAE as our generative framework: its codebook can learn these common patterns, which mirrors the pattern-writing practice in accompaniment composition. Moreover, an accompaniment consists of multiple tracks, each of which is a sequence of musical notes, so it carries both temporal and spatial information. We therefore use ViT as the encoder and decoder of the VQ-VAE, since this model can simultaneously capture temporal dependencies along the sequence and harmonic relations between tracks within the same time slice. Objective and subjective evaluations confirm these design choices, and the quality of our generated results reaches the state-of-the-art level.
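
To make the described architecture concrete, the following is a minimal PyTorch sketch of how such a pipeline could be assembled: note tokens from a (track x time) grid are embedded, encoded by transformer (ViT-style) blocks that attend jointly across tracks and time steps, quantized against a learned codebook with a straight-through estimator, and decoded back into note logits. All names, dimensions, and hyperparameters (e.g. ViTVQVAE, n_tracks=4, codebook_size=512) are illustrative assumptions, not the authors' actual implementation, which is not specified here.

```python
# Illustrative sketch only: a ViT-style encoder/decoder wrapped around a VQ-VAE codebook,
# operating on a (track x time) grid of note tokens. Names and sizes are assumptions.
import torch
import torch.nn as nn

class ViTVQVAE(nn.Module):
    def __init__(self, n_pitch=130, n_tracks=4, n_steps=64, d_model=128,
                 n_heads=4, n_layers=2, codebook_size=512):
        super().__init__()
        self.note_emb = nn.Embedding(n_pitch, d_model)            # note tokens -> vectors
        self.pos_emb = nn.Parameter(torch.zeros(1, n_tracks * n_steps, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)  # attends across time and tracks
        self.codebook = nn.Embedding(codebook_size, d_model)       # learned accompaniment patterns
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.head = nn.Linear(d_model, n_pitch)                    # reconstruct note tokens

    def quantize(self, z):
        # Nearest-codebook-entry lookup with a straight-through gradient estimator.
        dist = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)  # (B, L, K)
        idx = dist.argmin(-1)
        z_q = self.codebook(idx)
        return z + (z_q - z).detach(), idx

    def forward(self, notes):
        # notes: (B, n_tracks, n_steps) integer note tokens
        B, T, S = notes.shape
        x = self.note_emb(notes).reshape(B, T * S, -1) + self.pos_emb
        z = self.encoder(x)
        z_q, idx = self.quantize(z)
        logits = self.head(self.decoder(z_q))                      # (B, T*S, n_pitch)
        return logits.reshape(B, T, S, -1), idx

# Toy usage: reconstruct a random 4-track, 64-step accompaniment grid.
model = ViTVQVAE()
notes = torch.randint(0, 130, (2, 4, 64))
logits, codes = model(notes)
print(logits.shape, codes.shape)  # torch.Size([2, 4, 64, 130]) torch.Size([2, 256])
```

In a full VQ-VAE training loop one would also add the usual codebook and commitment losses alongside the reconstruction loss; they are omitted here to keep the sketch short.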