Computer science
Codebook
Generative grammar
Speech recognition
Artificial intelligence
Encoder
Harmony (color)
Pattern recognition (psychology)
Operating system
Art
Visual arts
Authors
Kai Song, Xia Liang, Junmin Wu
Identifier
DOI:10.1145/3508546.3508612
Abstract
In recent years, accompaniment generation has become one of the hottest topics in deep learning, and many researchers work on it. However, most prior works are unstable and lack harmony, which makes their generated results far from satisfying to the human ear. To address this problem, we propose to combine accompaniment generation with music theory. Firstly, since accompaniment itself has many fixed patterns, we employ VQ-VAE as our generative framework. The codebook component of VQ-VAE can learn the common patterns in accompaniment, which coincides with the idea of pattern writing in accompaniment music. Moreover, accompaniment is composed of multiple tracks, and every track consists of a sequence of musical notes, so accompaniment carries both temporal and spatial information. We therefore introduce ViT as the encoder and decoder of VQ-VAE, because this model can simultaneously capture time-sequence information and the harmony information between tracks within the same time slice. Objective and subjective evaluations demonstrate the above features of our work, and the quality of our generated results reaches the state-of-the-art level.
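The codebook idea the abstract relies on can be illustrated by the vector-quantization step at the core of any VQ-VAE: each continuous latent produced by the encoder is snapped to its nearest learned code vector, so recurring accompaniment patterns map to a small set of discrete entries. The sketch below is a minimal, hypothetical illustration of that nearest-neighbor lookup (toy shapes and values; the paper's actual model uses a ViT encoder/decoder and a trained codebook):

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e:      (N, D) array of encoder outputs (continuous latents)
    codebook: (K, D) array of learned code vectors ("patterns")
    Returns the quantized latents (N, D) and the chosen indices (N,).
    """
    # Squared Euclidean distance from every latent to every code vector,
    # computed via broadcasting: result has shape (N, K).
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)          # nearest code index per latent
    return codebook[idx], idx

# Toy example: 3 latents quantized against a codebook of 4 2-D entries.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z_e = np.array([[0.1, 0.1], [0.9, 0.05], [0.8, 0.9]])
z_q, idx = quantize(z_e, codebook)
print(idx)   # discrete "pattern" assigned to each latent
```

In a full VQ-VAE the codebook entries are trained (e.g. with a commitment loss and straight-through gradients) so that frequently occurring accompaniment patterns each claim a code, which is what the abstract means by the codebook learning common patterns.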