Transcription (linguistics)
Computer science
Premise
Modality
Representation (politics)
Artificial intelligence
Speech recognition
Machine learning
Political science
Social science
Linguistics
Politics
Philosophy
Sociology
Law
Authors
Carlos Manuel Reglero de la Fuente,Jose J. Valero-Mas,F. Xavier Castellanos,Jorge Calvo-Zaragoza
Identifier
DOI:10.1007/s13735-021-00221-6
Abstract
Optical Music Recognition (OMR) and Automatic Music Transcription (AMT) are the research fields that aim to obtain a structured digital representation from sheet music images and acoustic recordings, respectively. While these fields have traditionally evolved independently, the fact that both tasks may share the same output representation raises the question of whether they could be combined synergistically to exploit the individual transcription advantages exhibited by each modality. To evaluate this hypothesis, this paper presents a multimodal framework that combines the predictions from two neural end-to-end OMR and AMT systems by means of a local alignment approach. We assess several experimental scenarios with monophonic music pieces to evaluate our approach under different conditions of the individual transcription systems. In general, the multimodal framework clearly outperforms the single recognition modalities, attaining a relative improvement close to 40% in the best case. Our initial premise is therefore validated, opening avenues for further research in multimodal OMR-AMT transcription.
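The abstract describes fusing the output sequences of an OMR system and an AMT system via local alignment. The sketch below illustrates one way such a fusion could look: the two symbol sequences are aligned, agreeing spans are kept, and disagreeing spans are resolved by average per-symbol confidence. This is an illustrative assumption only, not the paper's actual fusion rule; the function name `fuse_transcriptions` and the use of `difflib.SequenceMatcher` as the aligner are hypothetical choices made for this example.

```python
from difflib import SequenceMatcher

def fuse_transcriptions(omr_seq, amt_seq, omr_conf, amt_conf):
    """Fuse two symbol-sequence hypotheses by local alignment.

    Matched spans are kept as-is; for mismatched spans, the
    hypothesis with the higher average confidence wins.
    (Illustrative sketch; the paper's exact combination policy
    may differ.)
    """
    matcher = SequenceMatcher(a=omr_seq, b=amt_seq, autojunk=False)
    fused = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both modalities agree on this span.
            fused.extend(omr_seq[i1:i2])
        else:
            # Disagreement: pick the span with higher mean confidence.
            o = sum(omr_conf[i1:i2]) / max(i2 - i1, 1)
            a = sum(amt_conf[j1:j2]) / max(j2 - j1, 1)
            fused.extend(omr_seq[i1:i2] if o >= a else amt_seq[j1:j2])
    return fused
```

For example, if the OMR hypothesis is `["C4", "E4", "G4", "C5"]` and the AMT hypothesis is `["C4", "F4", "G4", "C5"]`, the sequences disagree only at the second symbol, and the fused output keeps whichever of `E4`/`F4` carries the higher confidence.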