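Snippet
Computer science
Transformer
Artificial intelligence
Expression (computer science)
Facial expression
Speech recognition
Feature (linguistics)
Computer vision
Pattern recognition (psychology)
Information retrieval
Quantum mechanics
Physics
Philosophy
Linguistics
Voltage
Programming language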
Authors
Yuanyuan Liu,Wenbin Wang,Chuanxu Feng,Haoyu Zhang,Zhe Chen,Yibing Zhan
Identifier
DOI:10.1016/j.patcog.2023.109368
Abstract
Although the Transformer can be powerful for modeling visual relations and describing complicated patterns, it can still perform unsatisfactorily for video-based facial expression recognition, since the expression movements in a video can be too small to reflect meaningful spatial-temporal relations. To this end, we propose to decompose the modeling of the expression movements of a video into the modeling of a series of expression snippets, each of which contains a few frames, and then boost the Transformer’s ability for intra-snippet and inter-snippet visual modeling, respectively, obtaining the Expression snippet Transformer (EST). For intra-snippet modeling, we devise an attention-augmented snippet feature extractor to enhance the encoding of the subtle facial movements in each snippet. For inter-snippet modeling, we introduce a shuffled snippet order prediction head and a corresponding loss to improve the modeling of subtle motion changes across subsequent snippets. The EST obtains state-of-the-art performance, demonstrating its superiority over CNN-based methods. Our code and the trained model are available at https://github.com/DreamMr/EST.
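The snippet decomposition and the shuffled-order target described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). The function names `split_into_snippets` and `shuffle_snippets`, the fixed snippet length, and the choice to drop trailing frames are all assumptions made for illustration. Frames are stood in for by plain integers.

```python
import random
from typing import List, Sequence, Tuple


def split_into_snippets(frames: Sequence, snippet_len: int) -> List[list]:
    """Split a frame sequence into consecutive, non-overlapping snippets.

    Assumption: trailing frames that do not fill a whole snippet are dropped.
    """
    if snippet_len <= 0:
        raise ValueError("snippet_len must be positive")
    n_snippets = len(frames) // snippet_len
    return [
        list(frames[i * snippet_len:(i + 1) * snippet_len])
        for i in range(n_snippets)
    ]


def shuffle_snippets(
    snippets: List[list], rng: random.Random
) -> Tuple[List[list], List[int]]:
    """Return the snippets in a random order together with the permutation.

    order[k] is the original index of the snippet now at position k. A
    hypothetical order-prediction head would be trained to recover it.
    """
    order = list(range(len(snippets)))
    rng.shuffle(order)
    return [snippets[i] for i in order], order


if __name__ == "__main__":
    frames = list(range(12))  # placeholder for 12 video frames
    snippets = split_into_snippets(frames, snippet_len=4)
    shuffled, order = shuffle_snippets(snippets, random.Random(0))
    print("snippets:", snippets)
    print("shuffled:", shuffled)
    print("order label:", order)
```

In this sketch the permutation `order` plays the role of a supervision label: a model that sees the shuffled snippets and has to predict `order` is pushed to attend to the small motion changes that distinguish one snippet from the next.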