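Computer science
Artificial intelligence
Transformer
Facial expression
Discriminative model
Pattern recognition (psychology)
Convolutional neural network
Computer vision
Encoder
Speech recognition
Engineering
Voltage
Operating system
Electrical engineering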
Authors
Zengqun Zhao, Qingshan Liu
Source
Venue: ACM Multimedia
Date: 2021-10-17
Citations: 60
Identifier
DOI:10.1145/3474085.3475292
Abstract
This paper proposes a dynamic facial expression recognition transformer (Former-DFER) for in-the-wild scenarios. Specifically, the proposed Former-DFER consists mainly of a convolutional spatial transformer (CS-Former) and a temporal transformer (T-Former). The CS-Former consists of five convolution blocks and N spatial encoders and is designed to guide the network to learn, from the spatial perspective, facial features that are robust to occlusion and pose variation. The T-Former consists of M temporal encoders and is designed to let the network learn contextual facial features from the temporal perspective. Heatmaps of the learned facial features demonstrate that Former-DFER can handle issues such as occlusion, non-frontal pose, and head motion, and visualization of the feature distribution shows that the proposed method learns more discriminative facial features. Moreover, Former-DFER achieves state-of-the-art results on the DFEW and AFEW benchmarks.
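The abstract describes a factorized design: attention among locations within each frame (the spatial encoders of the CS-Former), followed by attention across frames (the T-Former). The pure-Python sketch below illustrates only that data flow, under heavy simplifying assumptions that are not taken from the paper: the five convolution blocks are omitted (each frame is supplied directly as a list of feature tokens), attention is single-head with identity query/key/value projections, there are no feed-forward layers or normalization, and mean-pooling stands in for however the real model aggregates tokens. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention plus a residual.

    ASSUMPTION: identity Q/K/V projections; a real encoder learns these
    and also has a feed-forward sublayer and normalization.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        attended = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])  # residual connection
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    # ASSUMPTION: mean-pooling as the token aggregation step.
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def spatial_stage(frames: List[List[Vector]], n_spatial: int) -> List[Vector]:
    """Apply N spatial encoder layers to each frame independently.

    Tokens attend only to tokens of the same frame; each frame is then
    pooled into a single frame vector.
    """
    frame_vectors = []
    for tokens in frames:
        for _ in range(n_spatial):
            tokens = self_attention(tokens)
        frame_vectors.append(mean_pool(tokens))
    return frame_vectors


def temporal_stage(frame_vectors: List[Vector], m_temporal: int) -> Vector:
    """Apply M temporal encoder layers across the sequence of frame vectors."""
    seq = frame_vectors
    for _ in range(m_temporal):
        seq = self_attention(seq)
    return mean_pool(seq)


def encode_clip(frames: List[List[Vector]], n_spatial: int, m_temporal: int) -> Vector:
    # Spatial attention first (within frames), then temporal attention (across frames).
    return temporal_stage(spatial_stage(frames, n_spatial), m_temporal)


if __name__ == "__main__":
    # Toy clip: 3 frames, each with 3 tokens of dimension 2.
    clip = [
        [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]],
        [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        [[-1.0, 2.0], [0.2, 0.2], [0.7, -0.3]],
    ]
    print(encode_clip(clip, n_spatial=2, m_temporal=1))
```

Splitting the computation this way keeps each attention step small: the spatial stage compares only tokens within one frame, and the temporal stage compares only one vector per frame, instead of attending over every token of every frame jointly.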