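Discriminative
Computer science
Artificial intelligence
Pattern recognition (psychology)
Normalization (sociology)
Estimator
Pose
Expression (computer science)
Facial expression
Invariant (physics)
Machine learning
Mathematics
Sociology
Anthropology
Mathematical physics
Programming language
Statistics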
Authors
Yuanyuan Liu,Jiyao Peng,Wei Dai,Jiabei Zeng,Shiguang Shan
Identifier
DOI:10.1016/j.patcog.2023.109496
Abstract
Multi-view facial expression recognition (FER) is a challenging task because the appearance of an expression varies greatly with head pose. To reduce the influence of pose, recently developed methods perform pose normalization, learn pose-invariant features, or learn pose-specific FER classifiers. However, these methods usually rely on a prerequisite pose estimator or expressive region detector that is independent of the subsequent expression analysis. Unlike these methods, we propose a joint spatial and scale attention network (SSA-Net) to localize proper regions for simultaneous head pose estimation (HPE) and FER. Specifically, SSA-Net discovers the regions most relevant to the facial expression at hierarchical scales with a spatial attention mechanism, and selects the most informative scales with a scale attention mechanism, to learn joint pose-invariant and expression-discriminative representations. We then employ a dynamically constrained multi-task learning mechanism with a carefully designed constraint regulation to train the network properly and adaptively and to optimize the representations, thus achieving accurate multi-view FER. The effectiveness of the proposed SSA-Net is validated on three multi-view datasets (BU-3DFE, Multi-PIE, and KDEF) and three in-the-wild FER datasets (AffectNet, SFEW, and FER2013). Extensive experiments demonstrate that the proposed framework outperforms existing state-of-the-art methods under both within-dataset and cross-dataset settings, with relative accuracy gains of 2.36%, 1.33%, 3.11%, 2.84%, 15.7%, and 7.57%, respectively.
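The abstract does not give the layer definitions, so the following is only a minimal NumPy sketch of the general idea of spatial attention followed by scale attention, not the SSA-Net architecture itself. The function names, the scoring vectors `w` and `v`, the feature-map sizes, and the use of softmax weighting are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def spatial_attention(feat, w):
    """Weight spatial locations of one feature map.

    feat: array of shape (C, H, W).
    w:    hypothetical scoring vector of shape (C,), assumed for this sketch.
    Returns the attention-pooled descriptor (C,) and the attention map (H, W).
    """
    c, h, wd = feat.shape
    scores = np.tensordot(w, feat, axes=([0], [0]))      # (H, W) relevance scores
    attn = softmax(scores.reshape(-1)).reshape(h, wd)    # sums to 1 over locations
    pooled = (feat * attn[None, :, :]).sum(axis=(1, 2))  # (C,)
    return pooled, attn

def scale_attention(descriptors, v):
    """Weight per-scale descriptors and fuse them.

    descriptors: array of shape (S, C), one row per scale.
    v:           hypothetical scoring vector of shape (C,), assumed for this sketch.
    Returns the fused descriptor (C,) and the scale weights (S,).
    """
    weights = softmax(descriptors @ v)  # sums to 1 over scales
    fused = weights @ descriptors
    return fused, weights

def fuse_scales(feats, w, v):
    """Apply spatial attention at each scale, then scale attention across scales."""
    pooled = np.stack([spatial_attention(f, w)[0] for f in feats])
    return scale_attention(pooled, v)

# Toy input: three feature maps at different (assumed) resolutions, 8 channels each.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((8, s, s)) for s in (14, 7, 4)]
w = rng.standard_normal(8)
v = rng.standard_normal(8)
fused, scale_weights = fuse_scales(feats, w, v)
_, attn_map = spatial_attention(feats[0], w)
```

In a trained network the scoring parameters would be learned jointly with the pose and expression heads; here they are random and serve only to show the data flow from per-location weights to per-scale weights to a single fused representation.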