Authors
Yan Wang,Yixuan Sun,Wei Song,Shuyong Gao,Yi‐Wen Huang,Zhaoyu Chen,Weifeng Ge,Wenqiang Zhang
Identifier
DOI:10.1145/3503161.3547865
Abstract
Existing approaches to facial expression learning in video consume significant computational resources to learn spatial-channel feature representations and temporal relationships. To mitigate this issue, we propose a Dual Path multi-excitation Collaborative Network (DPCNet) that learns the critical information for facial expression representation from fewer keyframes in videos. Specifically, the DPCNet learns the important regions and keyframes from a tuple of four view-grouped frames via multi-excitation modules and produces dual-path representations of one video whose consistency is enforced by two regularization strategies. A spatial-frame excitation module and a channel-temporal aggregation module are introduced consecutively to learn the spatial-frame representation and to generate a complementary channel-temporal aggregation, respectively. Moreover, we design a multi-frame regularization loss that enforces semantic coherence across the representations of multiple frames in the dual view. To obtain consistent prediction probabilities from the dual path, we further propose a dual-path regularization loss that minimizes the divergence between the distributions of the two paths' embeddings. Extensive experiments and ablation studies show that the DPCNet significantly improves the performance of video-based facial expression recognition (FER) and achieves state-of-the-art results on the large-scale DFEW dataset.
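The abstract states that the dual-path regularization loss minimizes the divergence between the distributions of the two paths' embeddings. The paper itself does not appear here beyond the abstract, so the following is only a minimal sketch of one common way to realize such a consistency term, assuming a symmetric KL divergence between the class distributions produced by the two paths; the function names, shapes, and choice of divergence are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # KL(p || q), averaged over the batch dimension.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def dual_path_reg_loss(logits_a, logits_b):
    # Hypothetical dual-path consistency term: symmetric KL between the
    # two paths' predicted class distributions. Minimizing it pulls the
    # dual-path predictions toward agreement, as the abstract describes.
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_div(p, q) + kl_div(q, p))
```

Under this sketch, identical logits from the two paths give a loss of zero, and the penalty grows as their predicted distributions diverge; in practice this term would be added, with some weight, to the classification loss.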