Face2Nodes: Learning facial expression representations with relation-aware dynamic graph convolution networks

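Keywords: Computer science; Pattern recognition (psychology); Embedding; Artificial intelligence; Graph; Convolutional neural network; Discriminative model; Graph embedding; Feature learning; Deep learning; Facial expression; Convolution (computer science); Theoretical computer science; Artificial neural network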

Authors

Fan Jiang,Qionghao Huang,Xiaoyong Mei,Quanlong Guan,Yaxin Tu,Weiqi Luo,Changqin Huang

Source

Journal: Information Sciences [Elsevier BV]
Date: 2023-09-01 Volume/Issue: 649: 119640-119640 Citations: 5

Identifiers

DOI: 10.1016/j.ins.2023.119640

Abstract

Deep convolutional neural networks (CNNs) have become the standard model architecture for facial expression recognition (FER). However, CNN-based models struggle to capture the structural correlations between different local regions in a face image. Recent methods based on Vision Transformer (ViT) have been introduced to capture long-range dependencies among local regions. Nonetheless, ViT-based approaches are vulnerable to facial regions unrelated to expressions and may learn redundant correlation representations due to their self-attention mechanism. To address these issues, we propose a novel graph-based model called Face2Nodes, which can flexibly learn the graph representations of facial expressions without requiring additional auxiliary facial information such as landmarks. Our Face2Nodes consists of two key components: a multi-scale feature fusion-based patch embedding and a relation-aware dynamic graph convolution network. The patch embedding method uses a multi-scale feature fusion mechanism to obtain more discriminative graph node features for further graph representation learning. A dynamic graph is constructed using the dilated k-nearest neighbors algorithm, and a relation-aware graph convolution operator is designed to learn the latent informative correlations among different nodes in the graph. Extensive experimental results show that Face2Nodes achieves state-of-the-art performance on several popular in-the-wild FER datasets, with overall accuracies of 91.41%, 91.02%, and 66.69% on the FERPlus, RAF-DB, and AffectNet databases, respectively. Furthermore, we found that CNN-based FER approaches have a more significant performance gap between pre-training and training from scratch than Face2Nodes, demonstrating that our model is more data-efficient than CNN-based approaches.
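
The abstract names dilated k-nearest neighbors as the graph construction step but gives no details. As a rough illustration only, the sketch below follows the definition commonly used in the graph convolution literature: rank the other nodes by distance, then keep every d-th node among the k * d nearest. The function name `dilated_knn`, its parameters, and the toy 1-D "patch features" are hypothetical and are not taken from the paper.

```python
import math


def dilated_knn(features, k, dilation=1):
    """Pick k neighbours per node by taking every `dilation`-th entry
    among that node's k * dilation nearest nodes (hypothetical helper)."""
    n = len(features)
    if k * dilation > n - 1:
        raise ValueError("k * dilation must not exceed the number of other nodes")
    neighbours = []
    for i, fi in enumerate(features):
        # Rank all other nodes by Euclidean distance to node i.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(fi, features[j]),
        )
        # Keep every `dilation`-th of the k * dilation closest nodes.
        neighbours.append(ranked[: k * dilation : dilation])
    return neighbours


# Toy example: six 1-D features standing in for patch embeddings.
patches = [[0.0], [1.0], [2.0], [4.0], [7.0], [11.0]]
plain = dilated_knn(patches, k=2, dilation=1)
dilated = dilated_knn(patches, k=2, dilation=2)
```

With `dilation=1` this reduces to ordinary k-NN. A larger dilation skips some of the closest nodes, so each node connects to a wider neighbourhood while its number of edges stays at k.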
