Tao Wang, Shuang Liu, Feng He, Weina Dai, Minghao Du, Yufeng Ke, Dong Ming
Source
Journal: IEEE Transactions on Affective Computing [Institute of Electrical and Electronics Engineers]
Date: 2023-08-15
Volume/Issue: : 1-15
Cited by: 2
Identifier
DOI: 10.1109/taffc.2023.3305197
Abstract
Body motion is an important channel for human communication and plays a crucial role in automatic emotion recognition. This work proposes a multiscale spatio-temporal network, which captures the coarse-grained and fine-grained affective information conveyed by full-body motion and decodes the complex mapping between emotion and body movement. The proposed method consists of three main components. First, a scale selection algorithm based on the pseudo-energy model is presented, which guides our network to focus not only on long-term macroscopic body expressions, but also on short-term subtle posture changes. Second, we propose a hierarchical spatio-temporal network that can jointly process posture covariance matrices and 3D posture images with different time scales, and then hierarchically fuse them in a coarse-to-fine manner. Finally, a spatio-temporal iterative (ST-ITE) fusion algorithm is developed to jointly optimize the proposed network. The proposed approach is evaluated on five public datasets. The experimental results show that the introduction of the energy-based scale selection algorithm significantly enhances the learning capability of the network. The proposed ST-ITE fusion algorithm improves the generalization and convergence of our model. The average classification results of the proposed method exceed 86% on all datasets and outperform the state-of-the-art methods.
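The abstract mentions posture covariance matrices computed at different time scales. The sketch below is one possible illustration of that idea, not the authors' implementation. It assumes a motion sequence stored as a NumPy array of shape (frames, joints, 3). The function names and the window lengths are hypothetical; in particular, the window lengths are picked by hand here and do not reproduce the pseudo-energy scale selection algorithm described in the paper.

```python
import numpy as np


def posture_covariance(seq):
    """Covariance of flattened joint coordinates over one window.

    seq: array of shape (T, J, 3), i.e. T frames of J joints in 3D.
    Returns a (3J, 3J) symmetric matrix.
    """
    frames = seq.shape[0]
    flat = seq.reshape(frames, -1)        # (T, 3J): one row per frame
    return np.cov(flat, rowvar=False)     # columns are the variables


def multiscale_covariances(seq, window_lengths):
    """Split the sequence into non-overlapping windows at each scale.

    window_lengths are hand-picked here (an assumption); the paper
    selects scales with an energy-based algorithm not shown in this sketch.
    """
    features = {}
    for w in window_lengths:
        starts = range(0, seq.shape[0] - w + 1, w)
        features[w] = np.stack([posture_covariance(seq[s:s + w]) for s in starts])
    return features


# Synthetic data standing in for a motion-capture clip (not real data).
rng = np.random.default_rng(0)
motion = rng.normal(size=(120, 20, 3))    # 120 frames, 20 joints

# A short scale for subtle posture changes and a long one for the whole clip.
features = multiscale_covariances(motion, window_lengths=(30, 120))
```

Shorter windows yield several matrices describing local posture changes, while the longest window yields a single matrix summarizing the whole clip. A network along the lines described in the abstract would consume both kinds of descriptor.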