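Localization
Artificial intelligence
Computer science
Expression (computer science)
Macro
Pattern recognition (psychology)
Tuple
Feature (linguistics)
Natural language processing
Mathematics
Programming language
Linguistics
Discrete mathematics
Philosophy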
Authors
Wang-Wang Yu,Jingwen Jiang,Kai-Fu Yang,Hongmei Yan,Yongjie Li
Source
Journal: IEEE Transactions on Affective Computing
[Institute of Electrical and Electronics Engineers]
Date: 2023-04-13
Volume/issue: pp. 1-18
Citations: 4
Identifier
DOI:10.1109/taffc.2023.3266808
Abstract
Spotting micro- and macro-expressions in untrimmed videos is a challenging task because of the large number of false positive samples it generates. Most existing methods locate high-response regions either by extracting hand-crafted features or by cropping specific regions from all or some of the key raw frames. However, these methods either neglect continuous temporal information or model inherent human motion patterns (background) as foreground. To address these issues, we propose a novel two-stream network, the Local suppression and Global enhancement Spotting Network (LGSNet), which takes segment-level features extracted from optical flow and video as input. LGSNet uses anchors to encode expression intervals and takes the encoded deviations as the optimization target. Furthermore, we introduce a Temporal Multi-Receptive Field Feature Fusion Module (TMRF³M) and a Local Suppression and Global Enhancement Module (LSGEM), which help spot short intervals more precisely and suppress background information. To further highlight the differences between positive and negative samples, we place a large number of random pseudo ground-truth intervals (background clips) on some of the otherwise discarded sliding windows, explicitly modeling background clips to counteract the effect of non-expressive facial and head movements. Experimental results show that the proposed network achieves state-of-the-art performance on the CAS(ME)², CAS(ME)³ and SAMM-LV datasets.
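The abstract says LGSNet encodes expression intervals relative to anchors and optimizes the encoded deviations, but it does not give the parametrization. The sketch below assumes the common Faster R-CNN-style scheme for 1-D intervals (normalized center offset plus log width ratio) and a temporal IoU for matching anchors to ground truth. The function names `encode`, `decode` and `temporal_iou` are hypothetical and are not taken from the paper; LGSNet's actual encoding may differ.

```python
import math
from typing import Tuple

# An interval is (start_frame, end_frame) with end > start.
Interval = Tuple[float, float]


def _center_width(iv: Interval) -> Tuple[float, float]:
    """Return (center, width) of a 1-D interval."""
    return (iv[0] + iv[1]) / 2.0, iv[1] - iv[0]


def encode(gt: Interval, anchor: Interval) -> Tuple[float, float]:
    """Encode a ground-truth interval as deviations from an anchor.

    Assumed parametrization (not confirmed by the paper):
      d_c = (g_c - a_c) / a_w,   d_w = log(g_w / a_w)
    """
    g_c, g_w = _center_width(gt)
    a_c, a_w = _center_width(anchor)
    return (g_c - a_c) / a_w, math.log(g_w / a_w)


def decode(deltas: Tuple[float, float], anchor: Interval) -> Interval:
    """Invert `encode`: recover an interval from predicted deviations."""
    d_c, d_w = deltas
    a_c, a_w = _center_width(anchor)
    center = a_c + d_c * a_w
    width = a_w * math.exp(d_w)
    return center - width / 2.0, center + width / 2.0


def temporal_iou(a: Interval, b: Interval) -> float:
    """Temporal intersection-over-union of two intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    anchor = (10.0, 30.0)  # hypothetical anchor spanning frames 10-30
    gt = (14.0, 26.0)      # hypothetical expression interval
    deltas = encode(gt, anchor)
    print("deviations:", deltas)
    print("decoded:", decode(deltas, anchor))
    print("tIoU:", temporal_iou(anchor, gt))
```

For the example pair, the center offset is 0 and the width deviation is log(0.6). Decoding recovers approximately (14, 26), and the temporal IoU is 0.6. Regressing these scale-normalized deviations, rather than raw frame indices, keeps targets on a similar scale for short and long anchors. This is the usual motivation for anchor encoding in detection.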