Conditional random field
Computer science
Artificial intelligence
Pattern recognition
Mathematics
Authors
Rizard Renanda Adhi Pramono, Wen-Hsien Fang, Yie-Tarng Chen
Source
Journal: IEEE Transactions on Image Processing
[Institute of Electrical and Electronics Engineers]
Date: 2021-01-01
Volume: 30, Pages: 8184-8199
Citations: 12
Identifier
DOI: 10.1109/tip.2021.3113570
Abstract
This paper presents a new relational network for group activity recognition. The essence of the network is to integrate conditional random fields (CRFs) with self-attention to infer the temporal dependencies and spatial relationships of the actors. This combination takes advantage of the capability of CRFs in modelling mutually dependent actor features and the capability of self-attention in learning the temporal evolution and spatial relational contexts of every actor in videos. Additionally, there are two distinct facets of our CRF and self-attention. First, the pairwise energy of the new CRF relies on both the temporal and the spatial self-attention, which apply the self-attention mechanism to the features in time and space, respectively. Second, to address both local and non-local relationships in group activities, the spatial self-attention takes into account a collection of cliques with different scales of spatial locality. The associated mean-field inference can thus be reformulated as a self-attention network that generates the relational contexts of the actors and their individual action labels. Lastly, a bidirectional universal transformer encoder (UTE) is utilized to aggregate the forward and backward temporal context information, scene information and relational contexts for group activity recognition. A new loss function is also employed, consisting not only of the cost for classifying individual actions and group activities, but also of a contrastive loss that addresses the miscellaneous relational contexts between actors. Simulations show that the new approach surpasses previous works on four commonly used datasets.
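The abstract's central idea, that mean-field inference in a CRF whose pairwise energy comes from self-attention can be unrolled as an attention network over actor features, can be illustrated with a minimal sketch. The code below is not the authors' implementation; the function names, projection matrices, and the simple additive unary-plus-pairwise update are assumptions made purely for illustration, using scaled dot-product self-attention and a fixed number of mean-field iterations.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a set of actor features.

    X: (n_actors, d) feature matrix; Wq, Wk, Wv: (d, d) projections
    (hypothetical parameters, not from the paper).
    Returns a relational context vector for each actor.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise actor affinities
    return softmax(scores, axis=-1) @ V       # attention-weighted context

def mean_field_attention(X, Wq, Wk, Wv, n_iters=3):
    """Unrolled mean-field-style refinement of actor features.

    Each iteration combines the unary term (the original features X)
    with a pairwise message computed by self-attention, loosely
    mirroring the reformulation described in the abstract.
    """
    H = X
    for _ in range(n_iters):
        H = X + self_attention(H, Wq, Wk, Wv)
    return H
```

In this toy version each refined actor feature mixes its own unary evidence with a learned, attention-weighted summary of all other actors; the paper's spatial cliques and temporal attention would further restrict and extend which pairs exchange messages.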