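Computer science
Transformer
Clip
Relation (database)
Graph
Artificial intelligence
Coding (set theory)
Architecture
Computer vision
Theoretical computer science
Data mining
Programming language
Physics
Set (abstract data type)
Quantum mechanics
Voltage
Art
Visual arts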
Authors
H. Wang, Y Hu, Yangfu Zhu, Jinsheng Qi, Bin Wu
Identifier
DOI:10.1145/3581783.3612175
Abstract
Social Relation Recognition is an important part of Video Understanding, providing insights into the information that videos convey. Most previous works mainly focused on graph generation for characters, instead of edges which are more suitable for relation modelling. Furthermore, previous methods tend to recognize social relations for single frames or short video clips within their receptive fields, neglecting the importance of continuous reasoning throughout the entire video. To tackle these challenges, we propose a novel Shifted GCN-GAT and Cumulative-Transformer framework, named SGCAT-CT. The overall architecture consists of an SGCAT module for shifted graph operations on novel relation graphs and a CT module for temporal processing with memory. SGCAT-CT conducts continuous recognition of social relations and memorizes information from as early as the beginning of a long video. Experiments conducted on several video datasets demonstrate encouraging performance on long videos. Our code will be released at https://github.com/HarryWgCN/SGCAT-CT.