Computer science
Semantics (computer science)
Class (philosophy)
Representation (politics)
Relation (database)
Bridging (networking)
Artificial intelligence
Semantic gap
Symbol
Pattern recognition (psychology)
Image (mathematics)
Theoretical computer science
Data mining
Mathematics
Image retrieval
Programming language
Law
Politics
Arithmetic
Computer network
Political science
Authors
Jiajun Gao, Yonghong Hou, Zihui Guo, Haochun Zheng
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology
[Institute of Electrical and Electronics Engineers]
Date: 2023-05-03
Volume/Issue: 33 (11): 6519-6530
Citations: 2
Identifier
DOI:10.1109/tcsvt.2023.3272627
Abstract
Zero-shot Action Recognition (ZSAR) aims at bridging the video $\rightarrow$ class relation with only labeled training data of seen classes, while generalizing the model to alleviate the heterogeneity of unseen actions. Most existing methods represent videos and action classes comprehensively; however, the semantic gap and the hubness problem between them remain crucial and under-explored challenges. In this paper, we propose an effective method to tackle these issues. Specifically, to narrow the semantic gap, we generate spatio-temporal semantics for each video in an end-to-end manner, providing essential textual information to refine the video representation. Furthermore, we propose a compactness-separability loss that optimizes intra- and inter-class relations in a unified formula and quantitatively constrains the cluster distribution, thus effectively diminishing the impact of the hubness problem. Extensive experiments on the UCF101, HMDB51, and Olympic Sports datasets prove the effectiveness of the proposed approach and demonstrate that it outperforms state-of-the-art methods.
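To make the idea of a compactness-separability objective concrete, below is a minimal, hypothetical PyTorch sketch of one plausible reading of such a loss: an intra-class term pulling each video embedding toward its own class semantic embedding, and an inter-class margin term pushing it away from the hardest wrong class, combined in a single formula. The abstract does not give the paper's actual formula; the function name compactness_separability_loss, the cosine-similarity metric, the hinge form, and the hyperparameters margin and lam are all illustrative assumptions rather than the authors' method.

# Minimal illustrative sketch of a compactness-separability style loss for ZSAR.
# NOTE: the exact formulation in the paper is not specified in the abstract;
# the margin, weighting, and cosine-similarity choices below are assumptions.
import torch
import torch.nn.functional as F

def compactness_separability_loss(video_emb, class_emb, labels, margin=0.5, lam=1.0):
    """video_emb: (B, D) projected video features
    class_emb: (C, D) semantic embeddings of the seen classes
    labels:    (B,)   integer class indices
    Pulls each video toward its own class embedding (compactness) and pushes it
    away from the most similar wrong class (separability) in one combined term."""
    v = F.normalize(video_emb, dim=-1)
    c = F.normalize(class_emb, dim=-1)
    sim = v @ c.t()                                        # (B, C) cosine similarities
    pos = sim[torch.arange(len(labels)), labels]           # similarity to the true class
    own_class = F.one_hot(labels, c.size(0)).bool()
    neg = sim.masked_fill(own_class, float('-inf')).max(dim=1).values  # hardest wrong class
    compact = (1.0 - pos).mean()                           # intra-class compactness
    separate = F.relu(neg - pos + margin).mean()           # inter-class margin separability
    return compact + lam * separate

if __name__ == "__main__":
    B, C, D = 8, 5, 64
    videos = torch.randn(B, D, requires_grad=True)         # stand-in video embeddings
    classes = torch.randn(C, D)                            # stand-in class semantics
    y = torch.randint(0, C, (B,))
    loss = compactness_separability_loss(videos, classes, y)
    loss.backward()
    print(float(loss))

A single combined term of this kind lets one weight (lam here) trade off how tightly clusters are packed against how far apart they are kept, which is one way a unified formula can constrain the cluster distribution and reduce hub-like class embeddings.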