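Generalization
Artificial intelligence
Metric (unit)
Action recognition
Computer science
Action (physics)
Object (grammar)
Rationality
Association (psychology)
Cognitive neuroscience of visual object recognition
Machine learning
Pattern recognition (psychology)
Psychology
Mathematics
Economics
Mathematical analysis
Physics
Law
Quantum mechanics
Psychotherapist
Class (philosophy)
Operations management
Political science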
Authors
Rui Yan, Peng Huang, Xiangbo Shu, Junhao Zhang, Yonghua Pan, Jinhui Tang
Identifier
DOI:10.1145/3503161.3547862
Abstract
Compositional action recognition, which aims to identify unseen combinations of actions and objects, has recently attracted wide attention. Conventional methods bring in additional cues (e.g., dynamic motions of objects) to alleviate the inductive bias between the visual appearance of objects and human action-level labels. Moreover, previous methods only pursue higher performance in compositional settings, without comparison against non-compositional settings, which cannot prove their generalization ability. To this end, we first rethink the problem and design a more generalized metric (namely Drop Ratio) and a more practical setting to evaluate the compositional generalization of existing action recognition algorithms. Beyond that, we propose a simple yet effective framework, Look Less Think More (LLTM), which reduces the strong association between visual objects and action-level labels (Look Less) and then discovers the commonsense relationships between object categories and human actions (Think More). We test the rationality of the proposed Drop Ratio and Practical setting by comparing several popular action recognition methods on SSV2. In addition, the proposed LLTM achieves state-of-the-art performance on SSV2 under different settings.