计算机科学
变压器
目标检测
人工智能
模式识别(心理学)
机器学习
物理
量子力学
电压
作者
Ya-Min Cheng,Hancong Duan,Chen Wang,Zhijun Chen
标识
DOI:10.1016/j.patcog.2023.110021
摘要
Human-object interaction (HOI) detection aims to localize and classify triplets of human, object and interaction from a given image. Earlier two-stage methods suffer both from mutually independent training processes and the interference of redundant negative human-object pairs. Prevailing one-stage transformer-based methods are free from the above problems by tackling HOI in an end-to-end manner. However, one-stage transformer-based methods carry the unnecessary entanglements of the query for different tasks, i.e., human-object detection and interaction classification, and thus bring in poor performance. In this paper, we propose a new transformer-based approach that parallelly disentangles human-object detection and interaction classification in a triplet-wise manner. To make each query focus on one specific task clearly, we exhaustively disentangle HOI by parallelly expanding the naive query in vanilla transformer as triple explicit queries. Then, we introduce a semantic communication layer to preserve the consistent semantic association of each HOI through mixing the feature representations of each query triplet of the correspondence constraint. Extensive experiments demonstrate that our proposed framework outperforms the existing methods and achieves the state-of-the-art performance, with significant reduction in parameters and FLOPs.
科研通智能强力驱动
Strongly Powered by AbleSci AI