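Computer science
Joint (building)
Artificial intelligence
Graph
Computer vision
Moment (physics)
Theoretical computer science
Classical mechanics
Physics
Engineering
Architectural engineering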
Authors
Ruomei Wang,Jiawei Feng,Fuwei Zhang,Xiaonan Luo,Yuanmao Luo
Identifier
DOI:10.1109/tcsvt.2024.3389024
Abstract
The joint task of video moment retrieval and video highlight detection is challenging: it requires a model that not only captures contextual information between sequences over time but can also understand and judge significance. This paper addresses these problems from three aspects. First, we design a parameter-free cross-modal statistical correlation interaction method. A novel saliency enhancement function is defined to quantify the saliency differences between the important features associated with the query and other features, achieving parameter-free cross-modal fusion. Second, we propose a novel modality-aware heterogeneous graph reasoning mechanism (MHGR). By combining two key modules, parameter-free cross-modal statistical correlation interaction and heterogeneous graph reasoning, MHGR effectively captures global context information between sequences, strengthens local associations between sequences, and better handles the complexity of multi-modal data. Third, a lightweight solution for the joint task of video moment retrieval and highlight detection is designed based on these two novel algorithm modules. Comprehensive experiments are conducted on publicly available benchmark data to validate the advantages of the new solution against a series of state-of-the-art peer methods. Quantitative results consistently demonstrate that the new solution is lightweight, has high inference performance, and achieves a remarkable improvement in accuracy over peer methods. An extended ablation study is further conducted to show the contribution of each module to the solution's capabilities.