人工智能
计算机科学
机器学习
杠杆(统计)
动作(物理)
量子力学
物理
作者
Junyu Gao,Mengyuan Chen,Changsheng Xu
标识
DOI:10.1109/tpami.2023.3311447
摘要
With the explosive growth of videos, weakly-supervised temporal action localization (WS-TAL) task has become a promising research direction in pattern analysis and machine learning. WS-TAL aims to detect and localize action instances with only video-level labels during training. Modern approaches have achieved impressive progress via powerful deep neural networks. However, robust and reliable WS-TAL remains challenging and underexplored due to considerable uncertainty caused by weak supervision, noisy evaluation environment, and unknown categories in the open world. To this end, we propose a new paradigm, named vectorized evidential learning (VEL), to explore local-to-global evidence collection for facilitating model performance. Specifically, a series of learnable meta-action units (MAUs) are automatically constructed, which serve as fundamental elements constituting diverse action categories. Since the same meta-action unit can manifest as distinct action components within different action categories, we leverage MAUs and category representations to dynamically and adaptively learn action components and action-component relations. After performing uncertainty estimation at both category-level and unit-level, the local evidence from action components is accumulated and optimized under the Subject Logic theory. Extensive experiments on the regular, noisy, and open-set settings of three popular benchmarks show that VEL consistently obtains more robust and reliable action localization performance than state-of-the-arts.
科研通智能强力驱动
Strongly Powered by AbleSci AI