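Computer science
Artificial intelligence
Pattern recognition (psychology)
Encoder
Nearest neighbor search
k-nearest neighbors algorithm
Frame (networking)
Feature vector
Feature (linguistics)
Cosine similarity
Telecommunications
Linguistics
Philosophy
Operating system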
Authors
Guozhang Li, De Cheng, Nannan Wang, Jie Li, Xinbo Gao
Source
Journal: IEEE Transactions on Image Processing
Publisher: Institute of Electrical and Electronics Engineers
Date: 2024-01-01
Volume: 33, pp. 2419-2430
Citations: 3
Identifiers
DOI: 10.1109/tip.2024.3378477
Abstract
Due to sparse single-frame annotations, current Single-Frame Temporal Action Localization (SF-TAL) methods generally employ threshold-based pseudo-label generation strategies. However, these approaches use data inefficiently, because only the unlabeled frames whose confidence scores exceed a predefined threshold are selected for training. Moreover, the variability of single-frame annotations and unreliable model predictions introduce pseudo-label noise. To address these challenges, we propose two strategies that exploit the relationship between video segments and their neighbors: 1) temporal neighbor-guided soft pseudo-label generation (TNPG); and 2) semantic neighbor-guided pseudo-label refinement (SNPR). TNPG uses a local-global self-attention mechanism in a transformer encoder to capture temporal neighbor information while attending to the whole video. The resulting self-attention map is then multiplied by the network predictions to propagate information between labeled and unlabeled frames, producing soft pseudo-labels for all segments. Even so, label noise persists because model predictions remain unreliable. To mitigate this, SNPR refines pseudo-labels under the assumption that a segment's predictions should resemble those of its semantic nearest neighbors. Specifically, we search for the semantic nearest neighbors of each video segment by cosine similarity in the feature space. The refined soft pseudo-labels are then obtained as a weighted combination of each segment's original pseudo-label and those of its semantic nearest neighbors. Finally, the model is trained with the refined pseudo-labels, which substantially improves performance. Comprehensive experiments show that our method achieves state-of-the-art performance on the THUMOS14, ActivityNet1.2, and ActivityNet1.3 benchmarks.
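The two pseudo-labeling steps described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the function names, the row-normalized attention map, the one-hot overwrite of annotated segments, the neighbor count `k`, and the single scalar blending weight `alpha` are all assumptions, and the paper's actual local-global attention and weighting scheme are not reproduced here.

```python
import math
from typing import Dict, List, Optional

Matrix = List[List[float]]


def propagate_soft_labels(attention: Matrix, predictions: Matrix,
                          labeled: Optional[Dict[int, int]] = None) -> Matrix:
    """TNPG-style sketch: soft pseudo-labels = attention map x predictions.

    attention[t][s]: weight segment t places on segment s (rows assumed to sum to 1).
    predictions[s][c]: the network's score for class c at segment s.
    labeled: hypothetical {segment: class} map of single-frame annotations; those
    rows are replaced by one-hot vectors so the annotations spread to neighbors.
    """
    num_classes = len(predictions[0])
    scores = [list(row) for row in predictions]  # copy so the input is not mutated
    for seg, cls in (labeled or {}).items():
        scores[seg] = [1.0 if c == cls else 0.0 for c in range(num_classes)]
    # Row t of the result is the attention-weighted sum of all segments' scores.
    return [
        [sum(w * scores[s][c] for s, w in enumerate(att_row)) for c in range(num_classes)]
        for att_row in attention
    ]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def refine_with_semantic_neighbors(features: Matrix, pseudo_labels: Matrix,
                                   k: int = 2, alpha: float = 0.5) -> Matrix:
    """SNPR-style sketch: blend each pseudo-label with the mean pseudo-label of
    its k most cosine-similar segments. `k` and `alpha` are assumed values."""
    refined = []
    for t, feat in enumerate(features):
        # All other segments, most similar first (the segment itself is excluded).
        others = sorted((s for s in range(len(features)) if s != t),
                        key=lambda s: cosine_similarity(feat, features[s]),
                        reverse=True)
        neighbors = others[:k]
        if not neighbors:  # a single-segment video has no neighbors to borrow from
            refined.append(list(pseudo_labels[t]))
            continue
        refined.append([
            alpha * pseudo_labels[t][c]
            + (1 - alpha) * sum(pseudo_labels[s][c] for s in neighbors) / len(neighbors)
            for c in range(len(pseudo_labels[t]))
        ])
    return refined


if __name__ == "__main__":
    # Toy video: 4 segments, 2 classes; only segment 0 is annotated (class 0).
    attention = [[0.5, 0.5, 0.0, 0.0],
                 [0.25, 0.5, 0.25, 0.0],
                 [0.0, 0.25, 0.5, 0.25],
                 [0.0, 0.0, 0.5, 0.5]]
    predictions = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

    soft = propagate_soft_labels(attention, predictions, labeled={0: 0})
    refined = refine_with_semantic_neighbors(features, soft, k=1, alpha=0.5)
    print([[round(x, 4) for x in row] for row in refined])
```

In the toy run, segments 0 and 1 form one feature cluster and segments 2 and 3 another, so each segment's refined label is pulled toward its cluster partner's. Because both steps are convex combinations (row-normalized attention, and a blend weight between 0 and 1), every row still sums to 1 whenever the input predictions do, which is why a single scalar blend was chosen for this sketch.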