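Computer science
Artificial intelligence
Pattern recognition (psychology)
Embedding
Graph
Leverage (statistics)
Threshold
Segmentation
Cosine similarity
Image (mathematics)
Theoretical computer science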
Authors
Zhitao Li,Jianzong Wang,Ning Cheng,Jing Xiao
Identifier
DOI:10.1109/paap54281.2021.9720457
Abstract
Video classification is a challenging problem: video segment labels are sparse and expensive to obtain, so it is important to leverage as much information as possible from labeled datasets. Several approaches capture video frame information, but none of them uses the information hidden in label correlations to increase classification accuracy. This work proposes a framework called Graph Convolution Semantic Network of aggregated descriptors (GCSN), which extracts neighboring information from related segment labels to increase video segment classification accuracy. The label relation graph is built by thresholding the pairwise cosine similarity between embeddings; the word embeddings are generated by Deep Bidirectional Transformers. Testing accuracy on the YouTube-8M video segment classification dataset shows that the proposed GCSN outperforms the NeXtVLAD baseline by taking this additional label relation information into account.
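The graph construction step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each label already has an embedding vector and connects two labels when their cosine similarity reaches a threshold. The function names, the threshold value of 0.5, the toy vectors, and the choice to leave out self-loops are all assumptions made for illustration; the abstract does not state the settings the authors used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_label_graph(embeddings, threshold=0.5):
    """Return a symmetric 0/1 adjacency matrix over labels.

    An edge joins labels i and j when their cosine similarity is at least
    `threshold`. The default threshold is a hypothetical value, and
    self-loops are omitted by assumption.
    """
    n = len(embeddings)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Toy 3-dimensional vectors standing in for real label embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
graph = build_label_graph(toy_embeddings, threshold=0.5)
```

In this toy case only the first two labels point in nearly the same direction, so they are the only pair joined by an edge. A graph convolution layer could then use such an adjacency matrix to pass information between related labels.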