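Computer science
Benchmark (surveying)
Domain (mathematical analysis)
Scale (ratio)
Information retrieval
Representation (politics)
Question answering
Open domain
Artificial intelligence
Mathematics
Geography
Politics
Law
Mathematical analysis
Geodesy
Cartography
Political science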
Authors
Yu Zhou, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, Dacheng Tao
Source
Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence
[Association for the Advancement of Artificial Intelligence (AAAI)]
Date: 2019-07-17
Volume (issue): 33 (01), pages 9127-9134
Citations: 131
Identifier
DOI:10.1609/aaai.v33i01.33019127
Abstract
Recent developments in modeling language and vision have been successfully applied to image question answering. It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA). Compared to the image domain, where large-scale and fully annotated benchmark datasets exist, VideoQA datasets are small in scale and automatically generated, among other limitations. These limitations restrict their applicability in practice. Here we introduce ActivityNet-QA, a fully annotated and large-scale VideoQA dataset. The dataset consists of 58,000 QA pairs on 5,800 complex web videos derived from the popular ActivityNet dataset. We present a statistical analysis of our ActivityNet-QA dataset and conduct extensive experiments on it by comparing existing VideoQA baselines. Moreover, we explore various video representation strategies to improve VideoQA performance, especially for long videos.