Closed captioning
Computer science
Process (computing)
Optics (focusing)
Human-computer interaction
Artificial intelligence
Image (mathematics)
Operating system
Optics
Physics
Authors
Hongkuan Zhang, Koichi Takeda, Ryohei Sasano, Yusuke Adachi, Kiyonobu Ohtani
Identifier
DOI:10.1109/ivworkshops54471.2021.9669259
Abstract
Video captioning aims to generate textual descriptions of video contents. Risk assessment of autonomous driving vehicles has become essential for insurance companies to provide adequate coverage, in particular for emerging MaaS businesses. Insurers need to assess the risk of autonomous driving business plans with a fixed route by analyzing large amounts of driving data, including videos recorded by dash cameras and sensor signals. To make this process more efficient, generating captions for driving videos can provide insurers with concise information for quickly understanding the video contents. A natural problem with driving video captioning is that, owing to the absence of the ego-vehicle in these egocentric videos, descriptions of latent driving behaviors are difficult to ground in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions and propose incorporating in-vehicle sensor signals, which encapsulate the driving behavior information, to assist caption generation. We evaluate our method on a Japanese driving video captioning dataset called City Traffic; the results demonstrate the effectiveness of in-vehicle sensors in improving the overall quality of the generated captions, especially in producing more accurate descriptions of driving behaviors.
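The abstract does not specify the model architecture, so the following is only a minimal sketch of the general idea it describes: encoding in-vehicle sensor signals alongside video features and fusing the two to condition a caption decoder. All module names, dimensions, sensor channels (e.g., speed, steering, brake), and the GRU-based late-fusion strategy below are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch (not the paper's architecture) of fusing time-aligned
# in-vehicle sensor signals with dash-cam video features for captioning.
import torch
import torch.nn as nn

class SensorFusionCaptioner(nn.Module):
    def __init__(self, video_dim=2048, sensor_dim=8, hidden_dim=512, vocab_size=10000):
        super().__init__()
        # Encode per-frame visual features (e.g., CNN features of dash-cam frames).
        self.video_enc = nn.GRU(video_dim, hidden_dim, batch_first=True)
        # Encode the sensor channels (assumed: speed, steering, brake, etc.).
        self.sensor_enc = nn.GRU(sensor_dim, hidden_dim, batch_first=True)
        # Late fusion: merge the two modality summaries into one context vector.
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, video_feats, sensor_feats, captions):
        # video_feats: (B, T_v, video_dim); sensor_feats: (B, T_s, sensor_dim)
        _, v_last = self.video_enc(video_feats)    # (1, B, H) final hidden state
        _, s_last = self.sensor_enc(sensor_feats)  # (1, B, H) final hidden state
        ctx = torch.tanh(self.fuse(torch.cat([v_last, s_last], dim=-1)))
        # Teacher-forced decoding conditioned on the fused behavior-aware context.
        emb = self.embed(captions)                 # (B, T_c, H)
        dec_out, _ = self.decoder(emb, ctx.contiguous())
        return self.out(dec_out)                   # (B, T_c, vocab_size) logits

# Smoke test with random inputs.
model = SensorFusionCaptioner()
logits = model(torch.randn(2, 30, 2048), torch.randn(2, 60, 8),
               torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 10000])
```

The point of fusing before decoding is that the decoder sees a single context vector that already reflects the driving behavior, so behavior words ("brakes", "turns right") need not be inferred from visual cues alone; the paper's evaluation against captions generated without sensors is what the abstract reports as the improvement.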