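Closed captioning
Computer science
Timestamp
Event (particle physics)
Speech recognition
Transformer
Artificial intelligence
Multimedia
Image (mathematics)
Real-time computing
Engineering
Physics
Quantum mechanics
Voltage
Electrical engineering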
Authors
Lakshmi Harika Palivela,S. Swetha,M. Nithish Guhan,M. Prasanna Venkatesh
Identifier
DOI:10.1007/978-981-19-7753-4_40
Abstract
Videos are composed of multiple tasks. Dense video captioning entails captioning the different events in a video. The proposed model generates a textual description from the dynamic and static visual features, speech, and audio cues of a video, and then performs topic modeling on the generated caption. An uncertainty modeling technique is applied to find temporal event proposals, producing timestamps for each event in the video. A Transformer that takes multi-modal features as input is then used to generate the captions more effectively and precisely. The topic modeling tasks comprise highlighting keywords in the generated captions and topic generation, i.e., identifying the category to which the whole caption belongs.
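The post-captioning stage described in the abstract (keyword highlighting and topic generation over timestamped event captions) can be pictured with a minimal sketch. Everything in it is an assumption for illustration: the `EventCaption` structure, the stop-word list, and the `TOPIC_LEXICON` categories are hypothetical, and simple word counting with lexicon overlap stands in for whatever topic model the authors actually use, which the abstract does not specify.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop words and topic lexicon, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "and", "of", "to", "with", "then"}
TOPIC_LEXICON = {
    "cooking": {"pan", "onion", "knife", "stir", "kitchen"},
    "sports": {"ball", "player", "goal", "field", "kick"},
}


@dataclass
class EventCaption:
    """One captioned event with its temporal proposal (seconds)."""
    start: float
    end: float
    text: str


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def highlight_keywords(captions, top_k=3):
    """Return the top_k most frequent non-stop-words across all captions."""
    counts = Counter(w for c in captions for w in tokenize(c.text))
    return [word for word, _ in counts.most_common(top_k)]


def assign_topic(captions):
    """Pick the category whose lexicon overlaps most with the caption words.

    Returns None when no category shares any word with the captions.
    """
    words = {w for c in captions for w in tokenize(c.text)}
    scores = {topic: len(words & vocab) for topic, vocab in TOPIC_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


captions = [
    EventCaption(0.0, 4.5, "A man chops an onion with a knife"),
    EventCaption(4.5, 9.0, "He stirs the onion in a pan"),
]
print(highlight_keywords(captions, top_k=1))
print(assign_topic(captions))
```

In this sketch the timestamps are carried alongside each caption so that keywords and the assigned category can be traced back to specific events; a real system would replace the hand-written lexicon with a learned topic model.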