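Computer science
Search engine indexing
Cluster analysis
Metadata
Optical character recognition
Artificial intelligence
Index (typography)
Speech recognition
Information retrieval
Natural language processing
World Wide Web
Image (mathematics)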
Authors
Sandeep Varma, Arunanshu Pandey, Shivam, Sagnik Das, Subhasis Roy
Identifier
DOI: 10.1007/978-3-030-96600-3_14
Abstract
With ever-increasing internet penetration across the world, there has been a huge surge in content on the World Wide Web. Video has proven to be one of the most popular media. The COVID-19 pandemic has pushed the envelope further, forcing learners to turn to E-Learning platforms. In the absence of relevant descriptions of these videos, it becomes imperative to generate metadata based on the content of the video. In the current paper, an attempt has been made to index videos based on their visual and audio content. The visual content is extracted by applying Optical Character Recognition (OCR) to the stack of frames obtained from a video, while the audio content is generated using Automatic Speech Recognition (ASR). The OCR- and ASR-generated texts are combined to obtain the final description of the respective video. The dataset contains 400 videos spread across 4 genres. To quantify the accuracy of our descriptions, clustering is performed on the video descriptions to discern between the genres of the videos.
Keywords: Optical Character Recognition, Automatic Speech Recognition, Video analytics, Natural language processing, K-means clustering
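The last two steps of the pipeline in the abstract (combining the OCR and ASR text into one description, then clustering the descriptions) can be illustrated with a minimal sketch. It is not the authors' implementation. The OCR and ASR outputs are stubbed as plain strings, the TF-IDF weighting is an assumption (the abstract does not say how descriptions are vectorised), and the clustering is a cosine-similarity (spherical) variant of k-means with a deterministic farthest-first initialisation. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def combine(ocr_text, asr_text):
    """Join the OCR and ASR transcripts into one lowercase description."""
    return f"{ocr_text} {asr_text}".lower()


def tfidf_vectors(docs):
    """Return unit-length TF-IDF vectors as dicts keyed by term.

    Assumption: whitespace tokenisation and a smoothed IDF; the abstract
    does not specify the text representation.
    """
    tokenised = [doc.split() for doc in docs]
    doc_freq = Counter(term for toks in tokenised for term in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenised:
        term_freq = Counter(toks)
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def kmeans(vectors, k, iters=20):
    """Spherical k-means: assign by cosine similarity, renormalise centroids."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(farthest)

    labels = []
    for _ in range(iters):
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:  # keep the old centroid for an empty cluster
                updated.append(centroids[j])
                continue
            total = {}
            for member in members:
                for term, w in member.items():
                    total[term] = total.get(term, 0.0) + w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            updated.append({term: w / norm for term, w in total.items()})
        centroids = updated
    return labels


# Toy stand-ins for (OCR text, ASR text) pairs; not data from the paper.
videos = [
    ("python loops tutorial", "today we write a python function"),
    ("sourdough bread recipe", "knead the dough and bake the bread"),
    ("python classes", "define a class with a python method"),
    ("pasta recipe", "boil the pasta and bake the sauce"),
]
descriptions = [combine(ocr, asr) for ocr, asr in videos]
labels = kmeans(tfidf_vectors(descriptions), k=2)
print(labels)
```

In this toy run the two programming videos share one cluster label and the two cooking videos share the other, which mirrors how the abstract uses clustering to check whether the generated descriptions separate the genres. A real setup would replace the stubbed strings with the output of an OCR engine run on sampled frames and an ASR system run on the audio track.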