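Computer science
Sentence
Speech recognition
Transformer
Feature (linguistics)
Word (group theory)
Reading (process)
Natural language processing
Artificial intelligence
Mathematics
Linguistics
Geometry
Quantum mechanics
Physics
Philosophy
Voltage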
Authors
Hongyu Zhen,Chenglong Jiang,Jiyong Zhou,Liming Liang,Ying Gao
Identifier
DOI:10.1007/978-981-99-8537-1_22
Abstract
Lip reading aims to recognize spoken content from silent video of lip movements. A common problem in sentence-level lip reading is that the length of the predicted text is inconsistent with that of the actual text. To alleviate this problem, we introduce video word boundary information into sentence-level lip reading and propose the TLiM-VWB model. In addition, to handle cases where video word boundaries cannot be obtained in the wild, we propose the LiM-VWB-KD method, which uses two knowledge distillation strategies to exploit video word boundary information implicitly. We evaluate our model and method on the CMLR and LRS2 datasets using CER/WER and our proposed length difference rate (LDR). The results of the TLiM-VWB model verify that video word boundary information improves sentence-level lip reading accuracy. We also show the effectiveness of the LiM-VWB-KD method, especially with the feature-based strategy. Our LiM-VWB-KD method achieves the best result on Chinese sentence-level lip reading among Transformer-based methods and sets a new state of the art in the speaker-independent setting on CMLR.
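For readers unfamiliar with the CER/WER metrics named in the abstract, the following is a minimal sketch of their standard definition: Levenshtein edit distance between reference and prediction, divided by the reference length, over characters (CER) or whitespace-separated words (WER). The function names are illustrative assumptions and are not taken from the paper; the paper's own LDR metric is not defined in the abstract and is deliberately not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

CER is the usual choice for Chinese text, where words are not separated by spaces, while WER is typical for English datasets. A prediction that is shorter or longer than the reference is penalized through deletions or insertions.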