LCSNet: End-to-end Lipreading with Channel-aware Feature Selection

Keywords: computer science, speech recognition, pronunciation, artificial intelligence, feature (linguistics), task (project management), connectionism, pattern recognition (psychology), artificial neural network, channel (broadcasting), process (computing), decoding methods, computer networks, linguistics, telecommunications, operating systems, philosophy, economics, management
Authors
Feng Xue, Tian Yang, Kang Liu, Zikun Hong, Mingwei Cao, Dan Guo, Richang Hong
Source
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications [Association for Computing Machinery]
Volume/Issue: 19 (1s): 1-21; Cited by: 5
Identifier
DOI: 10.1145/3524620
Abstract

Lipreading is the task of decoding the movement of a speaker's lip region into text. In recent years, lipreading methods based on deep neural networks have attracted widespread attention, and their accuracy has far surpassed that of experienced human lipreaders. The visual differences between some phonemes are extremely subtle and pose a great challenge to lipreading. Most existing lipreading methods do not further process the extracted visual features, which leads to two main problems. First, the extracted features contain a lot of useless information, such as noise caused by differences in speaking speed and lip shape. Second, the extracted features are not abstract enough to distinguish phonemes with similar pronunciations. Both problems degrade lipreading performance. To extract features from the lip region that are more discriminative and more relevant to the speech content, this article proposes an end-to-end deep-neural-network-based lipreading model (LCSNet). The proposed model extracts short-term spatio-temporal features and motion trajectory features from the lip region in video clips. The extracted features are filtered by a channel attention module to eliminate useless components and then passed to the proposed Selective Feature Fusion Module (SFFM) to extract high-level abstract features. These features are then fed in temporal order to a bidirectional GRU network for temporal modeling, yielding long-term spatio-temporal features. Finally, a Connectionist Temporal Classification (CTC) decoder generates the output text. Experimental results show that the proposed model achieves a 1.0% CER and a 2.3% WER on the GRID corpus, which represents relative improvements of 52% and 47%, respectively, over LipNet.
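The abstract outlines a multi-stage architecture: two feature streams (short-term spatio-temporal and motion trajectory), channel attention, the SFFM fusion block, a bidirectional GRU, and CTC decoding. The PyTorch sketch below illustrates how such a pipeline could fit together; all module internals, layer sizes, and names (ChannelAttention, SelectiveFeatureFusion, LCSNetSketch) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the pipeline described in the abstract, written in PyTorch.
# All layer sizes, kernel shapes, and the internals of the attention and fusion
# blocks are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style gate that re-weights feature channels."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, T, C)
        weights = self.fc(x.mean(dim=1))       # pool over time -> (B, C)
        return x * weights.unsqueeze(1)        # suppress uninformative channels


class SelectiveFeatureFusion(nn.Module):
    """Fuses two feature streams with a learned per-channel selection gate."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

    def forward(self, appearance, motion):     # both: (B, T, C)
        g = self.gate(torch.cat([appearance, motion], dim=-1))
        return g * appearance + (1.0 - g) * motion


class LCSNetSketch(nn.Module):
    def __init__(self, channels=512, vocab_size=28):
        super().__init__()
        # Stand-in front-end: a 3D conv stem for short-term spatio-temporal features.
        self.frontend = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),    # keep the time axis, pool space
        )
        self.proj = nn.Linear(64, channels)
        self.attn_app = ChannelAttention(channels)     # filters the appearance stream
        self.attn_motion = ChannelAttention(channels)  # filters the motion-trajectory stream
        self.sffm = SelectiveFeatureFusion(channels)
        self.gru = nn.GRU(channels, 256, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(512, vocab_size)   # vocabulary includes the CTC blank

    def forward(self, frames, motion_feats):
        # frames: (B, 3, T, H, W) lip-region clip; motion_feats: (B, T, C) trajectory stream.
        x = self.frontend(frames).squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        x = self.attn_app(self.proj(x))                    # channel-aware feature selection
        x = self.sffm(x, self.attn_motion(motion_feats))   # fuse the two filtered streams
        x, _ = self.gru(x)                                 # long-term temporal modelling
        return self.classifier(x).log_softmax(dim=-1)      # per-frame log-probs for CTC
```

For training, the per-frame log-probabilities can be permuted to (T, B, V) and passed to torch.nn.CTCLoss with the target transcripts; at inference, greedy or beam-search CTC decoding over the same distributions produces the output text.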