Keywords
Sign language, computer science, Transformer, overfitting, speech recognition, feature extraction, artificial intelligence, robotics, gesture, gesture recognition, language model, computer vision, natural language processing, artificial neural network, engineering, linguistics, philosophy, electrical engineering, voltage
Authors
Feng Xiao, Ruyu Liu, Tiantian Yuan, Zhimin Fan, Jiajia Wang, Jianhua Zhang
Identifier
DOI: 10.1109/aciiw57231.2022.10086026
Abstract
Human-robot interaction (HRI) research usually focuses on interaction between hearing people and robots, overlooking the needs of deaf-mute people. Deaf-mute individuals use sign language to communicate their thoughts and emotions, so continuous sign language recognition (CSLR) can be introduced to robots to enable communication with them. However, mainstream CSLR, which consists of two main modules, visual feature extraction and contextual modeling, has several problems. Visual features are usually extracted frame by frame and lack global contextual information, which critically impairs subsequent context modeling. In addition, we discovered a substantial degree of redundancy in sign language data, which can significantly slow down model training and exacerbate overfitting. To solve these problems, in this paper we propose a novel vision-transformer-based sign language recognition network combined with a key frame extraction (KFE) module for accurate end-to-end recognition of input video sequences. We conduct experiments on two CSLR benchmarks, TJUT-SLRT and USTC-CSL; the results demonstrate the effectiveness of our method.
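As a rough illustration of the pipeline the abstract describes (key frame extraction to drop redundant frames, a vision transformer backbone for per-frame visual features, and a transformer encoder for global contextual modeling with a CTC-style output), here is a minimal PyTorch sketch. The module names, the frame-difference redundancy heuristic, and all hyperparameters are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a KFE + ViT + contextual-transformer CSLR pipeline.
# Everything here (thresholds, dimensions, the simple backbone stand-in)
# is a hypothetical choice, not the method from the paper.
import torch
import torch.nn as nn

def select_key_frames(video: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    """Keep frames whose mean absolute difference from the previously
    kept frame exceeds `threshold` (a simple redundancy filter)."""
    kept = [video[0]]
    for frame in video[1:]:
        if (frame - kept[-1]).abs().mean() > threshold:
            kept.append(frame)
    return torch.stack(kept)

class CSLRNet(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        # Stand-in for a ViT backbone: flatten each frame and project to `dim`.
        self.backbone = nn.Sequential(nn.Flatten(1), nn.LazyLinear(dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.context = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(dim, vocab_size + 1)  # +1 for the CTC blank

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        frames = select_key_frames(video)           # (T', C, H, W)
        feats = self.backbone(frames).unsqueeze(0)  # (1, T', dim)
        ctx = self.context(feats)                   # global context over frames
        return self.head(ctx).log_softmax(-1)       # CTC log-probabilities

video = torch.rand(60, 3, 224, 224)    # dummy 60-frame clip
logits = CSLRNet(vocab_size=100)(video)
print(logits.shape)                    # (1, T', 101)
```

In this sketch, pruning redundant frames before feature extraction shortens the sequence the contextual transformer must model, which is consistent with the abstract's claim that redundancy slows training and worsens overfitting.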