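Computer science
Normalization (sociology)
Sign language
Transformer
Artificial intelligence
Landmark
Speech recognition
Pose
Word error rate
Pattern recognition (psychology)
Computer vision
Engineering
Voltage
Electrical engineering
Linguistics
Philosophy
Sociology
Anthropology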
Authors
Matyáš Boháček,Marek Hrúz
Identifier
DOI:10.1109/wacvw54805.2022.00024
Abstract
In this paper we present a system for word-level sign language recognition based on the Transformer model. We aim at a solution with low computational cost, since we see great potential in the use of such a recognition system on hand-held devices. We base the recognition on the estimation of the pose of the human body in the form of 2D landmark locations. We introduce a robust pose normalization scheme which takes the signing space into consideration and processes the hand poses in a separate local coordinate system, independent of the body pose. We show experimentally the significant impact of this normalization on the accuracy of our proposed system. We introduce several augmentations of the body pose that further improve the accuracy, including a novel sequential joint rotation augmentation. With all the systems in place, we achieve state-of-the-art top-1 results on the WLASL and LSA64 datasets. For WLASL, we are able to successfully recognize 63.18% of sign recordings in the 100-gloss subset, which is a relative improvement of 5% over the prior state of the art. For the 300-gloss subset, we achieve a recognition rate of 43.78%, which is a relative improvement of 3.8%. With the LSA64 dataset, we report a test recognition accuracy of 100%.
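The idea of processing hand poses in a local coordinate system can be illustrated with a minimal sketch. The abstract does not specify the exact scheme, so the bounding-box approach, the function name `normalize_hand`, and the choice to scale by the longer side are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_hand(landmarks: List[Point]) -> List[Point]:
    """Map 2D hand landmarks into a local frame anchored at their bounding box.

    Hypothetical helper: the origin is the top-left corner of the bounding
    box and coordinates are divided by the longer side of the box, so the
    result does not depend on where the hand sits in the image.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)

    # Scale by the longer side so the hand's aspect ratio is preserved.
    side = max(max_x - min_x, max_y - min_y)
    if side == 0:
        # Degenerate case: all landmarks coincide.
        return [(0.0, 0.0) for _ in landmarks]

    return [((x - min_x) / side, (y - min_y) / side) for x, y in landmarks]


# Toy example with three landmarks in image coordinates.
hand = [(10.0, 20.0), (30.0, 20.0), (10.0, 60.0)]
local = normalize_hand(hand)

# The same hand shifted elsewhere in the frame yields the same local pose.
shifted = [(x + 100.0, y + 100.0) for x, y in hand]
local_shifted = normalize_hand(shifted)
```

Under these assumptions, the local coordinates are unchanged when the whole hand is translated in the image, which is the sense in which the hand pose becomes independent of the body pose.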