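Computer science
Sign language
American Sign Language
Set (abstract data type)
Benchmark (surveying)
Artificial intelligence
Vocabulary
Generalization
Margin (machine learning)
Field (mathematics)
Scale (ratio)
Data set
Resource (disambiguation)
Sign (mathematics)
Natural language processing
Machine learning
Linguistics
Physics
Mathematics
Geodesy
Quantum mechanics
Pure mathematics
Mathematical analysis
Computer network
Philosophy
Programming language
Geography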
Authors
Hamid Reza Vaezi Joze, Oscar Koller
Source
Journal: Cornell University - arXiv
Date: 2018-01-01
Citations: 55
Identifier
DOI: 10.48550/arxiv.1812.01053
Abstract
Sign language recognition is a challenging and often underestimated problem comprising multi-modal articulators (handshape, orientation, movement, upper body and face) that integrate asynchronously on multiple streams. Learning powerful statistical models in such a scenario requires much data, particularly to apply recent advances in the field. However, labeled data is a scarce resource for sign language due to the enormous cost of transcribing these unwritten languages. We propose the first real-life large-scale sign language data set comprising over 25,000 annotated videos, which we thoroughly evaluate with state-of-the-art methods from sign and related action recognition. Unlike the current state of the art, the data set allows investigating the generalization to unseen individuals (signer-independent test) in a realistic setting with over 200 signers. Previous work mostly deals with limited vocabulary tasks, while here, we cover a large class count of 1000 signs in challenging and unconstrained real-life recording conditions. We further propose I3D, known from video classification, as a powerful and suitable architecture for sign language recognition, outperforming the current state of the art by a large margin. The data set is publicly available to the community.