Computer science
Software deployment
Parameterized complexity
Convolutional neural network
Speech recognition
Variety (cybernetics)
Deep learning
Artificial intelligence
Recurrent neural network
Resource (disambiguation)
Machine learning
Artificial neural network
Algorithm
Computer network
Operating system
Authors
Iason Ioannis Panagos, Giorgos Sfikas, Christophoros Nikou
Identifier
DOI:10.1145/3549737.3549785
Abstract
Audio-visual speech recognition has seen remarkable progress in the last few years. This progress is a result, on the one hand, of advances in deep learning-based architectures, such as convolutional and recurrent neural networks, and, on the other hand, of the introduction of large-scale public datasets that provide a great variety of speakers. Both factors have led authors to develop deep architectures that achieve impressive results, surpassing human performance in speech recognition, especially in cases where only the video is present. Nevertheless, these architectures involve millions of parameters, which increase their storage and memory demands and limit their deployment in resource-constrained scenarios. An additional issue is the energy expenditure due to the amount of computation required for training, fine-tuning and testing. In this work, we attempt to mitigate some of these shortcomings in speech recognition models by incorporating parameterized hypercomplex layers that reduce the number of required resources. We present models that are competitive with the state of the art while operating with fewer parameters.
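The abstract does not give implementation details for the parameterized hypercomplex layers. As a rough illustration only, the sketch below shows one common formulation of a parameterized hypercomplex multiplication (PHM) linear layer in PyTorch, in which the full weight matrix is assembled as a sum of Kronecker products so that roughly 1/n of a dense layer's parameters are stored. The class name `PHMLinear`, the initialization, and the hyperparameter `n` are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn as nn


class PHMLinear(nn.Module):
    """Minimal sketch of a parameterized hypercomplex multiplication layer.

    The (out_features x in_features) weight is built as
        W = sum_i kron(A_i, S_i),
    where each A_i is (n x n) and each S_i is (out/n x in/n), so the layer
    stores about 1/n of the parameters of an ordinary nn.Linear.
    """

    def __init__(self, in_features: int, out_features: int, n: int = 4):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        # n small "rule" matrices of shape (n, n)
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)
        # n small weight blocks of shape (out/n, in/n)
        self.S = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full weight as a sum of Kronecker products, then
        # apply it like a standard linear layer.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.n))
        return nn.functional.linear(x, W, self.bias)


# Usage example (hypothetical sizes): replaces a 512x512 dense layer
# while storing roughly a quarter of its weights when n = 4.
layer = PHMLinear(512, 512, n=4)
y = layer(torch.randn(8, 512))
```

Dropping such layers in place of standard fully connected (or 1x1 convolutional) layers is one way a speech recognition model's parameter count and memory footprint can be reduced, which is the trade-off the abstract describes.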