Deep learning
Computer science
Modality
Multimodal learning
Artificial intelligence
Gesture
Multimodality
Human-computer interaction
World Wide Web
Social science
Sociology
Authors
Jabeen Summaira,Xi Li,Muhammad Shoib Amin,Omar El Farouk Bourahla,Songyuan Li,Abdul Jabbar
Abstract
Deep learning has been applied to a wide range of applications and has grown increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and relate information from multiple modalities. Despite extensive development in unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning enables better understanding and analysis when multiple senses are engaged in processing information. This article focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, physiological signals, flow, RGB, pose, depth, mesh, and point cloud. A detailed analysis of baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications over the past five years (2017 to 2021) are provided. A fine-grained taxonomy of various multimodal deep learning methods is proposed, elaborating on different applications in greater depth. Finally, the main issues are highlighted separately for each domain, along with possible future research directions.
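To make the idea of processing and relating information from multiple modalities concrete, the sketch below shows a minimal late-fusion model in PyTorch: image and text features are encoded separately and concatenated before classification. This is an illustrative example only, not the method of the paper; the feature dimensions (2048 for image, 768 for text) and the random inputs are placeholder assumptions.

```python
# Minimal late-fusion sketch (illustrative, not the paper's method):
# each modality is encoded separately, then the embeddings are
# concatenated and passed to a shared classifier head.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, hidden_dim=512, num_classes=10):
        super().__init__()
        # Modality-specific encoders project each input into a shared space.
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        # Fusion head operates on the concatenated modality embeddings.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, image_feats, text_feats):
        fused = torch.cat([self.image_encoder(image_feats),
                           self.text_encoder(text_feats)], dim=-1)
        return self.classifier(fused)

# Usage with random placeholder features for a batch of 4 samples.
model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 10])
```

Late fusion is only one of several fusion strategies discussed in the multimodal deep learning literature; early fusion and attention-based fusion follow the same encode-then-combine pattern with the combination applied at different stages.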