A survey of multimodal hybrid deep learning for computer vision: Architectures, applications, trends, and challenges

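Deep learning; Artificial intelligence; Computer science; Machine learning; Field (mathematics); Mode; Multimodal learning; Social science; Mathematics; Sociology; Pure mathematics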

Author

Khaled Bayoudh

Source

Journal: Information Fusion [Elsevier BV]
Date: 2023-12-30 Volume/Issue: 105: 102217-102217 Citations: 85

Links

ube.hal.science
doi.org

Identifier

DOI: 10.1016/j.inffus.2023.102217

Abstract

In recent years, deep learning algorithms have rapidly revolutionized artificial intelligence, particularly machine learning, enabling researchers and practitioners to extend previously hand-crafted feature extraction procedures. In particular, deep learning uses adaptive learning processes to learn more complex and informative patterns from datasets of varying sizes. With the increasing availability of multimodal data streams and recent advances in deep learning algorithms, multimodal deep learning is on the rise. This requires the development of complex models that can process and analyze multimodal information in a consistent manner. However, unstructured data can come in many different forms (also known as modalities). Extracting relevant features from this data remains an ambitious goal for deep learning researchers. According to the literature, most deep learning systems consist of a single architecture (i.e., standalone deep learning). When two or more deep learning architectures are combined over multiple sensory modalities, the result is called a multimodal hybrid deep learning model. Since this research direction has received much attention in the field of deep learning, the purpose of this survey is to provide a broader overview of the topic. In this paper, we provide a comprehensive review of recent advances in multimodal hybrid deep learning, including a thorough analysis of the most commonly developed hybrid architectures. In particular, one of the main challenges in multimodal hybrid analysis is the ability of these architectures to systematically integrate cross-modal features in hybrid designs. Therefore, we propose a generic framework for multimodal hybrid learning that focuses mainly on fusion methods. We also identify trends and challenges in multimodal hybrid learning and provide insights and directions for future research. 
Our findings show that multimodal hybrid learning can perform well in a variety of challenging computer vision applications and tasks.
