Man Luo, Tejas Gokhale, Neeraj Varshney, Yezhou Yang, Chitta Baral
Source
Journal: Synthesis lectures on computer vision [Morgan & Claypool] Date: 2024-06-25 Volume/Issue: : 35-91
Identifier
DOI:10.1007/978-3-031-57816-8_3
Abstract
In today's rapidly evolving digital landscape, the wealth of available information has expanded beyond the boundaries of traditional text-based content. With the proliferation of multimedia platforms and data sources, we are constantly bombarded with a rich variety of images, videos, audio, and text. This vast array of heterogeneous data poses new challenges and opportunities for the field of Information Retrieval (IR). To address these challenges and harness the potential of multimodal information, researchers and practitioners have turned their attention toward the development of Multimodal Information Retrieval (MMIR) systems. We begin by introducing the basic concepts of IR systems, covering queries and targets, indexing, and scoring functions, which lay the foundation for understanding how IR works. We then describe the state-of-the-art retrieval models for unimodal and multimodal IR systems. Unimodal retrieval, which includes text IR and image IR, is the foundation of multimodal IR. In the section on multimodal IR, we differentiate it from cross-modal IR and focus on multimodal-query IR, discussing two representative forms of multimodal-query IR in detail. After this, we discuss applications of multimodal IR in crucial downstream tasks. Later, we discuss evaluation metrics, spanning from traditional evaluation to advanced semantic-based measurement. Finally, we discuss the broader impact of MMIR.