Abstract Multimodal learning provides a path to fully utilize all types of information related to the modeling target to provide the model with a global vision. Zero‐shot learning (ZSL) is a general solution for incorporating prior knowledge into data‐driven models and achieving accurate class identification. The combination of the two, known as multimodal ZSL (MZSL), can fully exploit the advantages of both technologies and is expected to produce models with greater generalization ability. However, the MZSL algorithms and applications have not yet been thoroughly investigated and summarized. This study fills this gap by providing an objective overview of MZSL's definition, typical algorithms, representative applications, and critical issues. This article will not only provide researchers in this field with a comprehensive perspective, but it will also highlight several promising research directions. This article is categorized under: Algorithmic Development > Multimedia Technologies > Classification Technologies > Machine Learning