Geohazard prediction is one of the most important and challenging tasks in underground mining. It remains difficult to improve prediction accuracy and to keep pace with the ever-increasing volume of mining data, especially when those data are sparsely distributed across a large-scale mining environment. This study introduces a multimodal data fusion approach for geohazard prediction in underground mining to address this challenge. By incorporating visual model data as a novel modality and using interpolated rock mass rating data as a cross-complementary factor, the framework enhances the effectiveness of data fusion. Several machine learning models (e.g., neural networks, support vector machines, and k-nearest neighbours) were applied and validated for the proposed multimodal data fusion, addressing the challenges posed by sparsely scattered multidimensional data, which generally exhibit weak spatial connections across diverse datasets. Specifically, to strengthen the spatial connections among diverse datasets, this paper leverages digitalised and gridded CAD-file-based visual model data as a foundational carrier, the new modality, to establish robust internal connections with routine data. In addition, rock mass rating data are interpolated and aligned with the visual model data, further improving spatially oriented data fusion. To validate the accuracy and efficiency of the proposed framework, we process and integrate two routine datasets from a case-study mine. Performance is tested on nine data combinations derived from the two routine datasets, the visual model data, and the rock mass rating data. Comprehensive cross-validation shows that the proposed multimodal data fusion framework significantly improves the stability of prediction models at the whole-mine scale, achieving high accuracy and a low false-negative rate.
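The core alignment step described above — interpolating sparse rock mass rating measurements onto the grid derived from the visual model so that every node carries a co-registered feature — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the inverse-distance weighting scheme, the 2 m grid spacing, and the borehole coordinates and RMR scores are all assumptions made for the example.

```python
import numpy as np

def idw_interpolate(points, values, grid, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation of sparse values onto grid nodes.

    points: (n_pts, 2) coordinates of sparse measurements
    values: (n_pts,) measured values (e.g., rock mass ratings)
    grid:   (n_grid, 2) coordinates of the visual-model grid nodes
    """
    # pairwise distances between every grid node and every measurement point
    d = np.linalg.norm(grid[:, None, :] - points[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)          # closer measurements get larger weights
    return (w @ values) / w.sum(axis=1)   # weighted average per grid node

# hypothetical sparse rock-mass-rating boreholes (x, y) and their RMR scores
rmr_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
rmr_val = np.array([40.0, 60.0, 50.0, 70.0])

# regular grid standing in for the digitalised CAD visual model (assumed 2 m spacing)
xs, ys = np.meshgrid(np.arange(0, 11, 2.0), np.arange(0, 11, 2.0))
grid = np.column_stack([xs.ravel(), ys.ravel()])

rmr_grid = idw_interpolate(rmr_xy, rmr_val, grid)

# fuse: each grid node now carries its coordinates plus an aligned RMR feature,
# ready to be joined with routine monitoring data indexed on the same grid
fused = np.column_stack([grid, rmr_grid])
print(fused.shape)  # → (36, 3)
```

Because every dataset is re-expressed on the same grid index, routine monitoring records can be concatenated column-wise with the interpolated RMR feature before being fed to a classifier, which is the spatial-connection idea the abstract relies on.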