Crohn's disease (CD) is a chronic inflammatory disease with increasing worldwide incidence and unclear etiology. Its clinical manifestations vary with the location, extent, and severity of the lesions. To diagnose Crohn's disease, medical professionals must comprehensively analyze a patient's multimodal examination data, including medical images such as colonoscopy and pathology images, as well as textual information from clinical records. This multimodal analysis requires collaboration among medical professionals from different departments, which consumes considerable time and human resources. A multimodal medical assisted diagnosis system for Crohn's disease is therefore particularly valuable. However, existing network frameworks struggle to effectively exploit multimodal patient data for diagnosis, and multimodal data for Crohn's disease are currently scarce. In addition, combining data from patients with similar symptoms can serve as an effective reference for disease diagnosis. We therefore propose a multimodal information diagnosis network (MICDnet) that learns CD feature representations by integrating colonoscopy images, pathology images, and clinical texts. Specifically, MICDnet first preprocesses the data of each modality, then uses encoders to extract image and text features separately. After that, multimodal feature fusion is performed. Finally, CD classification and diagnosis are conducted based on the fused features. With authorization, we built a dataset of 136 hospitalized individuals, with colonoscopy images of seven areas, pathology images, and clinical record text for each individual. Training MICDnet on this dataset shows that multimodal diagnosis improves the diagnostic accuracy of CD and that MICDnet outperforms other models.
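The pipeline described above (per-modality encoding, multimodal feature fusion, then classification) can be sketched as a minimal late-fusion example. All dimensions, weights, and the concatenation-based fusion below are illustrative assumptions, not MICDnet's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (illustrative, not from the paper).
D_COLONO, D_PATH, D_TEXT, D_FUSED, N_CLASSES = 128, 128, 64, 32, 2

def encode(x, w):
    """Stand-in encoder: a linear projection with ReLU (a real system
    would use trained image/text encoders for each modality)."""
    return np.maximum(x @ w, 0.0)

# Random projections standing in for trained encoder weights.
w_colono = rng.standard_normal((D_COLONO, D_FUSED))
w_path = rng.standard_normal((D_PATH, D_FUSED))
w_text = rng.standard_normal((D_TEXT, D_FUSED))
w_cls = rng.standard_normal((3 * D_FUSED, N_CLASSES))

# One patient's synthetic modality features.
colono = rng.standard_normal(D_COLONO)
path = rng.standard_normal(D_PATH)
text = rng.standard_normal(D_TEXT)

# Encode each modality separately, then fuse by concatenation.
fused = np.concatenate([encode(colono, w_colono),
                        encode(path, w_path),
                        encode(text, w_text)])

# Classify from the fused representation (softmax over logits).
logits = fused @ w_cls
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

The sketch uses simple concatenation for fusion; attention-based or gated fusion schemes are common alternatives when modalities contribute unevenly.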