Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder, and accurate prediction from imaging and other biological information is of great significance. Predicting ASD at the individual level, however, faces two challenges: first, subjects are highly heterogeneous; second, existing models do not fully exploit resting-state fMRI (rs-fMRI) and non-imaging information, which limits classification accuracy. To address these challenges, this paper proposes a novel framework, HE-MF, which consists of a Hierarchical Feature Extraction Module and a Multimodal Deep Feature Integration Module. The Hierarchical Feature Extraction Module performs multi-level, fine-grained feature extraction: it progressively extracts the most discriminative functional connectivity features at both the intra-group and overall-subject levels, which enhances the model's discriminative ability. The Multimodal Deep Feature Integration Module extracts common and distinctive features from rs-fMRI and non-imaging information through two separate channels and uses an attention mechanism to allocate weights dynamically, achieving deep feature fusion and improving the model's predictive performance. Experiments on the public ABIDE dataset show that HE-MF achieves an accuracy of 95.17% on the ASD identification task, outperforming existing state-of-the-art methods. To assess generalization, we also applied the model to related tasks on the ADNI dataset, where the results further support its feature learning and generalization capabilities.
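As a rough illustration of the attention-based dynamic weight allocation mentioned above, the following minimal sketch fuses two per-subject feature vectors (standing in for the common and distinctive channels) with softmax attention weights. It is not the HE-MF implementation: the function name `attention_fuse`, the scoring form (a tanh projection followed by a query vector), the dimensions, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(features, w, b, q):
    """Fuse channel features with attention weights (hypothetical helper).

    features: (n_subjects, n_channels, d) stacked channel embeddings
    w, b:     projection weights (d, h) and bias (h,)
    q:        query vector (h,) used to score each channel
    Returns the fused features (n_subjects, d) and the
    per-subject channel weights (n_subjects, n_channels).
    """
    hidden = np.tanh(features @ w + b)      # (n_subjects, n_channels, h)
    scores = hidden @ q                     # (n_subjects, n_channels)
    alpha = softmax(scores, axis=1)         # weights sum to 1 per subject
    fused = (alpha[..., None] * features).sum(axis=1)
    return fused, alpha

# Toy inputs: random placeholders, not real rs-fMRI or phenotypic data.
rng = np.random.default_rng(0)
n_subjects, d, h = 4, 8, 16
common = rng.normal(size=(n_subjects, d))       # assumed "common" channel output
distinctive = rng.normal(size=(n_subjects, d))  # assumed "distinctive" channel output
features = np.stack([common, distinctive], axis=1)

w = rng.normal(size=(d, h)) * 0.1
b = np.zeros(h)
q = rng.normal(size=h)

fused, alpha = attention_fuse(features, w, b, q)
print(fused.shape, alpha.shape)
```

In a trained model, `w`, `b`, and `q` would be learned jointly with the classifier, so the weighting between channels can differ from subject to subject; here they are random and serve only to show the shapes and the weighting step.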