In the era of Industry 5.0, the Android platform is used extensively across handheld and mobile devices. The openness of the Android platform makes it vulnerable to critical malware attacks. Meanwhile, malware obfuscation and evasion strategies have advanced dramatically, causing traditional malware detection methods to fail. Recently, machine learning techniques have shown promising results for malware detection. However, prior work based on machine learning suffers from several challenges, such as inadequate feature extraction and dependence on hand-crafted features. Existing machine learning approaches are therefore inefficient at detecting sophisticated malware and require further enhancement. In this paper, we extract behavioural characteristics of system calls and dynamic API features using our proposed multimodal deep learning model (MDLDroid). The model extracts system call features using LSTM layers and dynamic API features using a CNN. The two sets of features are then fused in a common vector space, and the fused representation is classified as benign or malicious. Comparison with several state-of-the-art approaches on two datasets shows a significant accuracy improvement of 4–12%.
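The two-branch data flow described above can be illustrated with a minimal sketch. It is not the MDLDroid implementation: the abstract does not specify the fusion operator, layer sizes, or input encoding, so the concatenation-based fusion, the one-hot system call encoding, the bias-free single-layer LSTM, the single convolution with global max pooling, and all names and sizes below are assumptions made for illustration. The weights are random and untrained, and only the standard library is used so that the example stays self-contained.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 8      # number of distinct system call IDs
HIDDEN_SIZE = 4     # LSTM hidden units
NUM_KERNELS = 3     # convolution filters in the API branch
KERNEL_WIDTH = 3    # width of each 1D filter


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_encode(syscall_ids, weights):
    """Run a bias-free single-layer LSTM over system call IDs and
    return the final hidden state as the sequence representation."""
    h = [0.0] * HIDDEN_SIZE
    c = [0.0] * HIDDEN_SIZE
    for call_id in syscall_ids:
        one_hot = [1.0 if i == call_id else 0.0 for i in range(VOCAB_SIZE)]
        x = one_hot + h  # current input concatenated with previous hidden state
        i_gate = [sigmoid(z) for z in matvec(weights["i"], x)]
        f_gate = [sigmoid(z) for z in matvec(weights["f"], x)]
        o_gate = [sigmoid(z) for z in matvec(weights["o"], x)]
        cand = [math.tanh(z) for z in matvec(weights["g"], x)]
        c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, cand)]
        h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
    return h


def cnn_encode(api_features, kernels):
    """Valid 1D convolution, ReLU, then global max pooling:
    one pooled value per kernel."""
    pooled = []
    for kernel in kernels:
        k = len(kernel)
        activations = [
            max(0.0, sum(w * x for w, x in zip(kernel, api_features[s:s + k])))
            for s in range(len(api_features) - k + 1)
        ]
        pooled.append(max(activations))
    return pooled


lstm_weights = {g: rand_matrix(HIDDEN_SIZE, VOCAB_SIZE + HIDDEN_SIZE) for g in "ifog"}
conv_kernels = rand_matrix(NUM_KERNELS, KERNEL_WIDTH)
clf_weights = [random.uniform(-0.1, 0.1) for _ in range(HIDDEN_SIZE + NUM_KERNELS)]
clf_bias = 0.0


def predict(syscall_ids, api_features):
    """Encode both modalities, fuse by concatenation (an assumption),
    and return the fused vector with a malicious-class score in (0, 1)."""
    syscall_vec = lstm_encode(syscall_ids, lstm_weights)
    api_vec = cnn_encode(api_features, conv_kernels)
    fused = syscall_vec + api_vec  # list concatenation = feature fusion
    logit = sum(w * x for w, x in zip(clf_weights, fused)) + clf_bias
    return fused, sigmoid(logit)


# Toy inputs: a short system call ID trace and a vector of API call counts.
fused, score = predict([3, 1, 4, 1, 5], [0.0, 2.0, 1.0, 0.0, 3.0, 1.0])
print(len(fused), round(score, 3))
```

Concatenation keeps the two modalities separate until the classifier, which is the simplest form of fusion; a trained model would typically add dense layers after the fused vector and learn all weights jointly.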