Abstract In view of the complexity of engine mechanical structures and the diversity of engine faults, this paper presents an ensemble model that combines a one-dimensional convolutional neural network (1DCNN) with a vision transformer (ViT) to identify engine faults from acoustic emission (AE) signals. First, AE signals of various faults are collected on an engine fault test rig, and a dataset is constructed from their High-Mel Filterbank features, a representation suited to AE signals. The proposed model performs well on this dataset. Second, the proposed model achieves higher test accuracy than the other recent models it is compared with. Finally, fault data with different signal-to-noise ratios are fed into the trained models, and the proposed model shows better noise robustness than the compared models. Overall, the proposed method identifies the AE signals of engine faults more accurately and can serve as an effective method for engine fault diagnosis.
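The anti-noise evaluation relies on fault data prepared at different signal-to-noise ratios (SNRs). The abstract does not state how the noise is generated, so the following is only a minimal sketch that assumes additive white Gaussian noise. The function names `add_noise_at_snr` and `measured_snr_db` and the toy sine signal are hypothetical and stand in for a recorded AE signal.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=None):
    """Return the signal plus white Gaussian noise scaled to a target SNR in dB.

    Assumption: additive white Gaussian noise; the paper may use another noise model.
    """
    rng = random.Random(seed)
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Toy stand-in for an AE recording: 50 cycles of a sine wave over 10,000 samples.
clean = [math.sin(2 * math.pi * 50 * t / 10000) for t in range(10000)]
noisy = add_noise_at_snr(clean, snr_db=5, seed=0)
```

Under this assumption, calling `add_noise_at_snr` with several `snr_db` values would give a set of test inputs for comparing how each trained model's accuracy degrades as the noise level rises.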