Abstract
Accurate prediction of tool wear is essential to ensure the machining quality of parts. However, in actual milling, the data distributions of sensor signals vary greatly because of differences among individual tools and machining parameters. Moreover, a single deep learning model is less reliable when processing large volumes of signals. Both problems make accurate tool wear prediction challenging. We therefore propose a two-stage multi-model method. In the first stage, the tool wear data are divided into two parts. For each part, we design a correlation-aligned multiscale convolutional temporal attention gated recurrent neural network that makes a preliminary prediction. This network extracts deep temporal features from diverse signals and reduces the sensitivity of those features to changes in data distribution. In the second stage, a joint decision-making module adaptively aggregates the preliminary predictions of the multiple models into a final prediction. This extends the decision boundary of any single model and improves tool wear prediction performance. Finally, two sets of experiments are conducted with different tools and machining conditions. The experimental results show that, compared with other methods, the proposed method reduces the root mean square error (RMSE) by 15% and the mean absolute error (MAE) by 18%.
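The abstract does not define what "correlation-aligned" means. A widely used technique with that name is CORrelation ALignment (CORAL), which penalizes the difference between the feature covariance matrices of two data distributions, loss = ||C_S - C_T||_F^2 / (4d^2). Assuming the alignment here works along those lines, the following is a minimal NumPy sketch of a CORAL-style loss. The function name `coral_loss` and the synthetic "source" and "target" features are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """CORAL-style loss between two feature batches of shape (n_samples, d).

    Assumption: "correlation-aligned" is read here as Deep CORAL-style
    covariance alignment; this is an illustrative sketch only.
    """
    if source.shape[1] != target.shape[1]:
        raise ValueError("source and target must have the same feature dimension")
    d = source.shape[1]
    # Unbiased (n - 1) feature covariances; atleast_2d keeps d == 1 as a 1x1 matrix.
    cov_s = np.atleast_2d(np.cov(source, rowvar=False))
    cov_t = np.atleast_2d(np.cov(target, rowvar=False))
    # Squared Frobenius norm of the covariance gap, scaled by 1 / (4 d^2).
    return float(np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d))


# Hypothetical features from two machining conditions with different spreads.
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 4))
tgt = 2.0 * rng.normal(size=(200, 4))

print(coral_loss(src, src))  # identical distributions give zero loss
print(coral_loss(src, tgt))  # mismatched covariances give a positive loss
```

In a training loop, a term like this would typically be added to the regression loss so that the network learns features whose second-order statistics stay similar across tools and cutting conditions.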