Long sequence time-series forecasting plays a crucial role in production and daily life, covering areas such as electric power loads, stock trends, and road traffic. Attention-based models have achieved significant performance advantages owing to the long-range modelling capability of self-attention. However, the self-attention mechanism is criticized for its quadratic time complexity, and most subsequent work has attempted to reduce this cost by exploiting the sparse distribution of attention. Following this line of work, we further investigate the position distribution of Top-u attention within the long-tail distribution of sparse attention and propose a two-stage self-attention mechanism named $$\textsf{ProphetAttention}$$. Specifically, in the training phase, $$\textsf{ProphetAttention}$$ memorizes the positions of the Top-u attention, and in the prediction phase, it uses the recorded position indices to directly obtain the Top-u attention for sparse attention computation, thereby avoiding the redundant computation of measuring the Top-u attention. Results on four widely used real-world datasets demonstrate that $$\textsf{ProphetAttention}$$ improves the prediction efficiency of long sequence time-series forecasting over the $$\textsf{Informer}$$ model by approximately 17%–26% across all prediction horizons and significantly speeds up prediction.
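
The following is a minimal PyTorch sketch, not the authors' implementation, of the two-stage idea described above. It assumes an Informer-style ProbSparse measurement (max minus mean of the attention scores per query) for selecting the Top-u queries; the class name `ProphetSparseAttention`, the `cached_top_u` attribute, and the `factor` hyperparameter are hypothetical illustrations.

```python
# Sketch of a two-stage Top-u attention: measure and memorize Top-u query
# positions during training, then reuse the recorded indices at prediction
# time so the sparsity measurement is not recomputed. Assumes self-attention
# (L_Q == L_K); names are illustrative, not from the paper's code.
import torch
import torch.nn as nn


class ProphetSparseAttention(nn.Module):
    def __init__(self, factor: int = 5):
        super().__init__()
        self.factor = factor          # controls u = factor * ln(L_Q)
        self.cached_top_u = None      # Top-u query indices recorded in training

    def _measure_top_u(self, Q, K, u):
        # Informer-style sparsity measurement: score each query by
        # max(QK^T) - mean(QK^T) and keep the u most "active" queries.
        scores = Q @ K.transpose(-2, -1)                       # (B, H, L_Q, L_K)
        sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)
        return sparsity.topk(u, dim=-1).indices                # (B, H, u)

    def forward(self, Q, K, V):
        B, H, L_Q, D = Q.shape
        u = min(L_Q, int(self.factor * torch.log(torch.tensor(float(L_Q))).ceil()))

        if self.training or self.cached_top_u is None:
            # Stage 1 (training): measure and memorize the Top-u query positions.
            top_u = self._measure_top_u(Q, K, u)
            self.cached_top_u = top_u.detach()
        else:
            # Stage 2 (prediction): reuse the recorded indices, skipping the
            # redundant sparsity measurement.
            top_u = self.cached_top_u

        # Gather the selected queries and attend only with them; the remaining
        # query positions fall back to the mean of V, as in ProbSparse attention.
        idx = top_u.unsqueeze(-1).expand(-1, -1, -1, D)        # (B, H, u, D)
        Q_reduced = torch.gather(Q, 2, idx)                    # (B, H, u, D)
        attn = torch.softmax(Q_reduced @ K.transpose(-2, -1) / D ** 0.5, dim=-1)
        out = V.mean(dim=2, keepdim=True).expand(-1, -1, L_Q, -1).clone()
        out.scatter_(2, idx, attn @ V)                         # fill Top-u rows
        return out
```

In this sketch the saved indices are tied to a single batch layout for simplicity; a practical variant would record per-position statistics that are valid across batches, which is where the efficiency gain over recomputing the measurement at every prediction step would come from.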