Improving the accuracy and efficiency of seismic wavefield simulation helps solve geophysical problems. The finite-difference (FD) method is widely used, but its efficiency drops as the grid grows and the order of the difference scheme increases. This article proposes SeismicTransformer, a deep learning method based on the attention mechanism. Compared with theory-driven methods such as the second-order central-difference method, SeismicTransformer is at least ten times faster. Compared with networks that lack an attention mechanism, SeismicTransformer achieves better results by exploiting global information. SeismicTransformer is therefore a promising approach to seismic wavefield simulation.
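For readers unfamiliar with the theory-driven baseline, the following is a minimal sketch of a second-order central-difference scheme. It assumes a 1D constant-velocity acoustic wave equation with fixed boundaries, which is an illustrative simplification and not necessarily the configuration used in this article. The function names `fd_step` and `simulate` and all parameter values are hypothetical.

```python
import numpy as np


def fd_step(u_prev, u_curr, c, dt, dx):
    """Advance a 1D wavefield one time step.

    Uses second-order central differences in both time and space for
    u_tt = c^2 * u_xx (assumed model, constant velocity c).
    """
    lap = np.zeros_like(u_curr)
    # Second-order central difference for the spatial second derivative.
    lap[1:-1] = (u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]) / dx**2
    # Second-order central difference in time, solved for the next step.
    u_next = 2.0 * u_curr - u_prev + (c * dt) ** 2 * lap
    # Fixed (zero-displacement) boundaries: an assumption of this sketch.
    u_next[0] = 0.0
    u_next[-1] = 0.0
    return u_next


def simulate(nx=201, nt=100, c=2000.0, dx=10.0, dt=0.002):
    """Propagate a Gaussian pulse; c * dt / dx = 0.4 satisfies the CFL limit."""
    x = np.arange(nx) * dx
    u_prev = np.exp(-(((x - x[nx // 2]) / (5.0 * dx)) ** 2))
    u_curr = u_prev.copy()  # simple zero-initial-velocity start
    for _ in range(nt):
        u_prev, u_curr = u_curr, fd_step(u_prev, u_curr, c, dt, dx)
    return u_curr


if __name__ == "__main__":
    wavefield = simulate()
    print(wavefield.shape, float(np.abs(wavefield).max()))
```

Every time step touches every grid point, so the cost grows with both the grid size and the stencil width. This is the scaling behaviour that motivates a learned surrogate.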