Abstract

Phase unwrapping plays an important role in optical phase measurements, and unwrapping under heavy noise remains an open problem. In this paper, a deep learning-based method is proposed that performs phase unwrapping by combining Zernike polynomial fitting with a Swin-Transformer network. In the proposed method, phase unwrapping is treated as a regression problem: the Swin-Transformer network learns the mapping from the wrapped phase data to the Zernike polynomial coefficients. Because of the self-attention mechanism of the transformer network, the fitting coefficients can be estimated accurately even under extremely harsh noise conditions. Simulation and experimental results are presented to demonstrate that the proposed method outperforms two other polynomial fitting-based methods. The method is promising for phase unwrapping in optical metrology, especially in electronic speckle pattern interferometry.
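
To make the regression framing concrete, the following minimal sketch builds one (wrapped phase, Zernike coefficients) pair of the kind such a network could be trained on. It is an illustration under stated assumptions, not the authors' implementation: the six unnormalized low-order Zernike terms, the grid size, the coefficient values, and the Gaussian noise level are all hypothetical choices, and no Swin-Transformer is included. An ordinary least-squares fit on the noise-free unwrapped phase stands in only to show that the coefficient vector fully determines the phase map.

```python
import numpy as np

def zernike_basis(n_pix=64):
    """Unit-disk mask and a stack of six low-order, unnormalized Zernike
    terms in Cartesian form (an assumed basis, chosen for illustration)."""
    axis = np.linspace(-1.0, 1.0, n_pix)
    x, y = np.meshgrid(axis, axis)
    r2 = x**2 + y**2
    mask = r2 <= 1.0
    basis = np.stack([
        np.ones_like(x),   # piston
        x,                 # tilt in x
        y,                 # tilt in y
        2.0 * r2 - 1.0,    # defocus
        x**2 - y**2,       # astigmatism (0 degrees)
        2.0 * x * y,       # astigmatism (45 degrees)
    ])
    return mask, basis

def synthesize(coeffs, basis):
    """Weighted sum of the basis terms: the continuous (unwrapped) phase."""
    return np.tensordot(coeffs, basis, axes=1)

def wrap(phase):
    """Wrap a phase map into the interval (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

mask, basis = zernike_basis()

# Hypothetical ground-truth coefficients (the regression target).
true_coeffs = np.array([0.5, 6.0, -4.0, 8.0, 3.0, -2.0])
phase = synthesize(true_coeffs, basis)

# Network input: the wrapped phase, here with assumed additive Gaussian noise.
rng = np.random.default_rng(0)
noisy_wrapped = wrap(phase + rng.normal(scale=1.0, size=phase.shape))

# Noise-free wrapping differs from the true phase by integer multiples of 2*pi.
fringe_order = (phase - wrap(phase)) / (2.0 * np.pi)

# Least-squares Zernike fit on the unwrapped phase inside the unit disk.
design = basis[:, mask].T                      # shape: (n_points, n_terms)
fit_coeffs, *_ = np.linalg.lstsq(design, phase[mask], rcond=None)

# Given estimated coefficients, the unwrapped phase is rebuilt directly.
reconstructed = synthesize(fit_coeffs, basis)
```

In this framing, a trained network would take `noisy_wrapped` as input and output an estimate of `true_coeffs`; the unwrapped phase then follows from `synthesize`, so the noise in the input never has to be unwrapped pixel by pixel.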