Abstract
Objective. Metal artifacts severely degrade human tissue information in computed tomography (CT) images, posing significant challenges to disease diagnosis. Deep learning (DL) has been widely explored for the metal artifact reduction (MAR) task. Nevertheless, paired real metal-artifact CT datasets suitable for training do not exist in practice. Although synthetic CT image datasets provide additional training data, the trained networks still generalize poorly to real metal-artifact data.
Approach. A self-supervised U-shaped Transformer network (SUTransNet) is proposed to enhance model generalizability in MAR tasks. The framework consists of a self-supervised mask-reconstruction pretext task and a downstream task. In the pretext task, CT images are randomly corrupted by masks and then reconstructed with the original images as labels, so that the network learns the artifact patterns and tissue structures of real physical acquisitions. The downstream task fine-tunes the network for the MAR target on labeled images. The multi-layer long-range feature extraction capability of the Transformer efficiently captures metal artifact features, and the MAR bottleneck distinguishes metal artifact features through cross-channel self-attention.
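For concreteness, the sketch below illustrates the mask-reconstruction pretext step described above. The patch size, mask ratio, L1 reconstruction loss, and the generic encoder-decoder `model` interface are illustrative assumptions, not details specified here.

```python
import torch
import torch.nn.functional as F

def random_mask(ct_batch: torch.Tensor, mask_ratio: float = 0.5,
                patch: int = 16) -> torch.Tensor:
    """Zero out a random subset of non-overlapping patches in each CT image.

    Assumes image height/width are divisible by `patch`; both the patch
    size and mask ratio are illustrative choices.
    """
    n, _, h, w = ct_batch.shape
    gh, gw = h // patch, w // patch
    # Per-patch keep/drop decision, upsampled to pixel resolution.
    keep = (torch.rand(n, 1, gh, gw, device=ct_batch.device) > mask_ratio).float()
    keep = F.interpolate(keep, scale_factor=patch, mode="nearest")
    return ct_batch * keep

def pretext_step(model, ct_batch, optimizer):
    """One self-supervised step: recover the image from its masked version,
    using the unmasked image itself as the label (no annotations needed)."""
    corrupted = random_mask(ct_batch)
    recon = model(corrupted)
    loss = F.l1_loss(recon, ct_batch)  # L1 loss is an assumed choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the labels are the images themselves, this step can be run directly on unlabeled real-artifact CT scans before the labeled fine-tuning stage.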
Main results. Experiments demonstrate that the framework maintains strong generalization ability in the MAR task, effectively preserving tissue details while suppressing metal artifacts. It achieves a peak signal-to-noise ratio (PSNR) of 43.86 dB and a structural similarity index (SSIM) of 0.9863 while maintaining efficient model inference. In addition, the Dice coefficient and mean intersection over union (MIoU) for segmentation of MAR images improve by 11.70% and 9.51%, respectively.
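For reference, PSNR is the standard fidelity measure used here; for images with maximum intensity $\mathrm{MAX}$ it is defined as

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),$$

where $\mathrm{MSE}$ is the mean squared error between the artifact-reduced image and the reference image.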
Significance. Combining unlabeled real-artifact CT images with labeled synthetic-artifact CT images enables a self-supervised learning process that improves model generalizability.