Abstract
Approaches based on vision transformers (ViTs) are advancing medical artificial intelligence (AI) and cancer diagnosis, and many researchers have recently developed ViT-based AI methods for cancer diagnosis. In this paper, 98 pertinent articles published since 2020 were selected from digital databases, including Google Scholar, Elsevier, and Springer Link, to review research progress on ViT-based AI methods for cancer imaging. Method: The basic structure of ViT is introduced, and its modules are described: patch embedding, positional embedding, the transformer encoder, multi-head self-attention (MSA), layer normalization (LN), residual connections, and the multilayer perceptron (MLP). Improved ViT models in the medical field are then comprehensively reviewed, followed by applications of ViT to cancer analysis based on medical images. Results: ViT has achieved great success in cancer diagnosis based on medical images, showing advantages in image classification, reconstruction, detection, segmentation, registration, fusion, and other tasks. Among these, the most commonly studied tasks are cancer image classification and segmentation. Considerable room for improvement remains in multi-task learning, multi-modal learning, model generality, generalization ability, and explainability, and ViT models also face a trade-off between model scale and performance. Conclusion: ViT models trained for cancer diagnosis still have potential for improvement. ViT models that use self-supervised and semi-supervised learning are a promising research direction. Lightweight attention module design, ViTs based on mobile networks, and the development of 3D ViTs will help make cancer diagnosis based on medical images more accurate and efficient.
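As an illustration of how the modules named above fit together, the following is a minimal sketch in plain Python, not the implementation used in any of the reviewed works. It assumes a single-channel image, randomly initialized (untrained) weights, a pre-norm encoder block, and no bias terms or learned LN scale and shift; all function and parameter names (`patchify`, `encoder_block`, `tiny_vit_forward`, `patch`, `dim`, `heads`) are hypothetical.

```python
import math
import random

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    # Element-wise sum, used for positional embedding and residual connections.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-6):
    # LN per token: zero mean, unit variance (learned scale/shift omitted here).
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def gelu(v):
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def patchify(image, patch):
    # Split an H x W image into flattened, non-overlapping patch x patch blocks.
    patches = []
    for top in range(0, len(image), patch):
        for left in range(0, len(image[0]), patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def rand_matrix(rows, cols, rng, scale=0.1):
    # Stand-in for learned parameters (assumption: untrained random weights).
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, wq, wk, wv, wo, heads):
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_head = len(q[0]) // heads
    concat = [[] for _ in x]
    for h in range(heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [row[lo:hi] for row in q]
        kh_t = [list(col) for col in zip(*[row[lo:hi] for row in k])]
        vh = [row[lo:hi] for row in v]
        # Scaled dot-product attention for this head.
        scores = matmul(qh, kh_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        for token, head_row in zip(concat, matmul(weights, vh)):
            token.extend(head_row)
    return matmul(concat, wo)  # output projection over concatenated heads

def encoder_block(x, p, heads):
    # Pre-norm block: x + MSA(LN(x)), then x + MLP(LN(x)).
    x = add(x, multi_head_self_attention(layer_norm(x), p["wq"], p["wk"],
                                         p["wv"], p["wo"], heads))
    hidden = [[gelu(v) for v in row] for row in matmul(layer_norm(x), p["w1"])]
    return add(x, matmul(hidden, p["w2"]))

def tiny_vit_forward(image, patch=4, dim=8, heads=2, seed=0):
    rng = random.Random(seed)
    tokens = matmul(patchify(image, patch), rand_matrix(patch * patch, dim, rng))
    tokens = rand_matrix(1, dim, rng) + tokens                # prepend class token
    tokens = add(tokens, rand_matrix(len(tokens), dim, rng))  # positional embedding
    p = {name: rand_matrix(dim, dim, rng) for name in ("wq", "wk", "wv", "wo")}
    p["w1"] = rand_matrix(dim, 4 * dim, rng)
    p["w2"] = rand_matrix(4 * dim, dim, rng)
    return encoder_block(tokens, p, heads)

image = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
out = tiny_vit_forward(image)
print(len(out), len(out[0]))  # 5 tokens (1 class + 4 patches), each of dimension 8
```

An 8 x 8 input with 4 x 4 patches yields four patch tokens plus one class token; a full model would stack several such blocks and attach a task head (for example, a classifier on the class token).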