Journal: IEEE Transactions on Instrumentation and Measurement [Institute of Electrical and Electronics Engineers] Date: 2023-01-01 Volume 72: 1-17 Citations: 4
Identifiers
DOI:10.1109/tim.2023.3329106
Abstract
Diabetic retinopathy (DR) is a common complication of diabetes and one of the main causes of blindness in humans, and it can be prevented by early detection and treatment. Clinically, ophthalmologists use optical coherence tomography (OCT) image analysis as a basis for diagnosing DR. Existing medical resources can no longer meet the needs of the growing patient population, so deep learning has become a mainstream solution for medical image analysis. The Vision Transformer (ViT), a new neural network structure, has demonstrated strong performance in image analysis. However, because ViT lacks inductive bias and requires fixed input image sizes, it is prone to over-fitting on small datasets and is limited in capturing biological tissue characteristics. We therefore propose an OCT multi-head self-attention (OMHSA) block that computes OCT image information using a hybrid CNN-Transformer strategy. Compared to traditional MHSA, OMHSA integrates local information extraction into the self-attention computation, adding local information to the transformer model without relying on a multi-branch network. We built a neural network architecture (OCTFormer) by repeatedly stacking convolutional layers and OMHSA blocks in each stage. Like a CNN, OCTFormer allows the feature-map size to change at each stage, yielding a hierarchical structure. The model's diagnostic effectiveness was evaluated on a collected retinal OCT dataset, where its accuracy reached 98.60%, surpassing state-of-the-art (SOTA) models. We also demonstrated deployment of OCTFormer to mobile terminals through knowledge distillation, providing a reference for deploying transformer models in real clinical environments.
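The core idea the abstract describes, folding a local information path into the self-attention computation rather than running a separate branch network, can be sketched in plain numpy. This is only an illustrative toy, not the paper's OMHSA formulation: the function name `local_mhsa`, the depthwise 1-D kernel over the token sequence standing in for the convolutional local path, and all shapes are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_mhsa(x, w_q, w_k, w_v, local_kernel, num_heads=2):
    """Multi-head self-attention with an added local term (toy sketch).

    x: (n_tokens, dim) token features; w_q/w_k/w_v: (dim, dim) projections.
    local_kernel: (k, dim) depthwise kernel mixing each token with its
    neighbors -- a hypothetical stand-in for OMHSA's local-information path.
    """
    n, d = x.shape
    dh = d // num_heads
    q, k_, v = x @ w_q, x @ w_k, x @ w_v

    # Global branch: standard scaled dot-product attention per head.
    out = np.zeros_like(x)
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        attn = softmax(q[:, s] @ k_[:, s].T / np.sqrt(dh))
        out[:, s] = attn @ v[:, s]

    # Local branch: depthwise 1-D convolution over the token sequence,
    # added directly to the attention output (no separate branch network).
    ksz = local_kernel.shape[0]
    pad = ksz // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    local = np.stack([(xp[i:i + ksz] * local_kernel).sum(0) for i in range(n)])
    return out + local
```

The design point mirrored here is that the local term is summed into the attention output inside one block, so local context is injected without a parallel CNN branch.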