OCTFormer: An Efficient Hierarchical Transformer Network Specialized for Retinal Optical Coherence Tomography Image Recognition

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT (2023)

Abstract
Diabetic retinopathy (DR) is a common complication of diabetes and one of the main causes of blindness in humans, yet it can be prevented by early-stage detection and treatment. Clinically, ophthalmologists use optical coherence tomography (OCT) image analysis as a basis for diagnosing DR. Existing medical resources can no longer meet the needs of the growing patient population, so deep learning has become a mainstream solution for medical image analysis. The vision transformer (ViT), a recent neural network structure, has demonstrated strong performance in image analysis. However, because ViT lacks inductive bias and does not allow the input image size to change, it is prone to overfitting on small datasets and is limited in how well it can model biological tissue characteristics. We therefore propose an OCT multihead self-attention (OMHSA) block, designed specifically to process OCT image information and based on a hybrid CNN-Transformer strategy. Compared with traditional MHSA, OMHSA integrates differences in local information extraction into the self-attention calculation and adds local information to the transformer model without relying on a multibranch network. We build a neural network architecture (OCTFormer) by repeatedly stacking convolutional layers and OMHSA blocks in each stage. As in a CNN, OCTFormer allows the input size to change at each stage, which yields a hierarchical structure. We evaluated the model's diagnostic effectiveness on a collected retinal OCT dataset, where it reached an accuracy of 98.60%, surpassing the state-of-the-art (SOTA) model. We also demonstrate the deployment of OCTFormer to mobile terminals through knowledge distillation, which provides a reference for deploying transformer models in actual clinical environments.
Keywords
Computer-aided diagnosis, deep learning, image classification, optical coherence tomography (OCT), vision transformer (ViT)
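
The abstract mentions knowledge distillation as the route to mobile deployment but does not give the loss that was used. As a rough illustration only, the sketch below shows the commonly used soft-target formulation (hard-label cross-entropy plus a temperature-scaled KL term between teacher and student). The function names, the three-class logits, and the `temperature` and `alpha` values are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Per-sample distillation loss (generic soft-target form, assumed here).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between softened teacher and student distributions.
    The T^2 factor keeps the soft-term gradient scale comparable across T.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    hard = -math.log(max(softmax(student_logits)[label], 1e-12))

    # Soft-target term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(
        t * math.log(t / max(s, 1e-12))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


# Hypothetical logits for one image over three made-up classes.
teacher = [4.0, 1.0, -2.0]   # large teacher model (e.g. a full-size network)
student = [2.5, 0.5, -1.0]   # compact student intended for a mobile device
loss = distillation_loss(student, teacher, label=0)
```

In this formulation, a higher `temperature` exposes more of the teacher's relative confidence across non-target classes, and `alpha` trades off fitting the ground-truth label against imitating the teacher.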