Abstract
Optical coherence tomography (OCT) is a widely used imaging technique in ophthalmology for diagnosis and treatment planning. Recent advances in deep neural networks (DNNs) and vision transformers (ViTs) have paved the way for automated classification and segmentation of eye and retinal diseases from OCT or spectral-domain OCT (SD-OCT) images. Diabetic macular edema (DME), choroidal neovascularization (CNV), and drusen are particularly difficult to classify accurately from OCT images because they differ only subtly and present intricate features. The DNN- and ViT-based algorithms currently reported in the literature tend to be computationally complex, cover only a few diseases, and leave room for improvement in accuracy. This study proposes a hybrid SqueezeNet-vision transformer (SViT) model that combines the strengths of SqueezeNet and the vision transformer (ViT), capturing both local and global features of OCT images to achieve more accurate classification at lower computational cost. The proposed model is trained, validated, and tested on the OCT2017 dataset, and it performs both binary classification (normal vs. disorder) and multiclass classification (DME, CNV, drusen, and normal). Evaluated against state-of-the-art CNN-based and standalone transformer models, the proposed SViT model achieves an overall classification accuracy of 99.90% for multiclass classification (CNV: 100%, DME: 99.9%, drusen: 100%, and normal: 100%). Given its good generalization ability, the model could be used to improve patient care and clinical decision-making across a broader range of applications.
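To make the two evaluation settings concrete, the following minimal sketch shows how an overall accuracy, a per-class accuracy, and a binary (normal vs. disorder) result can all be derived from one four-class confusion matrix. It is not the authors' evaluation code. The counts in `cm` are invented purely for illustration and are not results from the study, the helper names are hypothetical, and reading "per-class accuracy" as one-vs-rest accuracy is an assumption, since the abstract does not define it.

```python
from typing import List

# Class order is an assumption made for this sketch.
CLASSES = ["CNV", "DME", "DRUSEN", "NORMAL"]

Matrix = List[List[int]]  # cm[true_class][predicted_class]


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def one_vs_rest_accuracy(cm: Matrix, k: int) -> float:
    """Accuracy of the question 'is this sample class k or not?'."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as class k
    tn = total - tp - fn - fp
    return (tp + tn) / total


def collapse_to_binary(cm: Matrix, normal_index: int) -> Matrix:
    """Merge all disease classes into one 'disorder' class.

    Returns a 2x2 matrix indexed as [true][predicted] with
    0 = disorder and 1 = normal.
    """
    binary = [[0, 0], [0, 0]]
    for true_idx, row in enumerate(cm):
        for pred_idx, count in enumerate(row):
            binary[int(true_idx == normal_index)][int(pred_idx == normal_index)] += count
    return binary


# Made-up counts for illustration only (250 samples per class).
cm = [
    [248, 1, 1, 0],    # true CNV
    [2, 247, 0, 1],    # true DME
    [1, 0, 248, 1],    # true DRUSEN
    [0, 0, 1, 249],    # true NORMAL
]

binary_cm = collapse_to_binary(cm, CLASSES.index("NORMAL"))
```

Because confusions between two disease classes disappear when the classes are merged, the binary accuracy computed from `binary_cm` can never be lower than the four-class overall accuracy computed from `cm`.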
Data availability
The OCT2017 dataset is available as open-source data and can be accessed from https://www.kaggle.com/paultimothymooney/kermany2018.
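After downloading the data, a quick sanity check is to count the images in each class folder. The sketch below assumes the archive unpacks into one subdirectory per class (for example CNV, DME, DRUSEN, NORMAL) inside each split directory. That layout, the file extensions, and the function name `count_images_per_class` are assumptions, not details stated in this article.

```python
from pathlib import Path
from typing import Dict, Tuple


def count_images_per_class(
    split_dir: str,
    extensions: Tuple[str, ...] = (".jpeg", ".jpg", ".png"),
) -> Dict[str, int]:
    """Count image files in each immediate subdirectory of split_dir.

    Each subdirectory is treated as one class; files with other
    extensions and files directly inside split_dir are ignored.
    """
    counts: Dict[str, int] = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
    return counts
```

Calling `count_images_per_class("path/to/train")` with a placeholder path pointing at a split directory returns a dictionary mapping class names to image counts, which makes any class imbalance visible before training.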
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Hemalakshmi, G.R., Murugappan, M., Sikkandar, M.Y. et al. Automated retinal disease classification using hybrid transformer model (SViT) using optical coherence tomography images. Neural Comput & Applic 36, 9171–9188 (2024). https://doi.org/10.1007/s00521-024-09564-7
DOI: https://doi.org/10.1007/s00521-024-09564-7