TY - JOUR
T1 - Attention-guided 3D-CNN framework for glaucoma detection and structural-functional association using volumetric images
AU - George, Yasmeen
AU - Antony, Bhavna J.
AU - Ishikawa, Hiroshi
AU - Wollstein, Gadi
AU - Schuman, Joel S.
AU - Garnavi, Rahil
N1 - Funding Information:
Manuscript received December 23, 2019; revised April 23, 2020; accepted May 30, 2020. Date of publication June 9, 2020; date of current version December 4, 2020. This work was supported by the National Institutes of Health Eye Institute R01EY013178 and R01EY030929. (Corresponding author: Yasmeen George.) Yasmeen George, Bhavna J. Antony, and Rahil Garnavi are with IBM Research, Melbourne, VIC 3006, Australia (e-mail: [email protected], [email protected]; [email protected]; rahilgar@au1.ibm.com).
Publisher Copyright:
© 2013 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - The direct analysis of 3D optical coherence tomography (OCT) volumes enables deep learning (DL) models to learn spatial structural information and discover new biomarkers that are relevant to glaucoma. Downsampling 3D input volumes is the state-of-the-art solution for accommodating the limited number of training volumes and the available computing resources. However, this limits the network's ability to learn from small retinal structures in OCT volumes. In this paper, our goal is to improve performance by guiding the DL model during training so that it learns from finer ocular structures in 3D OCT volumes. We therefore propose an end-to-end attention-guided 3D DL model for glaucoma detection and for estimating visual function from retinal structures. The model consists of three pathways with the same network architecture but different inputs. One input is the original 3D OCT cube; the other two are computed during training, guided by 3D gradient class activation heatmaps. Each pathway outputs the class label, and the whole model is trained concurrently to minimize the sum of the losses from the three pathways. The final output is obtained by fusing the predictions of the three pathways. To explore the robustness and generalizability of the proposed model, we apply it to a classification task for glaucoma detection as well as a regression task to estimate the visual field index (VFI), a value between 0 and 100. Five-fold cross-validation with a total of 3,782 and 10,370 OCT scans is used to train and evaluate the classification and regression models, respectively. The glaucoma detection model achieved an area under the curve (AUC) of 93.8%, compared with 86.8% for a baseline model without the attention-guided component. The model also outperformed six different feature-based machine learning approaches that use scanner-computed measurements for training. We also assessed the contribution of different retinal layers that are relevant to glaucoma. The VFI estimation model achieved a Pearson correlation of 0.75 and a median absolute error of 3.6% on a test set of 3,100 cubes.
KW - 3D convolutional neural networks
KW - attention guided deep learning
KW - glaucoma detection
KW - gradient-weighted class activation maps
KW - optical coherence tomography
KW - visual field estimation
UR - http://www.scopus.com/inward/record.url?scp=85097570523&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2020.3001019
DO - 10.1109/JBHI.2020.3001019
M3 - Article
C2 - 32750930
AN - SCOPUS:85097570523
SN - 2168-2194
VL - 24
SP - 3421
EP - 3430
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 12
ER -