TY - JOUR
T1 - The diagnosis of dengue in patients presenting with acute febrile illness using supervised machine learning and impact of seasonality
AU - Ming, Damien K.
AU - Tuan, Nguyen M.
AU - Hernandez, Bernard
AU - Sangkaew, Sorawat
AU - Vuong, Nguyen L.
AU - Chanh, Ho Q.
AU - Chau, Nguyen V.V.
AU - Simmons, Cameron P.
AU - Wills, Bridget
AU - Georgiou, Pantelis
AU - Holmes, Alison H.
AU - Yacoub, Sophie
AU - on behalf of the Vietnam ICU Translational Applications Laboratory (VITAL) Investigators
N1 - Publisher Copyright: Copyright © 2022 Ming, Tuan, Hernandez, Sangkaew, Vuong, Chanh, Chau, Simmons, Wills, Georgiou, Holmes and Yacoub.
PY - 2022/3/14
Y1 - 2022/3/14
AB - Background: Symptomatic dengue infection can result in a life-threatening shock syndrome, and timely diagnosis is essential. Point-of-care tests for non-structural protein 1 and IgM are widely used, but their performance can be limited. We developed a supervised machine learning model to predict whether patients with acute febrile illnesses had a diagnosis of dengue or other febrile illnesses (OFI). The impact of seasonality on model performance over time was examined. Methods: We analysed data from a prospective observational clinical study in Vietnam. Enrolled patients presented with an acute febrile illness of <72 h duration. A gradient boosting model (XGBoost) was used to predict final diagnosis using age, sex, haematocrit, and platelet, white cell, and lymphocyte counts collected on enrolment. Data were randomly split 80%/20% into training and hold-out sets, respectively, with the latter not used in model development. Cross-validation and hold-out set testing were used, with performance over time evaluated through a rolling window approach. Results: We included 8,100 patients recruited between 16th October 2010 and 10th December 2014. In total, 2,240 (27.7%) patients were diagnosed with dengue infection. The optimised model from training data had an overall median area under the receiver operating characteristic curve (AUROC) of 0.86 (interquartile range 0.84–0.86), specificity of 0.92, sensitivity of 0.56, positive predictive value of 0.73, negative predictive value (NPV) of 0.84, and Brier score of 0.13 in predicting the final diagnosis, with similar performance in hold-out set testing (AUROC of 0.86). Model performance varied significantly over time as a function of seasonality and other factors. Incorporating a dynamic threshold that continuously learns from recent cases resulted in more consistent performance throughout the year (NPV >90%). Conclusion: Supervised machine learning models are able to discriminate between dengue and OFI diagnoses in patients presenting with an early undifferentiated febrile illness. These models could be of clinical utility in supporting healthcare decision-making and in providing passive surveillance across dengue-endemic regions. Effects of seasonality and changing disease prevalence must, however, be taken into account; this is of significant importance given the unpredictable effects of human-induced climate change and its impact on health.
KW - climate change
KW - dengue
KW - diagnosis
KW - seasonality
KW - supervised machine learning
UR - http://www.scopus.com/inward/record.url?scp=85131263883&partnerID=8YFLogxK
U2 - 10.3389/fdgth.2022.849641
DO - 10.3389/fdgth.2022.849641
M3 - Article
C2 - 35360365
AN - SCOPUS:85131263883
SN - 2673-253X
VL - 4
JO - Frontiers in Digital Health
JF - Frontiers in Digital Health
M1 - 849641
ER -