TY - JOUR
T1 - Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing
AU - Ananda-Rajah, Michelle R
AU - Martinez, David
AU - Slavin, Monica Anne
AU - Cavedon, Lawrence
AU - Dooley, Michael Joseph
AU - Cheng, Allen Cheuk-Seng
AU - Thursky, Karin A
PY - 2014
Y1 - 2014
N2 - Purpose: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports supportive for IMDs.
Patients and Methods: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12-weeks after, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected-control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician annotated reports including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver-operating-curve (ROC).
Results: For the development subset, sensitivity/specificity was 91% (95% CI 86% to 94%)/79% (95% CI 71% to 84%) and ROC area was 0.92 (95% CI 89% to 94%). Of 25 (5.6%) missed notifications, only 4 (0.9%) reports were regarded as clinically significant.
Conclusion: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.
AB - Purpose: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports supportive for IMDs.
Patients and Methods: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12-weeks after, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected-control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician annotated reports including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver-operating-curve (ROC).
Results: For the development subset, sensitivity/specificity was 91% (95% CI 86% to 94%)/79% (95% CI 71% to 84%) and ROC area was 0.92 (95% CI 89% to 94%). Of 25 (5.6%) missed notifications, only 4 (0.9%) reports were regarded as clinically significant.
Conclusion: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.
UR - http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107797
U2 - 10.1371/journal.pone.0107797
DO - 10.1371/journal.pone.0107797
M3 - Article
SN - 1932-6203
VL - 9
JO - PLoS ONE
JF - PLoS ONE
IS - 9
M1 - e107797
ER -