Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing

Michelle R Ananda-Rajah, David Martinez, Monica Anne Slavin, Lawrence Cavedon, Michael Joseph Dooley, Allen Cheuk-Seng Cheng, Karin A Thursky

Research output: Contribution to journal › Article › Research › peer-review

7 Citations (Scopus)

Abstract

Purpose: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high-throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports supportive of IMDs.

Patients and Methods: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12 weeks after, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected-control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician-annotated reports including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross-validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver operating characteristic (ROC) curve.

Results: For the development subset, sensitivity/specificity was 91% (95% CI 86% to 94%)/79% (95% CI 71% to 84%) and ROC area was 0.92 (95% CI 89% to 94%). Of 25 (5.6%) missed notifications, only 4 (0.9%) reports were regarded as clinically significant.

Conclusion: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.
Original language: English
Article number: e107797
Number of pages: 8
Journal: PLoS ONE
Volume: 9
Issue number: 9
DOI: 10.1371/journal.pone.0107797
Publication status: Published - 2014

Cite this

@article{414bf2be919d4825979bedc1bedc9c02,
title = "Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing",
abstract = "Purpose: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high-throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports supportive of IMDs. Patients and Methods: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12 weeks after, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected-control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician-annotated reports including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross-validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver operating characteristic (ROC) curve. Results: For the development subset, sensitivity/specificity was 91\% (95\% CI 86\% to 94\%)/79\% (95\% CI 71\% to 84\%) and ROC area was 0.92 (95\% CI 89\% to 94\%). Of 25 (5.6\%) missed notifications, only 4 (0.9\%) reports were regarded as clinically significant. Conclusion: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.",
author = "Ananda-Rajah, {Michelle R} and Martinez, David and Slavin, {Monica Anne} and Cavedon, Lawrence and Dooley, {Michael Joseph} and Cheng, {Allen Cheuk-Seng} and Thursky, {Karin A}",
year = "2014",
doi = "10.1371/journal.pone.0107797",
language = "English",
volume = "9",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",
pages = "e107797",
}

Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing. / Ananda-Rajah, Michelle R; Martinez, David; Slavin, Monica Anne; Cavedon, Lawrence; Dooley, Michael Joseph; Cheng, Allen Cheuk-Seng; Thursky, Karin A.

In: PLoS ONE, Vol. 9, No. 9, e107797, 2014.

TY - JOUR

T1 - Facilitating surveillance of pulmonary invasive mold diseases in patients with haematological malignancies by screening computed tomography reports using natural language processing

AU - Ananda-Rajah, Michelle R

AU - Martinez, David

AU - Slavin, Monica Anne

AU - Cavedon, Lawrence

AU - Dooley, Michael Joseph

AU - Cheng, Allen Cheuk-Seng

AU - Thursky, Karin A

PY - 2014

Y1 - 2014

AB - Purpose: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high-throughput technology, natural language processing (NLP), to develop a classifier based on machine learning techniques to screen computed tomography (CT) reports supportive of IMDs. Patients and Methods: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12 weeks after, from a random subset of 79 of 270 case patients with 33 probable/proven IMDs by international definitions, and 68 of 257 uninfected-control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician-annotated reports including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross-validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver operating characteristic (ROC) curve. Results: For the development subset, sensitivity/specificity was 91% (95% CI 86% to 94%)/79% (95% CI 71% to 84%) and ROC area was 0.92 (95% CI 89% to 94%). Of 25 (5.6%) missed notifications, only 4 (0.9%) reports were regarded as clinically significant. Conclusion: CT reports are a readily available and timely resource that may be exploited by NLP to facilitate continuous prospective IMD surveillance with translational benefits beyond surveillance alone.

UR - http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0107797

U2 - 10.1371/journal.pone.0107797

DO - 10.1371/journal.pone.0107797

M3 - Article

VL - 9

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 9

M1 - e107797

ER -