Biosurveillance for invasive fungal infections via text mining

David Martinez, Hanna Suominen, Michelle Ananda-Rajah, Lawrence Cavedon

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research

Abstract

Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hospitals and cost the health system more than AUD100m in Australia each year. The most common life-threatening IFD is aspergillosis; a patient with this IFD typically spends an additional 12 days in hospital and faces an 8% mortality rate. Surveillance and detection of IFDs, irrespective of the stage of diagnosis (i.e., early or late in disease), are important. We describe an application of text mining techniques, using machine learning over a range of features, to automatically detect patients with IFD from the text of the reports on their CT scans. We focus on detecting the presence of aspergillosis; however, we anticipate that the approach can be transferred to other diseases or conditions by training the text mining component on appropriate reports. Language-technology systems have previously been deployed, with significant success, for processing radiology reports and for detecting hospital-acquired infection. Our approach differs by using a purely statistical/machine-learning approach to the language technology, and by being trained and tested on data collected from a number of hospitals. We collected reports for 288 IFD and 291 control patients from three different hospitals in Melbourne, Australia: Alfred Health, Melbourne Health, and Peter MacCallum Cancer Centre. We extracted a sample of 69 IFD and 49 control patients to perform detailed analysis of the text with regard to IFD; each patient had possibly multiple scans (and associated reports), resulting in a total of 398 scan reports from IFD-positive patients and 83 scan reports from control patients. We had medical experts annotate the patient-level classification on all scan reports at both sentence and report level: the annotators had to decide, for each sentence and report, whether it was positive, neutral, or negative with regard to IFD.
We classify reports and patients as IFD positive if they contain at least one positive sentence, and as negative otherwise. We used the Weka SVM implementation and employed a variety of text- and concept-based features, including bag-of-words, punctuation, UMLS concepts and negated contexts extracted using MetaMap. We also automatically extracted high-value terms (as measured using log-likelihood ratio) and formulated multi-word concept descriptions. Our system showed sensitivity of 0.94 and specificity of 0.76 for classifying individual reports as being indicative of aspergillus, and sensitivity of 1.0 and specificity of 0.51 for classifying patients as having contracted the infection.
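The decision rule and the evaluation measures mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract reports using the Weka SVM implementation to do the classification itself); it only shows how sentence-level labels could be aggregated with the "at least one positive sentence" rule and how sensitivity and specificity are computed. The label strings, function names and toy data are hypothetical and invented for illustration.

```python
# Hypothetical label string; the abstract names the classes
# positive, neutral and negative but not their encoding.
POSITIVE = "positive"


def report_is_positive(sentence_labels):
    """A report counts as IFD positive if any sentence is labelled positive."""
    return any(label == POSITIVE for label in sentence_labels)


def patient_is_positive(reports):
    """A patient counts as IFD positive if any of their reports is positive.

    `reports` is a list of reports, each a list of sentence labels.
    """
    return any(report_is_positive(labels) for labels in reports)


def sensitivity_specificity(gold, predicted):
    """Return (sensitivity, specificity) for parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (invented): sentence labels per report, per patient.
predicted_sentences = {
    "patient_a": [["neutral", "positive"], ["negative"]],
    "patient_b": [["negative", "neutral"]],
    "patient_c": [["neutral"], ["positive"]],
    "patient_d": [["negative"]],
}
gold_patient_labels = {
    "patient_a": True,
    "patient_b": False,
    "patient_c": False,
    "patient_d": False,
}

patients = sorted(predicted_sentences)
predicted = [patient_is_positive(predicted_sentences[p]) for p in patients]
gold = [gold_patient_labels[p] for p in patients]
sens, spec = sensitivity_specificity(gold, predicted)
print(f"patient-level sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because a single positive sentence is enough to flag a report, and a single flagged report is enough to flag a patient, this kind of rule tends to favour sensitivity over specificity at the patient level, which is consistent with the pattern of figures reported in the abstract.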

Original language: English
Title of host publication: CLEF 2012 Working Notes
Subtitle of host publication: Working Notes for CLEF 2012 Conference
Editors: Pamela Forner, Jussi Karlgren, Crista Womser-Hacker, Nicola Ferro
Publisher: CEUR Workshop Proceedings
Number of pages: 4
Volume: 1178
Publication status: Published - 2012
Externally published: Yes
Event: Conference and Labs of the Evaluation Forum: Information Access Evaluation meets Multilinguality, Multimodality, and Visual Analytics - Sapienza University of Rome, Rome, Italy
Duration: 17 Sep 2012 to 20 Sep 2012
Conference number: 13th
http://clef2012.clef-initiative.eu/index.php

Conference

Conference: Conference and Labs of the Evaluation Forum
Abbreviated title: CLEF 2012
Country: Italy
City: Rome
Period: 17/09/12 to 20/09/12

Keywords

  • Biosurveillance
  • Clinical reports
  • Machine learning
  • Text mining

Cite this

Martinez, D., Suominen, H., Ananda-Rajah, M., & Cavedon, L. (2012). Biosurveillance for invasive fungal infections via text mining. In P. Forner, J. Karlgren, C. Womser-Hacker, & N. Ferro (Eds.), CLEF 2012 Working Notes: Working Notes for CLEF 2012 Conference (Vol. 1178). CEUR Workshop Proceedings.
@inproceedings{7d95ba8fbe1a4b6aabccaf5d551c5bdf,
title = "Biosurveillance for invasive fungal infections via text mining",
keywords = "Biosurveillance, Clinical reports, Machine learning, Text mining",
author = "David Martinez and Hanna Suominen and Michelle Ananda-Rajah and Lawrence Cavedon",
year = "2012",
language = "English",
volume = "1178",
editor = "Pamela Forner and Jussi Karlgren and Crista Womser-Hacker and Nicola Ferro",
booktitle = "CLEF 2012 Working Notes",
publisher = "CEUR Workshop Proceedings",

}

TY - GEN

T1 - Biosurveillance for invasive fungal infections via text mining

AU - Martinez, David

AU - Suominen, Hanna

AU - Ananda-Rajah, Michelle

AU - Cavedon, Lawrence

PY - 2012

Y1 - 2012

KW - Biosurveillance

KW - Clinical reports

KW - Machine learning

KW - Text mining

UR - http://www.scopus.com/inward/record.url?scp=84922022489&partnerID=8YFLogxK

M3 - Conference Paper

AN - SCOPUS:84922022489

VL - 1178

BT - CLEF 2012 Working Notes

A2 - Forner, Pamela

A2 - Karlgren, Jussi

A2 - Womser-Hacker, Crista

A2 - Ferro, Nicola

PB - CEUR Workshop Proceedings

ER -
