Abstract
Invasive fungal diseases (IFDs) cause more than 1,000 deaths in hospitals and cost the health system more than AUD 100m in Australia each year. The most common life-threatening IFD is aspergillosis; a patient with this IFD typically spends 12 additional days in hospital as an in-patient and faces an 8% mortality rate. Surveillance and detection of IFDs irrespective of the stage of diagnosis (i.e., early or late in disease) are therefore important. We describe an application of text mining techniques, using machine learning over a range of features, to automatically detect cases of patients with IFD from the text of the reports of CT scans performed on them. We focus on detecting the presence of aspergillosis; however, we anticipate that the approach will transfer to other diseases or conditions by training the text mining component over appropriate reports. Previous language-technology systems have been deployed, with significant success, for processing radiology reports and for detecting hospital-acquired infections. Our approach differs by using a purely statistical/machine-learning approach to the language technology, and by being trained and tested on data collected from a number of hospitals. We collected reports for 288 IFD and 291 control patients from three different hospitals in Melbourne, Australia: Alfred Health, Melbourne Health, and Peter MacCallum Cancer Centre. We extracted a sample of 69 IFD and 49 control patients for detailed analysis of the text with regard to IFD; each patient could have multiple scans (and associated reports), giving a total of 398 scan reports from IFD-positive patients and 83 scan reports from control patients. We had medical experts annotate all scan reports at both sentence and report level: for each sentence and report, the annotators decided whether it was positive, neutral, or negative with regard to IFD.
We classify reports and patients as IFD positive if they contain at least one positive sentence, and as negative otherwise. We used the Weka SVM implementation with a variety of text- and concept-based features, including bag-of-words, punctuation, and UMLS concepts and negated contexts extracted using MetaMap. We also automatically extracted high-value terms (as measured by log-likelihood ratio) and formulated multi-word concept descriptions. Our system achieved a sensitivity of 0.94 and a specificity of 0.76 for classifying individual reports as indicative of aspergillosis, and a sensitivity of 1.0 and a specificity of 0.51 for classifying patients as having contracted the infection.
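The report- and patient-level decision rule described above is an "any positive sentence" aggregation. The minimal Python sketch below illustrates that rule only; it assumes sentence labels are already available (from the annotators or from a sentence classifier) as the strings `"positive"`, `"neutral"` and `"negative"`, and the function names and example patient are hypothetical rather than taken from the system itself.

```python
from typing import List

# Sentence labels follow the annotation scheme: "positive", "neutral" or "negative".
POSITIVE = "positive"


def report_is_positive(sentence_labels: List[str]) -> bool:
    """A report is IFD positive if at least one of its sentences is positive."""
    return any(label == POSITIVE for label in sentence_labels)


def patient_is_positive(reports: List[List[str]]) -> bool:
    """A patient is IFD positive if any of their reports contains a positive sentence."""
    return any(report_is_positive(report) for report in reports)


if __name__ == "__main__":
    # Hypothetical patient with two scan reports, each a list of sentence labels.
    patient = [
        ["neutral", "negative"],
        ["neutral", "positive", "neutral"],
    ]
    print(report_is_positive(patient[0]))  # False
    print(report_is_positive(patient[1]))  # True
    print(patient_is_positive(patient))    # True
```

Because a single positive sentence anywhere is enough, this kind of rule tends to favour sensitivity over specificity, which is in line with the patient-level figures reported above.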
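High-value terms are ranked by log-likelihood ratio. One common formulation is Dunning's G² statistic over a 2×2 contingency table of a term's counts in positive versus control reports; the sketch below assumes that formulation and pre-tokenised word lists, and adds a hypothetical filter that keeps only terms relatively more frequent in positive reports. The exact statistic, tokenisation, and thresholds used in the described system are not specified in this abstract, so `rank_terms` and its parameters are illustrative only.

```python
import math
from collections import Counter
from typing import List


def log_likelihood_ratio(k1: int, n1: int, k2: int, n2: int) -> float:
    """Dunning's G^2 for a term seen k1 times in n1 tokens (corpus 1)
    and k2 times in n2 tokens (corpus 2)."""
    observed = [[k1, n1 - k1], [k2, n2 - k2]]
    row_totals = [n1, n2]
    col_totals = [k1 + k2, (n1 - k1) + (n2 - k2)]
    total = n1 + n2
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # zero cells contribute nothing (0 * log 0 is taken as 0)
                expected = row_totals[i] * col_totals[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def rank_terms(pos_tokens: List[str], neg_tokens: List[str], top_k: int = 10) -> List[str]:
    """Hypothetical helper: rank terms over-represented in positive reports by G^2."""
    pos, neg = Counter(pos_tokens), Counter(neg_tokens)
    n1, n2 = len(pos_tokens), len(neg_tokens)
    scored = []
    for term in set(pos) | set(neg):
        k1, k2 = pos[term], neg[term]
        # Keep only terms relatively more frequent in positive reports.
        if k1 / n1 > k2 / n2:
            scored.append((log_likelihood_ratio(k1, n1, k2, n2), term))
    # Highest G^2 first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [term for _, term in scored[:top_k]]


if __name__ == "__main__":
    # Invented toy "reports", already tokenised.
    positive = "halo sign nodule nodule".split()
    control = "clear lungs no nodule".split()
    print(rank_terms(positive, control))  # ['halo', 'sign', 'nodule']
```

A term distributed identically across both corpora scores zero, while terms concentrated in the positive reports score higher, which is what makes the statistic useful for picking candidate features.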
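For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where TP, FN, TN and FP count true positives, false negatives, true negatives and false positives. The short helper below computes both from parallel lists of gold and predicted boolean labels; the function name and example labels are invented for illustration and do not reproduce the reported figures.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(gold: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary gold and predicted labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one gold positive and one gold negative")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented labels: True = IFD positive.
    gold = [True, True, True, False, False]
    predicted = [True, True, True, True, False]
    print(sensitivity_specificity(gold, predicted))  # (1.0, 0.5)
```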
Original language | English |
---|---|
Title of host publication | CLEF 2012 Working Notes |
Subtitle of host publication | Working Notes for CLEF 2012 Conference |
Editors | Pamela Forner, Jussi Karlgren, Crista Womser-Hacker, Nicola Ferro |
Publisher | Rheinisch-Westfaelische Technische Hochschule Aachen |
Number of pages | 4 |
Volume | 1178 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | Conference and Labs of the Evaluation Forum: Information Access Evaluation meets Multilinguality, Multimodality, and Visual Analytics, Sapienza University of Rome, Rome, Italy; Duration: 17 Sep 2012 → 20 Sep 2012; Conference number: 13th; http://clef2012.clef-initiative.eu/index.php |
Conference
Conference | Conference and Labs of the Evaluation Forum |
---|---|
Abbreviated title | CLEF 2012 |
Country | Italy |
City | Rome |
Period | 17/09/12 → 20/09/12 |
Internet address | http://clef2012.clef-initiative.eu/index.php |
Keywords
- Biosurveillance
- Clinical reports
- Machine learning
- Text mining