TY - JOUR
T1 - Natural language processing in narrative breast radiology reporting in University Malaya Medical Centre
AU - Tan, Wee Ming
AU - Ng, Wei Lin
AU - Ganggayah, Mogana Darshini
AU - Hoe, Victor Chee Wai
AU - Rahmat, Kartini
AU - Zaini, Hana Salwani
AU - Mohd Taib, Nur Aishah
AU - Dhillon, Sarinder Kaur
N1 - Publisher Copyright: © The Author(s) 2023.
PY - 2023/7
Y1 - 2023/7
AB - Radiology reporting is narrative, and its content depends on the clinician’s ability to interpret the images accurately. A tertiary hospital such as the University Malaya Medical Centre emphasizes narrative report writing as part of training for medical personnel. However, free-text reports make it inconvenient to extract information for clinical audits and data mining. We therefore aimed to convert unstructured breast radiology reports into a structured format using a natural language processing (NLP) algorithm. This study used 327 de-identified breast radiology reports from the University Malaya Medical Centre. The radiologist identified the significant data elements to be extracted. Our NLP algorithm achieved 97% and 94.9% accuracy on the training and testing data, respectively. The structured information was then used to build a predictive model for the BI-RADS category. The random forest model achieved the highest accuracy, at 92%. Our study not only met clinicians’ demands by enhancing communication between medical personnel but also demonstrated the usefulness of mineable structured data in yielding significant insights.
KW - information extraction
KW - natural language processing
KW - radiology reporting
KW - rule-based
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85172018335&partnerID=8YFLogxK
U2 - 10.1177/14604582231203763
DO - 10.1177/14604582231203763
M3 - Article
C2 - 37740904
AN - SCOPUS:85172018335
SN - 1460-4582
VL - 29
SP - 1
EP - 22
JO - Health Informatics Journal
JF - Health Informatics Journal
IS - 3
ER -