TY - JOUR
T1 - Automated generation of synoptic reports from narrative pathology reports in University Malaya Medical Centre using natural language processing
AU - Tan, Wee-Ming
AU - Teoh, Kean-Hooi
AU - Ganggayah, Mogana Darshini
AU - Taib, Nur Aishah
AU - Zaini, Hana Salwani
AU - Dhillon, Sarinder Kaur
N1 - Publisher Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/4
Y1 - 2022/4
AB - Pathology reports represent a primary source of information for cancer registries. University Malaya Medical Centre (UMMC) is a tertiary hospital responsible for training pathologists; thus, narrative reporting is important. However, unstructured free-text reports make information extraction tedious for clinical audits and data analysis-related research. This study aims to develop an automated natural language processing (NLP) algorithm to summarize the existing narrative breast pathology reports from UMMC into narrower structured synoptic pathology reports with a checklist-style report template to ease the creation of pathology reports. The rule-based NLP algorithm was developed in the R programming language using 593 pathology specimens from 174 patients provided by the Department of Pathology, UMMC. The pathologist provided specific keywords for data elements to define the semantic rules of the NLP algorithm. The system was evaluated by calculating precision, recall, and F1-score. The proposed NLP algorithm achieved a micro-F1 score of 99.50% and a macro-F1 score of 98.97% on 178 specimens with 25 data elements. This performance corresponds to clinicians’ needs and could improve communication between pathologists and clinicians. The study is significant, as structured data are easily minable and could generate important insights.
KW - information extraction
KW - natural language processing
KW - pathology reporting
KW - rule based
KW - synoptic reporting
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85128318558&partnerID=8YFLogxK
U2 - 10.3390/diagnostics12040879
DO - 10.3390/diagnostics12040879
M3 - Article
C2 - 35453927
AN - SCOPUS:85128318558
SN - 2075-4418
VL - 12
JO - Diagnostics
JF - Diagnostics
IS - 4
M1 - 879
ER -