TY - JOUR
T1 - Clinical named entity recognition and relation extraction using natural language processing of medical free text
T2 - A systematic review
AU - Fraile Navarro, David
AU - Ijaz, Kiran
AU - Rezazadegan, Dana
AU - Rahimi-Ardabili, Hania
AU - Dras, Mark
AU - Coiera, Enrico
AU - Berkovsky, Shlomo
N1 - Funding Information:
David Fraile Navarro is supported by an International Macquarie University Research Excellence Scholarship “iMQRES”.
Publisher Copyright:
© 2023
PY - 2023/9
Y1 - 2023/9
N2 - Background: Natural Language Processing (NLP) applications have developed over the past years in various fields including its application to clinical free text for named entity recognition and relation extraction. However, there has been rapid developments the last few years that there's currently no overview of it. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments. Methods: We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association of Computational Linguistics (ACL), and Association of Computer Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries). Results: We included in the review 94 studies with 30 studies published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction and 18 performed both. The most frequently extracted entities were “problem”, “test” and “treatment”. 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies defined clearly a clinical or information task to be addressed by the system and just three studies reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool. Discussion: Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings, translation into practice and highlights the need for robust clinical evaluation.
AB - Background: Natural Language Processing (NLP) applications have developed over the past years in various fields including its application to clinical free text for named entity recognition and relation extraction. However, there has been rapid developments the last few years that there's currently no overview of it. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments. Methods: We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association of Computational Linguistics (ACL), and Association of Computer Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries). Results: We included in the review 94 studies with 30 studies published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction and 18 performed both. The most frequently extracted entities were “problem”, “test” and “treatment”. 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies defined clearly a clinical or information task to be addressed by the system and just three studies reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool. Discussion: Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings, translation into practice and highlights the need for robust clinical evaluation.
UR - http://www.scopus.com/inward/record.url?scp=85161478506&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2023.105122
DO - 10.1016/j.ijmedinf.2023.105122
M3 - Review Article
C2 - 37295138
AN - SCOPUS:85161478506
SN - 1386-5056
VL - 177
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
M1 - 105122
ER -