Multi-domain evaluation framework for named entity recognition tools

Zahraa S. Abdallah, Mark James Carman, Gholamreza Haffari

    Research output: Contribution to journal › Article › Research › peer-review


    Abstract

    Extracting structured information from unstructured text is important for qualitative data analysis. Leveraging NLP techniques for qualitative data analysis can substantially accelerate the annotation process, enable large-scale analysis and provide deeper insight into the text. The first step towards gaining such insight is Named Entity Recognition (NER). A significant challenge that directly impacts NER performance is the domain diversity of qualitative data: text varies with its domain in many respects, including taxonomies, length, formality and format. In this paper we analyse the performance of state-of-the-art NER tools across domains to assess their robustness and reliability. To that end, we developed a standard, expandable and flexible framework for analysing and testing tool performance on corpora representing text from a range of domains. We carried out an extensive analysis and comparison of the tools across domains and from multiple perspectives. The resulting comparison and analysis provide a holistic picture of the state-of-the-art tools.
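    At its core, a multi-domain evaluation of this kind scores each tool's output against gold annotations in every domain corpus. Below is a minimal Python sketch of that entity-level scoring step; the corpora, the (start, end, type) annotation format, and the evaluate_tool/dummy_tool names are illustrative assumptions for this page, not the paper's actual framework.

from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start_offset, end_offset, entity_type)

# Hypothetical per-domain corpora: each maps a document id to its gold
# entity annotations. Real corpora would be loaded from annotated files.
corpora: Dict[str, Dict[str, List[Span]]] = {
    "news":   {"doc1": [(0, 5, "PER"), (10, 16, "ORG")]},
    "tweets": {"doc2": [(3, 8, "LOC")]},
}

def evaluate_tool(run_tool: Callable[[str, str], List[Span]],
                  corpora: Dict[str, Dict[str, List[Span]]]):
    """Score one NER tool per domain with strict entity-level matching:
    a prediction counts only if span boundaries and type match exactly."""
    scores = {}
    for domain, docs in corpora.items():
        tp = n_gold = n_pred = 0
        for doc_id, gold in docs.items():
            pred = set(run_tool(domain, doc_id))
            tp += len(set(gold) & pred)
            n_gold += len(set(gold))
            n_pred += len(pred)
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[domain] = {"precision": p, "recall": r, "f1": f1}
    return scores

if __name__ == "__main__":
    # Stand-in for an actual tool adapter; here it recovers one of the
    # two news entities and nothing in the tweet domain.
    def dummy_tool(domain: str, doc_id: str) -> List[Span]:
        return [(0, 5, "PER")] if doc_id == "doc1" else []

    print(evaluate_tool(dummy_tool, corpora))

    Strict exact-span matching is only one design choice; relaxed variants (partial overlap, type-only credit) are common in NER benchmarking and would slot into the same per-domain loop.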
    Original language: English
    Pages (from-to): 34-55
    Number of pages: 22
    Journal: Computer Speech & Language
    Volume: 43
    Publication status: Published - May 2017

    Keywords

    • Named entity recognition
    • Multi-domain evaluation
    • Qualitative data analysis
    • Benchmark evaluation
