Challenges in information extraction from tables in biomedical research publications: A dataset analysis

Tatyana Shmanina, Lawrence Cavedon, Ingrid Zukerman

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

    Abstract

    We present a study of a dataset of tables from biomedical research publications. Our aim is to identify characteristics of biomedical tables that pose challenges for the task of extracting information from tables, and to determine which parts of research papers typically contain information that is useful for this task. Our results indicate that biomedical tables are hard to interpret without their source papers due to the brevity of the entries in the tables. In many cases, unstructured text segments, such as table titles, footnotes and non-table prose discussing a table, are required to interpret the table's entries.

    Original language: English
    Title of host publication: Australasian Language Technology Association Workshop 2014 - Proceedings of the Workshop (ALTA)
    Editors: Gabriela Ferraro, Stephen Wan
    Place of Publication: Stroudsburg PA USA
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 118-122
    Number of pages: 5
    Publication status: Published - 2014
    Event: Australasian Language Technology Association Workshop 2014 - RMIT, Melbourne, Australia
    Duration: 26 Nov 2014 to 28 Nov 2014
    Conference number: 12th
    https://www.aclweb.org/anthology/events/alta-2014/ (Proceedings)

    Conference

    Conference: Australasian Language Technology Association Workshop 2014
    Abbreviated title: ALTAW 2014
    Country/Territory: Australia
    City: Melbourne
    Period: 26/11/14 to 28/11/14
    Other: ALTA 2014 will be held in conjunction with the 19th Australasian Document Computing Symposium 2014 (ADCS 2014).