A corpus of tables in full-text biomedical research publications

Tatyana Shmanina, Ingrid Zukerman, Ai Lee Cheam, Thomas Bochynek, Lawrence Cavedon

    Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research

    Abstract

    The development of text mining techniques for biomedical research literature has received increased attention in recent times. However, most of these techniques focus on prose, while much important biomedical data reside in tables. In this paper, we present a corpus created to serve as a gold standard for the development and evaluation of techniques for the automatic extraction of information from biomedical tables. We describe the guidelines used for corpus annotation and the manner in which they were developed. The high inter-annotator agreement achieved on the corpus, and the generic nature of our annotation approach, suggest that the developed guidelines can serve as a general framework for table annotation in biomedical and other scientific domains. The annotated corpus and the guidelines are available at http://www.csse.monash.edu.au/research/umnl/data/index.shtml.
    Original language: English
    Title of host publication: Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016)
    Subtitle of host publication: Proceedings of the Workshop, December 11-16 2016, Osaka, Japan
    Editors: Sophia Ananiadou, Riza Batista-Navarro, Kevin Bretonnel Cohen, Dina Demner-Fushman, Paul Thompson
    Pages: 81-90
    Number of pages: 10
    Publication status: Published - 2016
    Event: Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining - Osaka, Japan
    Duration: 11 Dec 2016 to 16 Dec 2016

    Conference

    Conference: Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining
    Country: Japan
    City: Osaka
    Period: 11/12/16 to 16/12/16

    Cite this

    Shmanina, T., Zukerman, I., Cheam, A. L., Bochynek, T., & Cavedon, L. (2016). A corpus of tables in full-text biomedical research publications. In S. Ananiadou, R. Batista-Navarro, K. B. Cohen, D. Demner-Fushman, & P. Thompson (Eds.), Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016): Proceedings of the Workshop, December 11-16 2016, Osaka, Japan (pp. 81-90).
    @inproceedings{44f7b3347fff4937acc85dad02f9740d,
    title = "A corpus of tables in full-text biomedical research publications",
    author = "Tatyana Shmanina and Ingrid Zukerman and Cheam, {Ai Lee} and Thomas Bochynek and Lawrence Cavedon",
    year = "2016",
    language = "English",
    isbn = "9784879747198",
    pages = "81--90",
    editor = "Sophia Ananiadou and Riza Batista-Navarro and Cohen, {Kevin Bretonnel} and Dina Demner-Fushman and Paul Thompson",
    booktitle = "Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016)",

    }

    TY - GEN
    T1 - A corpus of tables in full-text biomedical research publications
    AU - Shmanina, Tatyana
    AU - Zukerman, Ingrid
    AU - Cheam, Ai Lee
    AU - Bochynek, Thomas
    AU - Cavedon, Lawrence
    PY - 2016
    Y1 - 2016
    UR - http://www.nactem.ac.uk/biotxtm2016/index.php
    M3 - Conference Paper
    SN - 9784879747198
    SP - 81
    EP - 90
    BT - Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2016)
    A2 - Ananiadou, Sophia
    A2 - Batista-Navarro, Riza
    A2 - Cohen, Kevin Bretonnel
    A2 - Demner-Fushman, Dina
    A2 - Thompson, Paul
    ER -
