How to link ontologies and protein-protein interactions to literature: text-mining approaches and the BioCreative experience

Martin Krallinger, Florian Leitner, Miguel Vazquez, David Salgado, Christophe Marcelle, Mike Tyers, Alfonso Valencia, Andrew Chatr-aryamontri

Research output: Contribution to journal › Article › Research › peer-review

Abstract

There is an increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein-protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein-protein interaction data and PSI-MI terms referring to interaction detection methods.
Original language: English
Article number: bas017
Pages (from-to): 1-12
Number of pages: 12
Journal: Database: the Journal of Biological Databases and Curation
Volume: 2012
DOIs: 10.1093/database/bas017
Publication status: Published - 2012

Cite this

Krallinger, Martin ; Leitner, Florian ; Vazquez, Miguel ; Salgado, David ; Marcelle, Christophe ; Tyers, Mike ; Valencia, Alfonso ; Chatr-aryamontri, Andrew. / How to link ontologies and protein-protein interactions to literature: text-mining approaches and the BioCreative experience. In: Database: the Journal of Biological Databases and Curation. 2012 ; Vol. 2012. pp. 1 - 12.
@article{6584d9c7ac284ab7abbea209f82be234,
title = "How to link ontologies and protein-protein interactions to literature: text-mining approaches and the BioCreative experience",
abstract = "There is an increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein-protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein-protein interaction data and PSI-MI terms referring to interaction detection methods.",
author = "Martin Krallinger and Florian Leitner and Miguel Vazquez and David Salgado and Christophe Marcelle and Mike Tyers and Alfonso Valencia and Andrew Chatr-aryamontri",
year = "2012",
doi = "10.1093/database/bas017",
language = "English",
volume = "2012",
pages = "1--12",
journal = "Database: the Journal of Biological Databases and Curation",
issn = "1758-0463",
publisher = "Oxford University Press",
}

How to link ontologies and protein-protein interactions to literature: text-mining approaches and the BioCreative experience. / Krallinger, Martin; Leitner, Florian; Vazquez, Miguel; Salgado, David; Marcelle, Christophe; Tyers, Mike; Valencia, Alfonso; Chatr-aryamontri, Andrew.

In: Database: the Journal of Biological Databases and Curation, Vol. 2012, bas017, 2012, p. 1 - 12.

TY - JOUR
T1 - How to link ontologies and protein-protein interactions to literature: text-mining approaches and the BioCreative experience
AU - Krallinger, Martin
AU - Leitner, Florian
AU - Vazquez, Miguel
AU - Salgado, David
AU - Marcelle, Christophe
AU - Tyers, Mike
AU - Valencia, Alfonso
AU - Chatr-aryamontri, Andrew
PY - 2012
Y1 - 2012
AB - There is an increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein-protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein-protein interaction data and PSI-MI terms referring to interaction detection methods.

UR - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3309177/pdf/bas017.pdf
U2 - 10.1093/database/bas017
DO - 10.1093/database/bas017
M3 - Article
VL - 2012
SP - 1
EP - 12
JO - Database: the Journal of Biological Databases and Curation
JF - Database: the Journal of Biological Databases and Curation
SN - 1758-0463
M1 - bas017
ER -