PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework

Jiangning Song, Fuyi Li, Kazuhiro Takemoto, Gholamreza Haffari, Tatsuya Akutsu, Kuo Chen Chou, Geoffrey I. Webb

Research output: Contribution to journal › Article › Research › peer-review

71 Citations (Scopus)

Abstract

Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods for identifying catalytic residues are resource- and labor-intensive, computational approaches are highly desirable: they can complement experimental studies in identifying catalytic residues and help bridge the sequence–structure–function gap. In this study, we describe a new computational method, PREvaIL, for predicting enzyme catalytic residues. The method combines a comprehensive set of informative features extracted at multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking on eight different datasets, using 10-fold cross-validation and independent tests, together with side-by-side comparisons against seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with the area under the receiver operating characteristic curve ranging from 0.896 to 0.973 and the area under the precision-recall curve ranging from 0.294 to 0.523. We demonstrated that the method captures useful signals from different levels, and that combining these different but complementary types of features significantly improves the performance of catalytic residue prediction. We believe that this new method can serve as a valuable tool both for understanding the complex sequence–structure–function relationships of proteins and for facilitating the characterization of novel enzymes that lack functional annotations.
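The two reported metrics behave quite differently, which helps explain why the precision-recall areas (0.294 to 0.523) sit far below the ROC areas (0.896 to 0.973). Catalytic residues are typically a small fraction of an enzyme's residues, and the precision-recall area is sensitive to that imbalance while the ROC area is not. The sketch below illustrates this on hypothetical toy data. It assumes each residue receives a score (for example, a probability from a random forest) and a label of 1 for catalytic or 0 otherwise. The abstract does not say how the precision-recall area was computed, so the step-wise average-precision estimator used here is an assumption, not necessarily PREvaIL's procedure; the function names are illustrative.

```python
# Minimal sketch (not PREvaIL's code): ROC-AUC and a step-wise estimate of
# the area under the precision-recall curve (average precision).
# Assumption: labels are 1 for catalytic residues, 0 otherwise; scores are
# per-residue confidence values from any classifier.
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney statistic: the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive, after
    sorting residues by descending score (ties are not grouped here)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data (not from the paper): 20 residues, 2 catalytic,
# scores strictly decreasing so the list order is the ranking.
scores = [1.0 - 0.05 * i for i in range(20)]
labels = [0, 1, 0, 0, 1] + [0] * 15

print(round(roc_auc(labels, scores), 3))            # 0.889
print(round(average_precision(labels, scores), 3))  # 0.45
```

With only 2 positives among 20 residues, the same ranking gives a ROC area near 0.89 but an average precision of 0.45, because every negative ranked above a positive directly lowers precision. The pairwise loop in `roc_auc` is kept for clarity; for realistic dataset sizes, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` and `average_precision_score` would be the usual choice.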

Original language: English
Pages (from-to): 125–137
Number of pages: 13
Journal: Journal of Theoretical Biology
Volume: 443
DOI: 10.1016/j.jtbi.2018.01.023
Publication status: Published - 14 Apr 2018

Keywords

  • Bioinformatics
  • Enzyme catalytic residues
  • Functional annotation
  • Machine learning
  • Pattern recognition
  • Sequence analysis
  • Sequence–structure–function relationship

Cite this

@article{5fc65a7d5d284eae9b7a577699d3e4ba,
title = "PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework",
abstract = "Determining the catalytic residues in an enzyme is critical to our understanding the relationship between protein sequence, structure, function, and enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence–structure–function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. 
We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence–structure–function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.",
keywords = "Bioinformatics, Enzyme catalytic residues, Functional annotation, Machine learning, Pattern recognition, Sequence analysis, Sequence–structure–function relationship",
author = "Jiangning Song and Fuyi Li and Kazuhiro Takemoto and Gholamreza Haffari and Tatsuya Akutsu and Chou, {Kuo Chen} and Webb, {Geoffrey I.}",
year = "2018",
month = "4",
day = "14",
doi = "10.1016/j.jtbi.2018.01.023",
language = "English",
volume = "443",
pages = "125--137",
journal = "Journal of Theoretical Biology",
issn = "0022-5193",
publisher = "Elsevier",

}

PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. / Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo Chen; Webb, Geoffrey I.

In: Journal of Theoretical Biology, Vol. 443, 14.04.2018, p. 125-137.


TY - JOUR

T1 - PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework

AU - Song, Jiangning

AU - Li, Fuyi

AU - Takemoto, Kazuhiro

AU - Haffari, Gholamreza

AU - Akutsu, Tatsuya

AU - Chou, Kuo Chen

AU - Webb, Geoffrey I.

PY - 2018/4/14

Y1 - 2018/4/14

N2 - Determining the catalytic residues in an enzyme is critical to our understanding the relationship between protein sequence, structure, function, and enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence–structure–function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence–structure–function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.

AB - Determining the catalytic residues in an enzyme is critical to our understanding the relationship between protein sequence, structure, function, and enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence–structure–function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence–structure–function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.

KW - Bioinformatics

KW - Enzyme catalytic residues

KW - Functional annotation

KW - Machine learning

KW - Pattern recognition

KW - Sequence analysis

KW - Sequence–structure–function relationship

UR - http://www.scopus.com/inward/record.url?scp=85042362561&partnerID=8YFLogxK

U2 - 10.1016/j.jtbi.2018.01.023

DO - 10.1016/j.jtbi.2018.01.023

M3 - Article

AN - SCOPUS:85042362561

VL - 443

SP - 125

EP - 137

JO - Journal of Theoretical Biology

JF - Journal of Theoretical Biology

SN - 0022-5193

ER -