ProBAPred: Inferring protein-protein binding affinity by incorporating protein sequence and structural features

Bangli Lu, Chen Li, Qingfeng Chen, Jiangning Song

Research output: Contribution to journal › Article › Research › peer-review

1 Citation (Scopus)

Abstract

Protein-protein binding interaction is one of the most prevalent biological activities and mediates a great variety of biological processes. The increasing availability of experimental data on protein-protein interactions allows the systematic construction of protein-protein interaction networks, contributing significantly to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared with the well-established classification of protein-protein interactions (PPIs), limited work has been conducted on estimating protein-protein binding free energy, for which real-valued regression models can provide an informative characterization of binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein-protein Binding Affinity Predictor), for quantitative estimation of protein-protein binding affinity. A large number of sequence and structural features, including physicochemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative feature subsets. Experiments on the independent test set showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657 kcal/mol) and the second-highest correlation coefficient (R-value = 0.467) compared with existing methods. The datasets and source code of ProBAPred, along with the supplementary materials of this study, can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein-protein binding affinity.
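The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the reported R-value is the Pearson correlation coefficient, which the abstract does not state explicitly. The function names and the affinity values below are hypothetical and are not taken from ProBAPred or its datasets.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical binding free energies in kcal/mol (illustration only).
measured = [-10.0, -8.0, -12.0, -9.0]
predicted = [-9.0, -9.0, -11.0, -9.0]

print("MAE (kcal/mol):", mean_absolute_error(measured, predicted))
print("Pearson R:", round(pearson_r(measured, predicted), 3))
```

A lower MAE means predictions lie closer to the measured affinities on average, while a higher R means the predictions track the ranking and spread of the measured values more closely. The two measures can disagree, which is consistent with a method ranking first on one and second on the other.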

Original language: English
Article number: 1850011
Number of pages: 18
Journal: Journal of Bioinformatics and Computational Biology
Volume: 16
Issue number: 4
DOI: 10.1142/S0219720018500117
Publication status: Published - 29 Jun 2018

Keywords

  • feature selection
  • Protein-protein binding affinity
  • regression model
  • sequence-derived features
  • structural features

Cite this

@article{a9aae4ebc446419f8fcf02848578db6f,
title = "ProBAPred: Inferring protein-protein binding affinity by incorporating protein sequence and structural features",
abstract = "Protein-protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data of protein-protein interaction allows a systematic construction of protein-protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared to well-established classification for protein-protein interactions (PPIs), limited work has been conducted for estimating protein-protein binding free energy, which can provide informative real-value regression models for characterizing the protein-protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein-protein Binding Affinity Predictor), for quantitative estimation of protein-protein binding affinity. A large number of sequence and structural features, including physical-chemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657kcal/mol) and the second highest correlation coefficient (R-value=0.467), compared with the existing methods. The datasets and source codes of ProBAPred, and the supplementary materials in this study can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein-protein binding affinity.",
keywords = "feature selection, Protein-protein binding affinity, regression model, sequence-derived features, structural features",
author = "Bangli Lu and Chen Li and Qingfeng Chen and Jiangning Song",
year = "2018",
month = "6",
day = "29",
doi = "10.1142/S0219720018500117",
language = "English",
volume = "16",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing",
number = "4",
}

ProBAPred: Inferring protein-protein binding affinity by incorporating protein sequence and structural features. / Lu, Bangli; Li, Chen; Chen, Qingfeng; Song, Jiangning.

In: Journal of Bioinformatics and Computational Biology, Vol. 16, No. 4, 1850011, 29.06.2018.

TY  - JOUR
T1  - ProBAPred
T2  - Inferring protein-protein binding affinity by incorporating protein sequence and structural features
AU  - Lu, Bangli
AU  - Li, Chen
AU  - Chen, Qingfeng
AU  - Song, Jiangning
PY  - 2018/6/29
Y1  - 2018/6/29
N2  - Protein-protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data of protein-protein interaction allows a systematic construction of protein-protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared to well-established classification for protein-protein interactions (PPIs), limited work has been conducted for estimating protein-protein binding free energy, which can provide informative real-value regression models for characterizing the protein-protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein-protein Binding Affinity Predictor), for quantitative estimation of protein-protein binding affinity. A large number of sequence and structural features, including physical-chemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657kcal/mol) and the second highest correlation coefficient (R-value=0.467), compared with the existing methods. The datasets and source codes of ProBAPred, and the supplementary materials in this study can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein-protein binding affinity.
KW  - feature selection
KW  - Protein-protein binding affinity
KW  - regression model
KW  - sequence-derived features
KW  - structural features
UR  - http://www.scopus.com/inward/record.url?scp=85049135923&partnerID=8YFLogxK
U2  - 10.1142/S0219720018500117
DO  - 10.1142/S0219720018500117
M3  - Article
C2  - 29954286
AN  - SCOPUS:85049135923
VL  - 16
JO  - Journal of Bioinformatics and Computational Biology
JF  - Journal of Bioinformatics and Computational Biology
SN  - 0219-7200
IS  - 4
M1  - 1850011
ER  - 