TY - JOUR
T1 - ProBAPred
T2 - Inferring protein-protein binding affinity by incorporating protein sequence and structural features
AU - Lu, Bangli
AU - Li, Chen
AU - Chen, Qingfeng
AU - Song, Jiangning
PY - 2018/6/29
Y1 - 2018/6/29
N2 - Protein-protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data on protein-protein interactions allows the systematic construction of protein-protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared with the well-established classification of protein-protein interactions (PPIs), limited work has been conducted on estimating protein-protein binding free energy, which can provide informative real-valued regression models for characterizing protein-protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein-protein Binding Affinity Predictor), for quantitative estimation of protein-protein binding affinity. A large number of sequence and structural features, including physicochemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test set showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657 kcal/mol) and the second highest correlation coefficient (R-value = 0.467), compared with existing methods. The datasets and source code of ProBAPred, and the supplementary materials of this study, can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein-protein binding affinity.
AB - Protein-protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data on protein-protein interactions allows the systematic construction of protein-protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared with the well-established classification of protein-protein interactions (PPIs), limited work has been conducted on estimating protein-protein binding free energy, which can provide informative real-valued regression models for characterizing protein-protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein-protein Binding Affinity Predictor), for quantitative estimation of protein-protein binding affinity. A large number of sequence and structural features, including physicochemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative and contributing feature subsets. Experiments on the independent test set showed that our ensemble method achieved the lowest Mean Absolute Error (MAE; 1.657 kcal/mol) and the second highest correlation coefficient (R-value = 0.467), compared with existing methods. The datasets and source code of ProBAPred, and the supplementary materials of this study, can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein-protein binding affinity.
KW - feature selection
KW - Protein-protein binding affinity
KW - regression model
KW - sequence-derived features
KW - structural features
UR - http://www.scopus.com/inward/record.url?scp=85049135923&partnerID=8YFLogxK
U2 - 10.1142/S0219720018500117
DO - 10.1142/S0219720018500117
M3 - Article
C2 - 29954286
AN - SCOPUS:85049135923
VL - 16
JO - Journal of Bioinformatics and Computational Biology
JF - Journal of Bioinformatics and Computational Biology
SN - 0219-7200
IS - 4
M1 - 1850011
ER -