Predicting functional impact of single amino acid polymorphisms by integrating sequence and structural features

Mingjun Wang, Hong-Bin Shen, Tatsuya Akutsu, Jiangning Song

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

3 Citations (Scopus)

Abstract

Single amino acid polymorphisms (SAPs) are the most abundant form of known genetic variation associated with human diseases, and the sequence-structure-function relationship underlying SAPs is of great interest. In this work, we collected human variant data from three databases and divided them into three categories: cancer somatic mutations (CSM), Mendelian disease-related variants (SVD) and neutral polymorphisms (SVP). We built support vector machine (SVM) classifiers to predict these three classes of SAPs, using optimal features selected by a random forest algorithm. We initially extracted 280 sequence-derived and structural features from the curated datasets, from which 18 optimal candidate features were further selected by random forest. Furthermore, we performed stepwise feature selection to identify the characteristic sequence and structural features that are important for predicting each SAP class. As a result, our predictors achieved a prediction accuracy (ACC) of 84.97%, 96.93%, 86.98% and 88.24% for the three classes, CSM, SVD and SVP, respectively. Performance comparison with previously developed tools such as SIFT, SNAP and PolyPhen-2 indicates that our method performs favorably, with higher sensitivity scores and Matthews correlation coefficients (MCC). These results indicate that the prediction performance of SAP classifiers can be effectively improved by feature selection. Moreover, dividing SAPs into three categories and constructing accurate SVM-based classifiers for each class provides a practically useful way to investigate the differences between Mendelian disease-related variants and cancer somatic mutations.
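The evaluation measures named in the abstract (ACC, sensitivity and MCC) can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of each SAP class, which the abstract does not specify; the function names `confusion_counts` and `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN, treating `positive` as the class of interest
    and every other label as negative (a one-vs-rest assumption)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, MCC) from binary confusion counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Undefined ratios (zero denominators) are reported as 0.0 here,
    which is a convention chosen for this sketch.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, mcc


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    y_true = ["CSM", "SVD", "SVP", "CSM"]
    y_pred = ["CSM", "CSM", "SVP", "SVD"]
    counts = confusion_counts(y_true, y_pred, positive="CSM")
    print(counts, binary_metrics(*counts))
```

MCC uses all four cells of the confusion matrix, so it is less sensitive to class imbalance than accuracy alone, which is presumably why it is reported alongside ACC and sensitivity.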
Original language: English
Title of host publication: 2011 IEEE Conference on Systems Biology 2011
Editors: L Chen, X S Zhang, Y Wang
Place of Publication: China
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 18 - 26
Number of pages: 9
ISBN (Print): 9781457716669
Publication status: Published - 2011
Event: International Conference on Computational Systems Biology (ISB) 2011 - Yindu Hotel Zhuhai, Zhuhai, China
Duration: 2 Sep 2011 - 4 Sep 2011

Conference

Conference: International Conference on Computational Systems Biology (ISB) 2011
Abbreviated title: ISB 2011
Country: China
City: Zhuhai
Period: 2/09/11 - 4/09/11
