Improving the accuracy of predicting disulfide connectivity by feature selection

Lin Zhu, Jie Yang, Jiangning Song, Kuo-Chen Chou, Hong-Bin Shen

Research output: Contribution to journal / Article / Research / peer-review

44 Citations (Scopus)

Abstract

Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different polypeptide chains, and they play important roles in protein folding and stability. However, predicting disulfide connectivity directly from the primary sequence is challenging, both because disulfide bonds are nonlocal in sequence and because the number of possible disulfide patterns grows combinatorially as the number of cysteine residues increases. Previous studies usually performed disulfide connectivity prediction in a high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the curse of dimensionality, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. Using this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results show that the high-dimensional features contain redundant information and that prediction performance can be further improved when these features are reduced to a lower-dimensional, more compact space. Our results also indicate that global protein features contribute little to the formation and prediction of disulfide bonds, whereas local sequential and structural information plays an important role. These findings provide useful insights for structural studies of disulfide-rich proteins.
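
To make the combinatorial growth concrete: with 2n bonded cysteines there are (2n - 1)!! = 1 × 3 × 5 × ... × (2n - 1) possible connectivity patterns, so 5 bonds already give 945 candidates. The Python sketch below enumerates these patterns and, assuming pairwise bonding scores from some hypothetical upstream classifier, picks the pattern with the highest total score (a maximum-weight perfect matching found by brute force). This is one common way to frame connectivity prediction, not necessarily the method used in this paper; the function names, cysteine positions, and scores are all illustrative assumptions.

```python
from typing import Dict, Iterator, List, Tuple

# A disulfide bond is represented as a (smaller position, larger position) pair.
Pair = Tuple[int, int]


def perfect_matchings(cysteines: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to pair up an even-length, sorted list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        # Bond `first` to `partner`, then recursively pair up the remaining cysteines.
        remaining = rest[:i] + rest[i + 1:]
        for sub in perfect_matchings(remaining):
            yield [(first, partner)] + sub


def count_patterns(num_bonds: int) -> int:
    """Number of connectivity patterns for n bonds: (2n - 1)!! = 1 * 3 * ... * (2n - 1)."""
    total = 1
    for k in range(1, 2 * num_bonds, 2):
        total *= k
    return total


def best_pattern(cysteines: List[int], score: Dict[Pair, float]) -> List[Pair]:
    """Brute-force maximum-weight perfect matching over hypothetical pair scores."""
    return max(perfect_matchings(cysteines), key=lambda m: sum(score[p] for p in m))


# Hypothetical example: four cysteines and made-up bonding scores for each pair.
EXAMPLE_CYSTEINES = [3, 10, 25, 40]
EXAMPLE_SCORES: Dict[Pair, float] = {
    (3, 10): 0.9, (3, 25): 0.1, (3, 40): 0.2,
    (10, 25): 0.3, (10, 40): 0.1, (25, 40): 0.8,
}

if __name__ == "__main__":
    print([count_patterns(n) for n in range(1, 6)])
    print(best_pattern(EXAMPLE_CYSTEINES, EXAMPLE_SCORES))
```

Brute-force enumeration is only practical for a handful of bonds; for larger cases a polynomial-time maximum-weight matching algorithm (for example, Edmonds' blossom algorithm) is the usual alternative.
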
Original language: English
Pages (from-to): 1478-1485
Number of pages: 8
Journal: Journal of Computational Chemistry
Volume: 31
Issue number: 7
Publication status: Published - 2010
Externally published: Yes
