Predicting disordered regions in proteins based on decision trees of reduced amino acid composition

Pengfei Han, Xiuzhen Zhang, Raymond S. Norton, Zhi Ping Feng

Research output: Contribution to journal › Review Article › Research › peer-review

6 Citations (Scopus)

Abstract

Intrinsically unstructured proteins (IUPs) are proteins lacking a fixed three-dimensional structure or containing long disordered regions. IUPs play an important role in biology and disease. Identifying disordered regions in protein sequences can provide useful information on protein structure and function, and can assist high-throughput protein structure determination. In this paper, we present a system for predicting disordered regions in proteins based on decision trees and reduced amino acid composition. Concise rules based on biochemical properties of amino acid side chains are generated for prediction. Coarser information extracted from the composition of amino acids can not only improve the prediction accuracy but also increase the learning efficiency. In cross-validation tests, with four groups of reduced amino acid composition, our system can achieve a recall of 80% at a 13% false positive rate for predicting disordered regions, and the overall accuracy can reach 83.4%. This prediction accuracy is comparable to most, and better than some, existing predictors. Advantages of our approach are high prediction accuracy for long disordered regions and efficiency for large-scale sequence analysis. Our software is freely available for academic use upon request.
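
To make the idea of reduced amino acid composition concrete, the following is a minimal sketch of how such features might be computed over a sliding sequence window before being passed to a decision-tree learner. The four-group alphabet, the window size, and the function names are assumptions made for illustration only; the abstract does not state which groups or window length the authors used.

```python
from collections import Counter

# Hypothetical four-group alphabet by side-chain property. This grouping is an
# assumption for illustration; the abstract does not give the paper's groups.
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STNQCY",
    "positive": "KRH",
    "negative": "DE",
}
RESIDUE_TO_GROUP = {aa: g for g, aas in GROUPS.items() for aa in aas}


def reduced_composition(window):
    """Return the fraction of residues in each group for one sequence window."""
    # Residues outside the 20 standard amino acids (e.g. 'X') are ignored.
    counts = Counter(RESIDUE_TO_GROUP[aa] for aa in window if aa in RESIDUE_TO_GROUP)
    total = sum(counts.values())
    return {g: (counts[g] / total if total else 0.0) for g in GROUPS}


def window_features(sequence, size=21):
    """Yield (centre_index, composition) for each full window; size should be odd."""
    half = size // 2
    for i in range(half, len(sequence) - half):
        yield i, reduced_composition(sequence[i - half : i + half + 1])
```

Each window is summarized by four numbers rather than twenty, which is the kind of coarser representation the abstract credits with improving both accuracy and learning efficiency. A decision tree trained on these vectors would then yield threshold rules over the group fractions.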

Original language: English
Pages (from-to): 1723-1734
Number of pages: 12
Journal: Journal of Computational Biology
Volume: 13
Issue number: 10
Publication status: Published - Dec 2006
Externally published: Yes

Keywords

  • Decision tree
  • Disordered region
  • Intrinsically unstructured proteins
  • Reduced amino acid composition
