Predicting disordered regions in proteins based on decision trees of reduced amino acid composition

Pengfei Han, Xiuzhen Zhang, Raymond S. Norton, Zhi Ping Feng

Research output: Contribution to journal › Article › Research › peer-review



Intrinsically unstructured proteins (IUPs) are proteins lacking a fixed three-dimensional structure or containing long disordered regions. IUPs play an important role in biology and disease. Identifying disordered regions in protein sequences can provide useful information on protein structure and function, and can assist high-throughput protein structure determination. In this paper, we present a system for predicting disordered regions in proteins based on decision trees and reduced amino acid composition. Concise rules based on biochemical properties of amino acid side chains are generated for prediction. Coarser information extracted from the composition of amino acids can not only improve prediction accuracy but also increase learning efficiency. In cross-validation tests with four groups of reduced amino acid composition, our system achieves a recall of 80% at a 13% false positive rate for predicting disordered regions, with an overall accuracy of 83.4%. This accuracy is comparable to most, and better than some, existing predictors. The advantages of our approach are high prediction accuracy for long disordered regions and efficiency for large-scale sequence analysis. Our software is freely available for academic use upon request.
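The abstract does not list the four residue groups, the window length, or the learned rules. As a minimal sketch, assuming a common side-chain grouping (hydrophobic, polar, positively charged, negatively charged) and a 21-residue sliding window, reduced-composition features and a decision-tree-shaped classifier could look like the following. The tree here is hand-written with made-up thresholds, not learned from data, and the names `GROUPS`, `reduced_composition`, `windows`, `toy_tree`, and `predict_disorder` are hypothetical illustrations rather than the authors' software.

```python
from collections import Counter

# ASSUMPTION: an illustrative four-group reduction of the 20 standard
# amino acids by side-chain class. The paper's own groups are not given
# in the abstract.
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQGPY"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def reduced_composition(window: str) -> dict[str, float]:
    """Fraction of residues in `window` that fall into each group.

    Nonstandard letters (e.g. X) count toward the window length but
    belong to no group.
    """
    n = len(window)
    if n == 0:
        return {name: 0.0 for name in GROUPS}
    counts = Counter()
    for residue in window.upper():
        for name, members in GROUPS.items():
            if residue in members:
                counts[name] += 1
                break
    return {name: counts[name] / n for name in GROUPS}


def windows(seq: str, size: int):
    """Yield a window centred on each residue, truncated at the termini."""
    half = size // 2
    for i in range(len(seq)):
        yield seq[max(0, i - half): i + half + 1]


def toy_tree(comp: dict[str, float]) -> bool:
    """Hand-written two-level tree; thresholds are made up, not learned.

    Returns True if the window is predicted disordered.
    """
    if comp["hydrophobic"] >= 0.35:
        return False  # hydrophobic-rich window: predict ordered
    if comp["positive"] + comp["negative"] >= 0.25:
        return True   # low hydrophobicity, high charge: predict disordered
    return comp["polar"] >= 0.5  # otherwise split on polar content


def predict_disorder(seq: str, size: int = 21) -> list[bool]:
    """Per-residue disorder calls from the reduced window composition."""
    return [toy_tree(reduced_composition(w)) for w in windows(seq, size)]


if __name__ == "__main__":
    seq = "MKKEESPKSEEKPAAEEKKSSLIVAVLAGFLVAIMLCW"
    calls = predict_disorder(seq)
    print("".join("D" if d else "-" for d in calls))
```

Collapsing the 20 amino acids into four groups shrinks each window's feature vector from 20 values to 4, which is the kind of coarser input the abstract credits with more efficient learning; in practice the tree would be induced from labelled ordered and disordered regions rather than written by hand.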

Original language: English
Pages (from-to): 1579-1590
Number of pages: 12
Journal: Journal of Computational Biology
Issue number: 9
Publication status: Published - Nov 2006
Externally published: Yes


  • Decision tree
  • Disordered region
  • Intrinsically unstructured proteins
  • Reduced amino acid composition
