Functional insights from computational modeling of orphan proteins expressed in a microbial community

Korin E. Wheeler, Adam Zemla, Yongqin Jiao, Daniela S. Aliaga Goltsman, Steven W. Singer, Jillian F. Banfield, Michael P. Thelen

Research output: Contribution to journal; Article; Research; peer-review


Environmental genomics and proteomics data are heavily populated with proteins that are not homologous to experimentally characterized proteins. We approached this problematic area by investigating a natural microbial community from a highly constrained niche in which critical roles are likely carried out by proteins of unknown function (ORFans). Based on several criteria, these proteins were not statistically similar to any protein sequences in the SwissProt database. We selected a target set of 545 ORFans and weakly annotated proteins expressed by the dominant bacterial member of the community, Leptospirillum Group II, and used an automated modeling system (AS2TS) incorporated with other computational tools to predict structures. This generated 484 models, 89% of the target set. Structure-based superfamilies, general functional categorizations, and specific gene ontology (GO) functions were predicted for 424, 386, and 117 ORFans, respectively. Structural predictions and classifications were integrated into a manually curated database, outlining in silico calculations and available proteomic data for each protein. This analysis facilitated the development of experimentally testable hypotheses for several enigmatic proteins, including confident predictions of copper transport proteins and cyclic diguanylate signaling proteins. As DNA sequencing of natural organisms rapidly expands, this computational structure-function approach can be applied to guide experimental testing of the structure and function of challenging ORFans.

Original language: English
Pages (from-to): 266-274
Number of pages: 9
Journal: Journal of Proteomics & Bioinformatics
Issue number: 9
Publication status: Published - 2010
Externally published: Yes


  • ORFan
  • Proteins of unknown function
  • SCOP
  • Structural modeling
  • Superfamily
