Universal architectural concepts underlying protein folding patterns

Arun S. Konagurthu, Ramanan Subramanian, Lloyd Allison, David Abramson, Peter J. Stuckey, Maria Garcia de la Banda, Arthur M. Lesk

Research output: Contribution to journal, Article, Research, peer-review

8 Citations (Scopus)


What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures—called concepts—typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.

Original language: English
Article number: 612920
Number of pages: 19
Journal: Frontiers in Molecular Biosciences
Publication status: Published - 30 Apr 2021


Keywords:
  • architectural concepts
  • folding pattern
  • information theory
  • lossless compression
  • protein-building blocks
  • structural motifs
