TY - JOUR
T1 - Universal architectural concepts underlying protein folding patterns
AU - Konagurthu, Arun S.
AU - Subramanian, Ramanan
AU - Allison, Lloyd
AU - Abramson, David
AU - Stuckey, Peter J.
AU - Garcia de la Banda, Maria
AU - Lesk, Arthur M.
N1 - Funding Information:
This research is funded by an Australian Research Council (ARC) Discovery Project grant (DP150100894).
We thank Research Computing Centre, University of Queensland, for the High-Performance Cluster Infrastructure that supported this project over the last 3 years. AL thanks the Medical Research Council Laboratory of Molecular Biology for their hospitality during his sabbatical year. We thank Sureshkumar Balasubramanian for proofreading this work.
Publisher Copyright:
© 2021 Konagurthu, Subramanian, Allison, Abramson, Stuckey, Garcia de la Banda and Lesk.
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/4/30
Y1 - 2021/4/30
N2 - What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures—called concepts—typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.
KW - architectural concepts
KW - folding pattern
KW - information theory
KW - lossless compression
KW - protein-building blocks
KW - structural motifs
UR - http://www.scopus.com/inward/record.url?scp=85106057592&partnerID=8YFLogxK
U2 - 10.3389/fmolb.2020.612920
DO - 10.3389/fmolb.2020.612920
M3 - Article
C2 - 33996891
AN - SCOPUS:85106057592
SN - 2296-889X
VL - 7
JO - Frontiers in Molecular Biosciences
JF - Frontiers in Molecular Biosciences
M1 - 612920
ER -