Information-theoretic inference of an optimal dictionary of protein supersecondary structures

Arun S. Konagurthu, Ramanan Subramanian, Lloyd Allison, David Abramson, Maria Garcia de la Banda, Peter J. Stuckey, Arthur M. Lesk

Research output: Chapter in Book/Report/Conference proceeding, Chapter (Book), Research, peer-review

1 Citation (Scopus)


We recently developed an unsupervised Bayesian inference methodology that automatically infers a dictionary of protein supersecondary structures (Subramanian et al., Proceedings of the IEEE Data Compression Conference (DCC), 340-349, 2017). The methodology uses the minimum message length (MML) criterion, an information-theoretic framework for hypothesis selection (Wallace, Statistical and inductive inference by minimum message length, Springer Science & Business Media, New York, 2005). Under this criterion, the best dictionary of supersecondary structures is the one that yields the greatest lossless compression of the source collection of folding patterns, represented as tableaux: matrix representations that capture the essence of protein folding patterns (Lesk, J Mol Graph. 13:159-164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at .
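The selection rule above (prefer the dictionary that gives the shortest lossless encoding of the data) can be illustrated with a toy two-part message length: bits to state the dictionary, plus bits to state the data given that dictionary. This is a minimal sketch under stated assumptions, not the authors' MML encoding of tableaux. The strings over {H, E}, the greedy longest-match parse, the fixed-length token codes, and the functions `greedy_parse`, `dictionary_length`, `data_length` and `message_length` are all hypothetical simplifications chosen for illustration.

```python
import math


def greedy_parse(s, dictionary):
    """Tokenise s, taking the longest matching dictionary entry at each position.

    Positions not covered by any entry fall back to a single-symbol token,
    so every string remains losslessly encodable.
    """
    entries = sorted(dictionary, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(s):
        for entry in entries:
            if s.startswith(entry, i):
                tokens.append(entry)
                i += len(entry)
                break
        else:
            tokens.append(s[i])
            i += 1
    return tokens


def dictionary_length(dictionary, alphabet_size):
    """First part (hypothesis): bits to state the dictionary itself.

    Assumption: each entry is spelled symbol by symbol over the alphabet plus
    one terminator symbol; one extra terminator marks the end of the dictionary.
    """
    bits_per_symbol = math.log2(alphabet_size + 1)
    return sum((len(e) + 1) * bits_per_symbol for e in dictionary) + bits_per_symbol


def data_length(dictionary, data, alphabet_size):
    """Second part: bits to state the data given the dictionary.

    Assumption: every token (an alphabet symbol, a dictionary entry, or an
    end-of-string marker) gets the same fixed-length code.
    """
    vocabulary_size = alphabet_size + len(dictionary) + 1
    n_tokens = sum(len(greedy_parse(s, dictionary)) + 1 for s in data)
    return n_tokens * math.log2(vocabulary_size)


def message_length(dictionary, data, alphabet_size):
    """Total two-part message length in bits; shorter is preferred."""
    return dictionary_length(dictionary, alphabet_size) + data_length(
        dictionary, data, alphabet_size
    )


if __name__ == "__main__":
    # Hypothetical toy data standing in for encoded folding patterns.
    data = ["HEHEHEHE"] * 4
    candidates = [(), ("HE",), ("HEHE",)]
    for d in candidates:
        print(d, round(message_length(d, data, alphabet_size=2), 2))
    best = min(candidates, key=lambda d: message_length(d, data, 2))
    print("best:", best)
```

In this toy setting a longer motif is preferred only when it recurs often enough to repay the bits spent stating it, which is the trade-off a two-part message length is meant to capture.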

Original language: English
Title of host publication: Protein Supersecondary Structures
Subtitle of host publication: Methods and Protocols
Editors: Alexander E. Kister
Place of publication: New York, NY, USA
Number of pages: 9
ISBN (Electronic): 9781493991617
ISBN (Print): 9781493991600
Publication status: Published - 2019

Publication series

Name: Methods in molecular biology (Clifton, N.J.)
Publisher: Springer Protocols (Humana Press)
ISSN (Print): 1064-3745


  • Minimum message length
  • MML
  • Protein folding pattern
  • Supersecondary structure
  • Tableau representation
