Information-theoretic inference of an optimal dictionary of protein supersecondary structures

Arun S. Konagurthu, Ramanan Subramanian, Lloyd Allison, David Abramson, Maria Garcia de la Banda, Peter J. Stuckey, Arthur M. Lesk

Research output: Chapter in Book/Report/Conference proceeding | Chapter (Book) | Research | peer-review

Abstract

We recently developed an unsupervised Bayesian inference methodology to automatically infer a dictionary of protein supersecondary structures (Subramanian et al., Proceedings of the IEEE Data Compression Conference (DCC), pp. 340-349, 2017). Specifically, this methodology uses the information-theoretic minimum message length (MML) criterion for hypothesis selection (Wallace, Statistical and Inductive Inference by Minimum Message Length, Springer Science & Business Media, New York, 2005). Under this criterion, the best dictionary of supersecondary structures is the one that yields the greatest lossless compression of the source collection of folding patterns, represented as tableaux (matrix representations that capture the essence of protein folding patterns; Lesk, J Mol Graph 13:159-164, 1995). This book chapter outlines our MML methodology for inferring the supersecondary structure dictionary. The inferred dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.html.
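
The selection criterion described in the abstract can be illustrated with a toy two-part message length comparison. The sketch below is not the chapter's encoding: plain strings stand in for tableaux, and every coding choice (Elias gamma codes for integers, uniform coding of word symbols, greedy longest-match parsing, an adaptive Laplace code over tokens, and a hypothetical END marker) is an assumption made for illustration. It only shows the general MML idea that a candidate dictionary is preferred when the cost of stating it plus the cost of encoding the data with it is smallest.

```python
import math
from collections import Counter

END = None  # hypothetical end-of-sequence marker, coded like any other token


def elias_gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    if n < 1:
        raise ValueError("Elias gamma needs n >= 1")
    return 2 * (n.bit_length() - 1) + 1


def dictionary_bits(words, alphabet) -> float:
    """Part 1: bits to state the hypothesis, i.e. the set of multi-symbol words."""
    bits = elias_gamma_bits(len(words) + 1)          # how many words follow
    for w in words:
        bits += elias_gamma_bits(len(w))             # length of this word
        bits += len(w) * math.log2(len(alphabet))    # its symbols, uniformly coded
    return bits


def parse(seq, tokens):
    """Greedy longest-match parse of seq into dictionary tokens."""
    ordered = sorted(tokens, key=len, reverse=True)
    out, i = [], 0
    while i < len(seq):
        for t in ordered:
            if seq.startswith(t, i):
                out.append(t)
                i += len(t)
                break
        else:
            raise ValueError(f"symbol {seq[i]!r} not in alphabet")
    return out


def data_bits(corpus, words, alphabet) -> float:
    """Part 2: bits to encode the corpus given the dictionary, using an
    adaptive Laplace (add-one) code over single symbols, words and END."""
    tokens = list(alphabet) + sorted(words)
    k = len(tokens) + 1                              # +1 for END
    counts, n, bits = Counter(), 0, 0.0
    for seq in corpus:
        for t in parse(seq, tokens) + [END]:
            bits -= math.log2((counts[t] + 1) / (n + k))
            counts[t] += 1
            n += 1
    return bits


def message_length(corpus, words, alphabet) -> float:
    """Two-part message length: hypothesis cost plus data-given-hypothesis cost."""
    return dictionary_bits(words, alphabet) + data_bits(corpus, words, alphabet)
```

On a corpus built from a repeated pattern, for example `["abcd" * 20] * 5` over the alphabet `"abcd"`, the dictionary `{"abcd"}` gives a much shorter total message than the empty dictionary, while adding an unused word such as `"dcba"` only lengthens it: the extra word costs bits to state and dilutes the code for the tokens that are actually used.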

Original language: English
Title of host publication: Protein Supersecondary Structures
Subtitle of host publication: Methods and Protocols
Editor: Alexander E. Kister
Place of publication: New York, NY, USA
Publisher: Springer
Chapter: 6
Pages: 123-131
Number of pages: 9
Edition: 2nd
ISBN (Electronic): 9781493991617
ISBN (Print): 9781493991600
DOI: 10.1007/978-1-4939-9161-7_6
Publication status: Published, 2019

Publication series

Name: Methods in molecular biology (Clifton, N.J.)
Publisher: Springer Protocols (Humana Press)
Volume: 1958
ISSN (Print): 1064-3745

Keywords

  • Minimum message length
  • MML
  • Protein folding pattern
  • Supersecondary structure
  • Tableau representation

Cite this

Konagurthu, A. S., Subramanian, R., Allison, L., Abramson, D., de la Banda, M. G., Stuckey, P. J., & Lesk, A. M. (2019). Information-theoretic inference of an optimal dictionary of protein supersecondary structures. In A. E. Kister (Ed.), Protein Supersecondary Structures: Methods and Protocols (2nd ed., pp. 123-131). (Methods in molecular biology (Clifton, N.J.); Vol. 1958). New York NY USA: Springer. https://doi.org/10.1007/978-1-4939-9161-7_6
Konagurthu, Arun S. ; Subramanian, Ramanan ; Allison, Lloyd ; Abramson, David ; de la Banda, Maria Garcia ; Stuckey, Peter J. ; Lesk, Arthur M. / Information-theoretic inference of an optimal dictionary of protein supersecondary structures. Protein Supersecondary Structures: Methods and Protocols. editor / Alexander E. Kister. 2nd ed. New York NY USA : Springer, 2019. pp. 123-131 (Methods in molecular biology (Clifton, N.J.)).
@incollection{9956bd613eb74cbbb4cb2db50c66c240,
title = "Information-theoretic inference of an optimal dictionary of protein supersecondary structures",
keywords = "Minimum message length, MML, Protein folding pattern, Supersecondary structure, Tableau representation",
author = "Konagurthu, {Arun S.} and Ramanan Subramanian and Lloyd Allison and David Abramson and {de la Banda}, {Maria Garcia} and Stuckey, {Peter J.} and Lesk, {Arthur M.}",
year = "2019",
doi = "10.1007/978-1-4939-9161-7_6",
language = "English",
isbn = "9781493991600",
series = "Methods in molecular biology (Clifton, N.J.)",
volume = "1958",
publisher = "Springer",
pages = "123--131",
editor = "Kister, {Alexander E.}",
booktitle = "Protein Supersecondary Structures",
edition = "2nd",

}

Konagurthu, AS, Subramanian, R, Allison, L, Abramson, D, de la Banda, MG, Stuckey, PJ & Lesk, AM 2019, Information-theoretic inference of an optimal dictionary of protein supersecondary structures. in AE Kister (ed.), Protein Supersecondary Structures: Methods and Protocols. 2nd edn, Methods in molecular biology (Clifton, N.J.), vol. 1958, Springer, New York NY USA, pp. 123-131. https://doi.org/10.1007/978-1-4939-9161-7_6

Information-theoretic inference of an optimal dictionary of protein supersecondary structures. / Konagurthu, Arun S.; Subramanian, Ramanan; Allison, Lloyd; Abramson, David; de la Banda, Maria Garcia; Stuckey, Peter J.; Lesk, Arthur M.

Protein Supersecondary Structures: Methods and Protocols. ed. / Alexander E. Kister. 2nd ed. New York NY USA : Springer, 2019. p. 123-131 (Methods in molecular biology (Clifton, N.J.); Vol. 1958).


TY - CHAP

T1 - Information-theoretic inference of an optimal dictionary of protein supersecondary structures

AU - Konagurthu, Arun S.

AU - Subramanian, Ramanan

AU - Allison, Lloyd

AU - Abramson, David

AU - de la Banda, Maria Garcia

AU - Stuckey, Peter J.

AU - Lesk, Arthur M.

PY - 2019

Y1 - 2019

KW - Minimum message length

KW - MML

KW - Protein folding pattern

KW - Supersecondary structure

KW - Tableau representation

U2 - 10.1007/978-1-4939-9161-7_6

DO - 10.1007/978-1-4939-9161-7_6

M3 - Chapter (Book)

SN - 9781493991600

T3 - Methods in molecular biology (Clifton, N.J.)

SP - 123

EP - 131

BT - Protein Supersecondary Structures

A2 - Kister, Alexander E.

PB - Springer

CY - New York NY USA

ER -

Konagurthu AS, Subramanian R, Allison L, Abramson D, de la Banda MG, Stuckey PJ et al. Information-theoretic inference of an optimal dictionary of protein supersecondary structures. In Kister AE, editor, Protein Supersecondary Structures: Methods and Protocols. 2nd ed. New York NY USA: Springer. 2019. p. 123-131. (Methods in molecular biology (Clifton, N.J.)). https://doi.org/10.1007/978-1-4939-9161-7_6