Statistical inference of protein structural alignments using information and compression

James H. Collier, Lloyd Allison, Arthur M. Lesk, Peter J. Stuckey, Maria Garcia De La Banda, Arun S. Konagurthu

Research output: Contribution to journal › Article

Abstract

Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment.

Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes.

Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner.

Language: English
Pages: 1005-1013
Number of pages: 9
Journal: Bioinformatics
Volume: 33
Issue number: 7
DOI: 10.1093/bioinformatics/btw757
State: Published - 1 Apr 2017

Cite this

@article{a979edb4d40240ec93d1614bc40448c3,
title = "Statistical inference of protein structural alignments using information and compression",
abstract = "Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner.",
author = "Collier, {James H.} and Allison, Lloyd and Lesk, {Arthur M.} and Stuckey, {Peter J.} and {Garcia De La Banda}, Maria and Konagurthu, {Arun S.}",
year = "2017",
month = "4",
day = "1",
doi = "10.1093/bioinformatics/btw757",
language = "English",
volume = "33",
pages = "1005--1013",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford Univ Press",
number = "7",
}

Statistical inference of protein structural alignments using information and compression. / Collier, James H.; Allison, Lloyd; Lesk, Arthur M.; Stuckey, Peter J.; Garcia De La Banda, Maria; Konagurthu, Arun S.

In: Bioinformatics, Vol. 33, No. 7, 01.04.2017, p. 1005-1013.

TY  - JOUR
T1  - Statistical inference of protein structural alignments using information and compression
AU  - Collier, James H.
AU  - Allison, Lloyd
AU  - Lesk, Arthur M.
AU  - Stuckey, Peter J.
AU  - Garcia De La Banda, Maria
AU  - Konagurthu, Arun S.
PY  - 2017/4/1
Y1  - 2017/4/1
AB  - Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner.
UR  - http://www.scopus.com/inward/record.url?scp=85019072762&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btw757
DO  - 10.1093/bioinformatics/btw757
M3  - Article
VL  - 33
SP  - 1005
EP  - 1013
JO  - Bioinformatics
T2  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 7
ER  - 