Statistical inference of protein structural alignments using information and compression

James H. Collier, Lloyd Allison, Arthur M. Lesk, Peter J. Stuckey, Maria Garcia De La Banda, Arun S. Konagurthu

    Research output: Contribution to journal › Article › Research › peer-review

    Abstract

    Motivation: Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment.

    Results: We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes.

    Availability and Implementation: Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner.
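The compression criterion described in the abstract can be illustrated with a deliberately simplified two-part message. This is a minimal Python sketch, not MMLigner's actual encoding. It assumes the two structures are already superposed; that a null model states each coordinate of the second structure uniformly over a hypothetical 100 Å range at 0.001 Å precision; that stating the alignment costs log2(3) bits per column; and that aligned coordinates deviate from their partners as Gaussian noise with a fixed σ = 1 Å. All function names and parameter values are illustrative assumptions. Compression is the null message length minus the alignment message length, so a positive value means the alignment explains the coordinates more economically than the null model.

```python
import math
from typing import Sequence


def null_bits(n_values: int, value_range: float, precision: float) -> float:
    """Bits to state n coordinate values uniformly over value_range at the given precision."""
    return n_values * math.log2(value_range / precision)


def gaussian_bits(deviations: Sequence[float], sigma: float, precision: float) -> float:
    """Bits to state each deviation under N(0, sigma^2), discretised to the given precision."""
    total = 0.0
    for d in deviations:
        density = math.exp(-(d * d) / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        # Add -log2 of the probability of the precision-wide cell containing d.
        total -= math.log2(density * precision)
    return total


def compression_bits(
    deviations: Sequence[float],
    n_columns: int,
    value_range: float = 100.0,  # hypothetical coordinate range (angstroms)
    precision: float = 0.001,    # hypothetical measurement precision (angstroms)
    sigma: float = 1.0,          # hypothetical fixed spread of aligned deviations
) -> float:
    """Null message length minus two-part alignment message length, in bits (toy model)."""
    # Part 1 (hypothesis): crude cost of stating the alignment, one of 3 states per column.
    hypothesis = n_columns * math.log2(3)
    # Part 2 (data given hypothesis): aligned coordinates as deviations from their partners.
    data = gaussian_bits(deviations, sigma, precision)
    # Null model: the same coordinates stated with no reference to the other structure.
    null = null_bits(len(deviations), value_range, precision)
    return null - (hypothesis + data)


if __name__ == "__main__":
    # Hypothetical data: 10 aligned residues, 3 coordinate deviations each.
    close = compression_bits([0.5] * 30, n_columns=10)
    far = compression_bits([10.0] * 30, n_columns=10)
    print(close > 0, far > 0)  # True False
```

With small deviations the alignment message is shorter than the null message; with deviations of about 10 Å it is longer, so in this toy setting that alignment would be rejected. Costs that are identical in both messages (the first structure's coordinates and the unaligned residues) cancel and are omitted, and the cost of stating σ is ignored for simplicity.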

    Language: English
    Pages: 1005-1013
    Number of pages: 9
    Journal: Bioinformatics
    Volume: 33
    Issue number: 7
    DOI: 10.1093/bioinformatics/btw757
    State: Published, 1 Apr 2017

    Cite this

    @article{a979edb4d40240ec93d1614bc40448c3,
    title = "Statistical inference of protein structural alignments using information and compression",
    author = "Collier, {James H.} and Allison, Lloyd and Lesk, {Arthur M.} and Stuckey, {Peter J.} and {Garcia De La Banda}, Maria and Konagurthu, {Arun S.}",
    year = "2017",
    month = "4",
    day = "1",
    doi = "10.1093/bioinformatics/btw757",
    language = "English",
    volume = "33",
    pages = "1005--1013",
    journal = "Bioinformatics",
    issn = "1367-4803",
    publisher = "Oxford Univ Press",
    number = "7",

    }

    Statistical inference of protein structural alignments using information and compression. / Collier, James H.; Allison, Lloyd; Lesk, Arthur M.; Stuckey, Peter J.; Garcia De La Banda, Maria; Konagurthu, Arun S.

    In: Bioinformatics, Vol. 33, No. 7, 01.04.2017, p. 1005-1013.

    TY  - JOUR
    T1  - Statistical inference of protein structural alignments using information and compression
    AU  - Collier, James H.
    AU  - Allison, Lloyd
    AU  - Lesk, Arthur M.
    AU  - Stuckey, Peter J.
    AU  - Garcia De La Banda, Maria
    AU  - Konagurthu, Arun S.
    PY  - 2017/4/1
    Y1  - 2017/4/1
    UR  - http://www.scopus.com/inward/record.url?scp=85019072762&partnerID=8YFLogxK
    U2  - 10.1093/bioinformatics/btw757
    DO  - 10.1093/bioinformatics/btw757
    M3  - Article
    VL  - 33
    SP  - 1005
    EP  - 1013
    JO  - Bioinformatics
    T2  - Bioinformatics
    JF  - Bioinformatics
    SN  - 1367-4803
    IS  - 7
    ER  - 