Circular clustering of protein dihedral angles by Minimum Message Length.

D. L. Dowe, L. Allison, T. I. Dix, L. Hunter, C. S. Wallace, T. Edgoose

    Research output: Contribution to journal > Article > Research > peer-review

    15 Citations (Scopus)

    Abstract

    Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) classification, we are able to take the protein dihedral angles as determined by X-ray crystallography, and cluster sets of dihedral angles into groups. Previous work by Hunter and States has applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by 3 Cartesian co-ordinates for each of the alpha-Carbon, beta-Carbon and Nitrogen, totalling 9 co-ordinates. By using the von Mises circular distribution in the Snob program, we are instead able to represent local site properties by the two dihedral angles, phi and psi. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly-correlated Cartesian co-ordinates. Using the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying generating process from which the data came. We report on the results of our classification, plotting the classes in (phi, psi) space; and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region phi approximately -1.09 rad and psi approximately -0.75 rad which are close on the spanning tree and have high inter-transition probabilities. This gives rise to a tight, abundant and self-perpetuating structure.
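
    To illustrate the von Mises circular distribution mentioned in the abstract, here is a minimal Python sketch. It is not taken from the paper or from the Snob program. The function names, the concentration value kappa = 8.0, and the treatment of phi and psi as independent within a class are assumptions made only for this example. The class centre (phi approximately -1.09 rad, psi approximately -0.75 rad) is the region quoted in the abstract.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k (x/2)^(2k) / (k!)^2
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on the circle with mean direction mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def site_log_density(phi, psi, phi_params, psi_params):
    """Log density of one site, ASSUMING phi and psi are independent within a class."""
    return (math.log(von_mises_pdf(phi, *phi_params))
            + math.log(von_mises_pdf(psi, *psi_params)))


# Hypothetical class centred on the region quoted in the abstract;
# kappa = 8.0 is an illustrative value, not a result from the paper.
example_class = {"phi": (-1.09, 8.0), "psi": (-0.75, 8.0)}

near = site_log_density(-1.0, -0.8, example_class["phi"], example_class["psi"])
far = site_log_density(2.0, 2.5, example_class["phi"], example_class["psi"])
print(near > far)  # a site close to the class centre scores a higher log density
```

    Because the density depends on the angle only through cos(theta - mu), angles just either side of the +/- pi boundary are treated as neighbours. A Gaussian fitted to raw angle values would not do this.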

    Original language: English
    Pages (from-to): 242-255
    Number of pages: 14
    Journal: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
    Publication status: Published - 1996

    Cite this

    @article{d2e26e3dc6754a5ab25726593c9871eb,
    title = "Circular clustering of protein dihedral angles by Minimum Message Length.",
    abstract = "Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) classification, we are able to take the protein dihedral angles as determined by X-ray crystallography, and cluster sets of dihedral angles into groups. Previous work by Hunter and States has applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by 3 Cartesian co-ordinates for each of the alpha-Carbon, beta-Carbon and Nitrogen, totalling 9 co-ordinates. By using the von Mises circular distribution in the Snob program, we are instead able to represent local site properties by the two dihedral angles, phi and psi. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly-correlated Cartesian co-ordinates. Using the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying generating process from which the data came. We report on the results of our classification, plotting the classes in (phi, psi) space; and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region phi approximately -1.09 rad and psi approximately -0.75 rad which are close on the spanning tree and have high inter-transition probabilities. This gives rise to a tight, abundant and self-perpetuating structure.",
    author = "Dowe, D. L. and Allison, L. and Dix, T. I. and Hunter, L. and Wallace, C. S. and Edgoose, T.",
    year = "1996",
    language = "English",
    pages = "242--255",
    journal = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",

    }

    Circular clustering of protein dihedral angles by Minimum Message Length. / Dowe, D. L.; Allison, L.; Dix, T. I.; Hunter, L.; Wallace, C. S.; Edgoose, T.

    In: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 1996, p. 242-255.

    TY - JOUR
    T1 - Circular clustering of protein dihedral angles by Minimum Message Length.
    AU - Dowe, D. L.
    AU - Allison, L.
    AU - Dix, T. I.
    AU - Hunter, L.
    AU - Wallace, C. S.
    AU - Edgoose, T.
    PY - 1996
    Y1 - 1996
    N2 - Early work on proteins identified the existence of helices and extended sheets in protein secondary structures, a high-level classification which remains popular today. Using the Snob program for information-theoretic Minimum Message Length (MML) classification, we are able to take the protein dihedral angles as determined by X-ray crystallography, and cluster sets of dihedral angles into groups. Previous work by Hunter and States has applied a similar Bayesian classification method, AutoClass, to protein data with site position represented by 3 Cartesian co-ordinates for each of the alpha-Carbon, beta-Carbon and Nitrogen, totalling 9 co-ordinates. By using the von Mises circular distribution in the Snob program, we are instead able to represent local site properties by the two dihedral angles, phi and psi. Since each site can be modelled as having 2 degrees of freedom, this orientation-invariant dihedral angle representation of the data is more compact than that of nine highly-correlated Cartesian co-ordinates. Using the information-theoretic message length concepts discussed in the paper, such a more concise model is more likely to represent the underlying generating process from which the data came. We report on the results of our classification, plotting the classes in (phi, psi) space; and introducing a symmetric information-theoretic distance measure to build a minimum spanning tree between the classes. We also give a transition matrix between the classes and note the existence of three classes in the region phi approximately -1.09 rad and psi approximately -0.75 rad which are close on the spanning tree and have high inter-transition probabilities. This gives rise to a tight, abundant and self-perpetuating structure.

    UR - http://www.scopus.com/inward/record.url?scp=0030309796&partnerID=8YFLogxK
    M3 - Article
    SP - 242
    EP - 255
    JO - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
    JF - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
    ER -