Abstract
Computational analyses of the growing corpus of three-dimensional (3D) structures of proteins have revealed a limited set of recurrent substructural themes, termed super-secondary structures. Knowledge of super-secondary structures is important for the study of protein evolution and for the modeling of proteins with unknown structures. Characterizing a comprehensive dictionary of these super-secondary structures has remained an open computational challenge in protein structural studies. This paper presents an unsupervised method for learning such a comprehensive dictionary using the statistical framework of lossless compression on a database composed of concise geometric representations of protein 3D folding patterns. The best dictionary is defined as the one that yields the most compression of the database. Here we describe the inference methodology and the statistical models used to estimate the encoding lengths. An interactive website for this dictionary is available at http://lcb.infotech.monash.edu.au/proteinConcepts/scop100/dictionary.HTML.
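The selection criterion stated in the abstract (the best dictionary is the one that compresses the database most) can be illustrated with a toy two-part message-length calculation. This is a minimal sketch under loudly stated assumptions, not the paper's method: the database is a hypothetical list of strings over a made-up three-letter alphabet (H, E, L) instead of the paper's geometric representations, dictionary words are matched by greedy longest-match parsing, word lengths are stated with an Elias gamma code, and tokens are coded with an adaptive Laplace (add-one) estimator. The function names `parse`, `dictionary_bits`, `data_bits`, `message_length` and `best_dictionary` are hypothetical.

```python
import math


def elias_gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * (n.bit_length() - 1) + 1


def parse(seq, dictionary):
    """Greedy longest-match parse of seq into dictionary words or single symbols."""
    words = sorted(dictionary, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(seq):
        for w in words:
            if seq.startswith(w, i):
                tokens.append(w)
                i += len(w)
                break
        else:
            # No dictionary word matches here: fall back to a single symbol.
            tokens.append(seq[i])
            i += 1
    return tokens


def dictionary_bits(dictionary, alphabet):
    """Part 1: bits to state the dictionary (word count, then each word's length and symbols)."""
    symbol_bits = math.log2(len(alphabet))
    bits = elias_gamma_bits(len(dictionary) + 1)
    for w in dictionary:
        bits += elias_gamma_bits(len(w)) + len(w) * symbol_bits
    return bits


def data_bits(database, dictionary, alphabet):
    """Part 2: bits to encode the parsed database with an adaptive Laplace (add-one) code.

    Hypothetical simplification: sequence boundaries are assumed known and are not encoded.
    """
    vocab = set(alphabet) | set(dictionary)
    counts = {v: 0 for v in vocab}
    bits, seen = 0.0, 0
    for seq in database:
        for t in parse(seq, dictionary):
            # Probability of t given the counts so far; these sum to 1 over vocab.
            bits -= math.log2((counts[t] + 1) / (seen + len(vocab)))
            counts[t] += 1
            seen += 1
    return bits


def message_length(database, dictionary, alphabet):
    """Total two-part message length in bits: dictionary cost plus data cost."""
    return dictionary_bits(dictionary, alphabet) + data_bits(database, dictionary, alphabet)


def best_dictionary(database, candidates, alphabet):
    """Pick the candidate dictionary giving the shortest total message (most compression)."""
    return min(candidates, key=lambda d: message_length(database, d, alphabet))


db = ["HLE" * 20] * 5  # hypothetical toy database with one repeated motif
print(message_length(db, {"HLE"}, "HLE") < message_length(db, set(), "HLE"))  # True
```

Under these assumptions, stating the word "HLE" costs a few bits in the first part but shortens the second part far more, so it beats the empty dictionary. The two-part trade-off is the point of the sketch; the paper's actual encoding models are described in the full text.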
Original language | English |
---|---|
Title of host publication | Proceedings - DCC 2017, 2017 Data Compression Conference |
Subtitle of host publication | 4 - 7 April 2017, Snowbird, Utah, USA |
Editors | Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer |
Place of Publication | Piscataway, NJ |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 340-349 |
Number of pages | 10 |
ISBN (Electronic) | 9781509067213 |
ISBN (Print) | 9781509067220 |
DOIs | |
Publication status | Published - 8 May 2017 |
Event | Data Compression Conference 2017 - Snowbird, United States of America; Duration: 4 Apr 2017 → 7 Apr 2017; Conference number: 27th; https://ieeexplore.ieee.org/xpl/conhome/7921793/proceeding (IEEE Conference Proceedings) |
Publication series
Name | Data Compression Conference. Proceedings |
---|---|
Publisher | IEEE Computer Society |
ISSN (Print) | 1068-0314 |
Conference
Conference | Data Compression Conference 2017 |
---|---|
Abbreviated title | DCC 2017 |
Country/Territory | United States of America |
City | Snowbird |
Period | 4/04/17 → 7/04/17 |
Internet address | |
Keywords
- Minimum Message Length
- MML
- Protein structure
- super-secondary structural patterns
Projects
Next-generation Protein Structural comparison using Information Theory
Konagurthu, A., Garcia De La Banda Garcia, M., Stuckey, P. & Lesk, A. M.
4/02/15 → 30/09/19
Project: Research