Abstract
Comparison of protein sequences via alignment is an important routine in modern biological studies. Although the technologies for aligning proteins are mature, the current state of the art continues to be plagued by many shortcomings, chiefly due to reliance on: (i) naive objective functions, (ii) fixed substitution scores that are independent of the sequences being considered, (iii) arbitrary choices of gap costs, and (iv) the reporting of, often, a single optimal alignment with no way to recognise other competing sequence alignments. Here, we address these shortcomings by applying the compression-based Minimum Message Length (MML) inference framework to the protein sequence alignment problem. This grounds the problem in statistical learning theory, directly handles the complexity-vs-fit trade-off without ad hoc gap costs, allows unsupervised inference of all the statistical parameters, and permits visualization and exploration of the landscape of competing sequence alignments.
Original language | English |
---|---|
Title of host publication | Proceedings |
Subtitle of host publication | DCC 2018 - 2018 Data Compression Conference |
Editors | Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 177-186 |
Number of pages | 10 |
ISBN (Electronic) | 9781538648834 |
Publication status | Published - 27 Mar 2018 |
Event | Data Compression Conference 2018 - Snowbird, United States of America. Duration: 27 Mar 2018 → 30 Mar 2018. Conference number: 28th. https://ieeexplore.ieee.org/xpl/conhome/8415925/proceeding (Proceedings) |
Publication series
Name | Data Compression Conference Proceedings |
---|---|
Volume | 2018-March |
ISSN (Print) | 1068-0314 |
Conference
Conference | Data Compression Conference 2018 |
---|---|
Abbreviated title | DCC 2018 |
Country/Territory | United States of America |
City | Snowbird |
Period | 27/03/18 → 30/03/18 |
Keywords
- Bayesian inference
- compression
- minimum message length
- protein sequence alignment
Projects
- Next-generation Protein Structural comparison using Information Theory
  Konagurthu, A., Garcia De La Banda Garcia, M., Stuckey, P. & Lesk, A. M.
  4/02/15 → 30/09/19
  Project: Research (Finished)