The bits between proteins

Dinithi Sumanaweera, Lloyd Allison, Arun S. Konagurthu

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

2 Citations (Scopus)


Comparison of protein sequences via alignment is an important routine in modern biological studies. Although the technologies for aligning proteins are mature, the current state of the art continues to be plagued by many shortcomings, chiefly due to the reliance on: (i) naive objective functions, (ii) fixed substitution scores independent of the sequences being considered, (iii) arbitrary choices for gap costs, and (iv) often reporting a single optimal alignment without a way to recognise other competing sequence alignments. Here, we address these shortcomings by applying the compression-based Minimum Message Length (MML) inference framework to the protein sequence alignment problem. This grounds the problem in statistical learning theory, directly handles the complexity-vs-fit trade-off without ad hoc gap costs, allows unsupervised inference of all the statistical parameters, and permits the visualization and exploration of the landscape of competing sequence alignments.
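The compression view in the abstract can be illustrated with a toy two-part message length calculation. The sketch below is not the authors' model: the function names, the fixed parameters `p_match` and `p_same`, and the uniform residue code are all assumptions made for illustration (the paper infers its statistical parameters rather than fixing them, and the cost of stating sequence lengths is omitted here). It only shows the general idea that an alignment is worth accepting when stating it, plus the sequences given it, takes fewer bits than stating the two sequences independently.

```python
import math

ALPHABET_SIZE = 20  # the 20 standard amino acids


def null_bits(s, t):
    """Bits to state s and t independently (null model: unrelated sequences).

    Assumption: every residue costs log2(20) bits under a uniform code.
    """
    return (len(s) + len(t)) * math.log2(ALPHABET_SIZE)


def alignment_bits(row_s, row_t, p_match=0.8, p_same=0.5):
    """Bits for a toy two-part message: alignment states, then residues.

    row_s and row_t are the two rows of an alignment, with '-' for gaps.
    p_match and p_same are hypothetical fixed parameters, not inferred ones.
    """
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have equal length")
    p_indel = (1.0 - p_match) / 2.0  # insert and delete share the remainder
    bits = 0.0
    for a, b in zip(row_s, row_t):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be all gaps")
        if a == "-" or b == "-":
            # First part: state the indel. Second part: one residue, uniform.
            bits += -math.log2(p_indel) + math.log2(ALPHABET_SIZE)
        else:
            # First part: state the match. Second part: the residue pair.
            p_second = p_same if a == b else (1.0 - p_same) / (ALPHABET_SIZE - 1)
            bits += -math.log2(p_match)
            bits += math.log2(ALPHABET_SIZE) - math.log2(p_second)
    return bits


# Compression gain in bits; positive means the alignment beats the null model.
gain = null_bits("HEAGAWGHEE", "HEAGAWGHEE") - alignment_bits("HEAGAWGHEE", "HEAGAWGHEE")
print(f"compression gain: {gain:.2f} bits")
```

Under these assumed parameters, two identical sequences compress well when aligned, while two sequences with no shared residues cost more to state as an alignment than independently, so the null model would be preferred. No gap penalty is chosen by hand: the cost of an indel falls out of the probability assigned to it.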

Original language: English
Title of host publication: Proceedings
Subtitle of host publication: DCC 2018 - 2018 Data Compression Conference
Editors: Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 10
ISBN (Electronic): 9781538648834
Publication status: Published - 27 Mar 2018
Event: Data Compression Conference 2018 - Snowbird, United States of America
Duration: 27 Mar 2018 to 30 Mar 2018
Conference number: 28th (Proceedings)

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314


Conference: Data Compression Conference 2018
Abbreviated title: DCC 2018
Country/Territory: United States of America


Keywords

  • Bayesian inference
  • compression
  • minimum message length
  • protein sequence alignment
