On identifying statistical redundancy at the level of amino acid subsequences

Sandun Rajapaksa, Dinithi Sumanaweera, Maria Garcia De La Banda, Peter Stuckey, David Abramson, Lloyd Allison, Arthur Lesk, Arun Konagurthu

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review


This paper presents a framework to characterize and identify local sequences of proteins that are statistically redundant under the measure of Shannon information content while accounting for variations in their occurrences over evolutionary insertions, deletions, and substitutions of amino acids. The identification of such local sequences provides insights for downstream studies on proteins. Here, we have applied our methods to amino acid sequence data sets derived from a database corresponding to 935,552 substructural regions of varying sizes, covering 113,724 proteins from the Protein Data Bank. The results identify, among others, a surjective mapping between 110,598 local sequences (with an average length of 82 amino acids per sequence) and 1,493 topological shapes. The C++ source code and supporting material are available from https://lcb.infotech.monash.edu.au/bibm2021.
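
As a rough illustration of what "statistically redundant under the measure of Shannon information content" can mean, the sketch below compares the cost in bits of stating a sequence under a uniform null model over the 20 standard amino acids with its cost under a simple frequency model. This is not the paper's framework: it ignores insertions, deletions, and substitutions entirely, and the function names (`null_bits`, `model_bits`, `fit_probs`), the pseudocount, and the toy training data are all assumptions made for this example.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def null_bits(seq):
    """Bits needed to state seq under a uniform null model: log2(20) per residue."""
    return len(seq) * math.log2(len(AMINO_ACIDS))


def fit_probs(training, pseudocount=1.0):
    """Estimate residue probabilities from training sequences.

    A pseudocount (an assumption of this sketch) keeps every probability
    non-zero. Non-standard letters in the training data are ignored.
    """
    counts = Counter("".join(training))
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def model_bits(seq, probs):
    """Shannon information content of seq in bits: sum of -log2 P(residue).

    Raises KeyError if seq contains a letter outside AMINO_ACIDS.
    """
    return sum(-math.log2(probs[a]) for a in seq)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    probs = fit_probs(["AAAAGAAAAG", "AAGAAAAGAA"])
    query = "AAAGAAAA"
    saving = null_bits(query) - model_bits(query, probs)
    print(f"bits saved relative to the null model: {saving:.2f}")
```

Under these assumptions, a positive saving means the frequency model states the sequence more concisely than the null model, which is the informal sense of redundancy used in this sketch.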

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine
Editors: Yufei Huang, Lukasz Kurgan, Feng Luo, Xiaohua Tony Hu, Yidong Chen, Edward Dougherty, Andrzej Kloczkowski, Yaohang Li
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 9781665401265
ISBN (Print): 9781665429825
Publication status: Published - 2021
Event: IEEE International Conference on Bioinformatics and Biomedicine 2021 - Virtual, Online, United States of America
Duration: 9 Dec 2021 to 12 Dec 2021
https://ieeexplore.ieee.org/xpl/conhome/9669261/proceeding (Proceedings)

Publication series

Name: Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021


Conference: IEEE International Conference on Bioinformatics and Biomedicine 2021
Abbreviated title: BIBM 2021
Country/Territory: United States of America
City: Virtual, Online
