Improved similarity scores for comparing motifs

Emi Tanaka, Timothy L. Bailey, Charles E Grant, William Stafford Noble, Uri Keich

Research output: Contribution to journal; Article; Research; peer-review

Abstract

Motivation: A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. Several tools have been designed for this task, but Habib et al. pointed out that the scores commonly used to measure similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and an equally good alignment of two uninformative columns. This observation explains why tools such as Tomtom occasionally return an alignment of uninformative columns that is clearly spurious. To address this problem, Habib et al. proposed a new score, Bayesian Likelihood 2-Component (BLiC), which uses a Bayesian information criterion to penalize matches that are also similar to the background distribution.

Results: We show that the BLiC score exhibits other, highly undesirable properties, and we instead offer a general approach to adjusting any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implement our method in Tomtom and show that we can drastically reduce the number of uninformative alignments without significantly compromising Tomtom's retrieval accuracy or its runtime.
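The problem described in the abstract can be reproduced with a toy example. The sketch below is an illustration under stated assumptions, not the method from this paper: it assumes a uniform A/C/G/T background, uses negative Euclidean distance (one common column comparison) as the raw column score, and sums column scores over an ungapped alignment. The function `penalized_similarity` and its `weight` parameter are hypothetical; they add an ad hoc penalty that shrinks as the less informative column in a pair gains information content. This penalty is neither BLiC nor the adjustment implemented in Tomtom.

```python
import math

# Assumption: uniform background over A, C, G, T (illustrative only).
BACKGROUND = (0.25, 0.25, 0.25, 0.25)


def euclidean_similarity(x, y):
    """Negative Euclidean distance between two columns; 0 is a perfect match."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def information_content(col, bg=BACKGROUND):
    """Relative entropy of a column with respect to the background, in bits."""
    return sum(p * math.log2(p / q) for p, q in zip(col, bg) if p > 0)


def penalized_similarity(x, y, bg=BACKGROUND, weight=1.0):
    """Hypothetical adjustment: penalize column pairs that resemble the background.

    The penalty equals `weight` when either column is identical to the background
    and falls to 0 when both columns reach the maximum possible information content.
    """
    max_ic = math.log2(1 / min(bg))  # 2 bits for a uniform DNA background
    least_ic = min(information_content(x, bg), information_content(y, bg))
    return euclidean_similarity(x, y) - weight * (1 - least_ic / max_ic)


def alignment_score(query, target, column_score):
    """Sum column scores over an ungapped alignment of two equal-length motifs."""
    return sum(column_score(x, y) for x, y in zip(query, target))


ALL_A = (1.0, 0.0, 0.0, 0.0)
UNIFORM = (0.25, 0.25, 0.25, 0.25)

raw_informative = alignment_score([ALL_A] * 3, [ALL_A] * 3, euclidean_similarity)
raw_uninformative = alignment_score([UNIFORM] * 3, [UNIFORM] * 3, euclidean_similarity)
adj_informative = alignment_score([ALL_A] * 3, [ALL_A] * 3, penalized_similarity)
adj_uninformative = alignment_score([UNIFORM] * 3, [UNIFORM] * 3, penalized_similarity)

print(raw_informative, raw_uninformative)  # the raw score cannot tell these apart
print(adj_informative, adj_uninformative)  # the penalty separates them
```

With the raw score, three aligned all-A columns and three aligned uniform columns both score 0.0. Under the hypothetical penalty, the uninformative alignment drops to -3.0 while the informative one is unchanged. Any such penalty trades off against retrieval of genuine matches, which is why the abstract emphasizes preserving retrieval accuracy.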

Original language: English
Article number: btr257
Pages (from-to): 1603-1609
Number of pages: 7
Journal: Bioinformatics
Volume: 27
Issue number: 12
Publication status: Published, Jun 2011
Externally published: Yes