DNA sequence comparison by a novel probabilistic method

Chenglong Yu, Mo Deng, Stephen S.T. Yau

Research output: Contribution to journal > Article > Research > peer-review

41 Citations (Scopus)


This paper proposes a novel method for comparing DNA sequences. Using a graphical representation, we construct a probability distribution for each DNA sequence. The similarity between two sequences is then measured by the symmetrised Kullback-Leibler divergence between their distributions. After presenting our method, we test it on six DNA sequences taken from the threonine operons of Escherichia coli K-12 and Shigella flexneri. We then apply the approach to mitochondrial DNA data to study the evolution of primates and reconstruct a phylogenetic tree for them. In addition, we use our technique to analyze the classification and phylogeny of the Tomato Yellow Leaf Curl Virus (TYLCV) based on its whole genome sequences. These examples show that our approach handles large volumes of DNA sequences more easily and more quickly than the existing multiple alignment methods. Moreover, unlike other approaches, our method can be applied automatically and requires no human intervention.
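The comparison step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the graphical representation yields a probability distribution, so the sketch substitutes smoothed k-mer frequencies as a stand-in; `kmer_distribution`, its `k` and `pseudocount` parameters, and the two toy sequences are assumptions made for illustration and are not taken from the paper. The symmetrised divergence is taken here as D(P||Q) + D(Q||P); some authors use half of this sum.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k=2, pseudocount=1.0):
    """Smoothed k-mer frequencies over the alphabet ACGT.

    Assumption: this is a stand-in for the paper's distribution, which is
    built from a graphical representation not described in the abstract.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # The pseudocount keeps every probability positive, so the divergence is finite.
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return [(counts[m] + pseudocount) / total for m in kmers]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetrised_kl(p, q):
    """Symmetrised divergence, taken here as D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy sequences (hypothetical, not data from the paper).
p = kmer_distribution("ACGTACGTAC")
q = kmer_distribution("AAAAACCCCC")
distance = symmetrised_kl(p, q)
```

Computing `symmetrised_kl` for every pair of sequences gives a symmetric distance matrix, which is the usual input for distance-based tree-building methods such as neighbour joining or UPGMA; the abstract does not say which tree-building method the authors used.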

Original language: English
Pages (from-to): 1484-1492
Number of pages: 9
Journal: Information Sciences
Issue number: 8
Publication status: Published - 15 Apr 2011
Externally published: Yes


  • DNA
  • Graphical representation
  • Kullback-Leibler divergence
  • Probability distribution
  • Sequence comparison
