An automated approach for global identification of sRNA-encoding regions in RNA-Seq data from Mycobacterium tuberculosis

Ming Wang, Joy Fleming, Zihui Li, Chuanyou Li, Hongtai Zhang, Yunxin Xue, Maoshan Chen, Zongde Zhang, Xian En Zhang, Lijun Bi

Research output: Contribution to journal › Article › Research › peer-review

14 Citations (Scopus)


Deep sequencing of bacterial transcriptomes using RNA-Seq technology has made it possible to identify small non-coding RNAs (sRNAs), RNA molecules that regulate gene expression in response to changing environments, on a genome-wide scale in an ever-increasing range of prokaryotes. However, a simple and reliable automated method for identifying sRNA candidates in these large datasets is lacking. Here, after generating a transcriptome from an exponential-phase culture of Mycobacterium tuberculosis H37Rv, we developed and validated an automated method for the genome-wide identification of sRNA candidate-containing regions within RNA-Seq datasets, based on analysis of the characteristics of read coverage maps. We identified 192 novel candidate sRNA-encoding regions in intergenic regions and 664 transcripts from regions antisense (as) to open reading frames (ORFs) that bear the characteristics of asRNAs, and validated 28 of these novel sRNA-encoding regions by northern blotting. Our work not only provides a simple automated method for genome-wide identification of candidate sRNA-encoding regions in RNA-Seq data, but also uncovers many novel candidate sRNA-encoding regions in M. tuberculosis, reinforcing the view that the control of gene expression in bacteria is more complex than previously anticipated.
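The abstract does not spell out the exact coverage-map criteria the authors used. As a rough illustration of the general idea only, the sketch below scans a hypothetical strand-specific per-base coverage array for contiguous runs above a threshold, merges runs separated by short dips, and then labels each candidate as intergenic, antisense to an ORF, or overlapping an ORF on the same strand. The function names (`call_regions`, `classify`), the thresholds (`min_cov`, `min_len`, `max_gap`), and the 0-based half-open coordinate convention are all assumptions for illustration, not the published method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Region:
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    strand: str           # "+" or "-"
    mean_coverage: float  # mean read depth over [start, end)


def call_regions(coverage: Sequence[int], strand: str,
                 min_cov: int = 10, min_len: int = 40,
                 max_gap: int = 5) -> List[Region]:
    """Hypothetical caller: find runs of coverage >= min_cov on one strand,
    merge runs separated by <= max_gap low-coverage bases, keep those >= min_len."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, depth in enumerate(coverage):
        if depth >= min_cov:
            if start is None:
                start = i              # a covered run begins here
        elif start is not None:
            runs.append((start, i))    # the run ended at the previous base
            start = None
    if start is not None:              # run extends to the end of the array
        runs.append((start, len(coverage)))

    # Merge neighbouring runs split by short coverage dips.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    regions = []
    for s, e in merged:
        if e - s >= min_len:
            mean = sum(coverage[s:e]) / (e - s)
            regions.append(Region(s, e, strand, mean))
    return regions


def classify(region: Region, orfs: Sequence[Tuple[int, int, str]]) -> str:
    """Label a region relative to annotated ORFs given as (start, end, strand),
    using the same 0-based half-open coordinates."""
    overlapping = [st for s, e, st in orfs
                   if region.start < e and s < region.end]
    if not overlapping:
        return "intergenic"
    if any(st == region.strand for st in overlapping):
        return "sense_overlap"         # likely part of an mRNA, not an asRNA
    return "antisense"                 # only opposite-strand ORFs overlap
```

For example, `call_regions([0]*10 + [20]*50 + [0]*3 + [15]*10 + [0]*20, "+")` yields a single region spanning positions 10 to 73, because the 3-base dip is shorter than `max_gap`. A real pipeline would also need to handle per-strand coverage extraction from aligned reads, background expression levels, and overlaps with annotated UTRs, none of which this sketch attempts.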

Original language: English
Pages (from-to): 544-553
Number of pages: 10
Journal: Acta Biochimica et Biophysica Sinica
Issue number: 6
Publication status: Published - 1 Jan 2016
Externally published: Yes


  • Mycobacterium tuberculosis
  • Non-coding RNA
  • RNA-Seq
  • Transcriptome
