Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study

Lei Sun, Zhihua Zhang, Timothy L. Bailey, Andrew C. Perkins, Michael R. Tallack, Zhao-Xu Xu, Hui Liu

Research output: Contribution to journal, Article, Research, peer-review

106 Citations (Scopus)

Abstract

Background: The study of long non-coding RNAs (lncRNAs) has been advanced by high-throughput RNA sequencing (RNA-Seq). However, identifying lncRNAs from RNA-Seq data is still not trivial, and uncovering their functions remains a challenge.

Results: We present a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate initially assembled transcripts. Possible partial transcripts and artefacts are then filtered out according to their quantified expression level. After that, novel lncRNAs are detected by further filtering out known transcripts and those with high protein-coding potential, using a newly developed program called lncRScan. We applied our pipeline to a mouse Klf1 knockout dataset and used differential expression analysis to discuss the plausible functions of the novel lncRNAs we detected. We identified 308 novel lncRNA candidates, which have shorter transcript lengths, fewer exons, and shorter putative open reading frames than known protein-coding transcripts. Of these lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding transcripts, and 13 lncRNAs show significant differential expression between the wild-type and Klf1 knockout conditions.

Conclusions: Our method can predict a set of novel lncRNAs from RNA-Seq data. Some of the lncRNAs are differentially expressed between the wild-type and Klf1 knockout strains, suggesting that those novel lncRNAs can be given high priority in further functional studies.
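The filtering stages described in the abstract (expression level, known transcripts, protein-coding potential) can be illustrated with a minimal sketch. This is not the lncRScan implementation: the `Transcript` fields, the function name, and both threshold values are hypothetical and chosen only to show the order of the filters.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    fpkm: float               # quantified expression level (hypothetical unit)
    overlaps_known: bool      # True if it matches an already annotated transcript
    coding_potential: float   # hypothetical score in [0, 1]; higher = more likely coding


def select_novel_lncrna_candidates(transcripts, min_fpkm=1.0, max_coding_potential=0.5):
    """Apply the three filters in sequence; thresholds are illustrative assumptions."""
    candidates = []
    for t in transcripts:
        if t.fpkm < min_fpkm:
            continue  # likely a partial transcript or assembly artefact
        if t.overlaps_known:
            continue  # already annotated, so not novel
        if t.coding_potential >= max_coding_potential:
            continue  # high protein-coding potential
        candidates.append(t)
    return candidates


# Toy input: only T3 passes all three filters.
assembled = [
    Transcript("T1", fpkm=0.2, overlaps_known=False, coding_potential=0.1),
    Transcript("T2", fpkm=5.0, overlaps_known=True, coding_potential=0.1),
    Transcript("T3", fpkm=4.0, overlaps_known=False, coding_potential=0.2),
    Transcript("T4", fpkm=8.0, overlaps_known=False, coding_potential=0.9),
]
novel = select_novel_lncrna_candidates(assembled)
```

In practice the expression values would come from the transcriptome reconstruction step and the coding-potential score from a dedicated tool; here both are supplied by hand to keep the example self-contained.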

Original language: English
Article number: 331
Journal: BMC Bioinformatics
Volume: 13
Issue number: 1
DOIs
Publication status: Published - 13 Dec 2012
Externally published: Yes
