Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks

Yan Zhu, Fuyi Li, Dongxu Xiang, Tatsuya Akutsu, Jiangning Song, Cangzhi Jia

Research output: Contribution to journal › Article › Research › peer-review

Abstract

A promoter is a region of the DNA sequence that defines where transcription of a gene by RNA polymerase initiates, and it is typically located proximal to the transcription start site (TSS). Correctly identifying the gene TSS and the core promoter is essential for understanding the transcriptional regulation of genes. As a complement to conventional experimental methods, computational techniques delivered through easy-to-use platforms are essential bioinformatics tools that can be effectively applied to annotate the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter) for identifying three specific types of promoters, i.e. promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is developed on an up-to-date, species-specific dataset that includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. A convolutional neural network coupled with capsule layers is proposed to train and optimize the prediction model of Depicter. Extensive benchmarking and independent tests demonstrate that Depicter achieves improved predictive performance compared with several state-of-the-art methods. The webserver of Depicter is implemented and freely accessible at https://depicter-erc-monash-edu.ezproxy.lib.monash.edu.au/.
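As a rough illustration of the two ingredients named above (sequence input to a convolutional network, followed by capsule layers), the sketch below one-hot encodes a DNA string and applies the standard capsule "squash" nonlinearity to a vector. The abstract does not specify Depicter's encoding or layer details, so the function names `one_hot`, `squash` and `capsule_length`, the A/C/G/T channel order, and the all-zero handling of unknown bases are assumptions made for illustration only, not the authors' implementation.

```python
import math

BASES = "ACGT"  # assumed channel order; the abstract does not state one


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Unknown symbols such as N become an all-zero row (an assumption).
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def squash(vec, eps=1e-9):
    """Capsule squash: keep the vector's direction, map its length into [0, 1)."""
    sq_norm = sum(x * x for x in vec)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in vec]


def capsule_length(vec):
    """Length of the squashed vector, often read as a class probability."""
    return math.sqrt(sum(x * x for x in squash(vec)))


if __name__ == "__main__":
    print(one_hot("TATA"))            # four rows, one per nucleotide
    print(capsule_length([3.0, 4.0]))  # close to 25/26
```

In a capsule network the length of an output capsule is commonly interpreted as the probability that the corresponding class (here, hypothetically, promoter versus non-promoter) is present, which is why the squash function bounds it below 1.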
Original language: English
Article number: bbaa299
Number of pages: 11
Journal: Briefings in Bioinformatics
Publication status: Accepted/In press - 24 Nov 2020

Keywords

  • eukaryotic promoters
  • bioinformatics
  • sequence analysis
  • machine learning
  • deep learning
