Digital expression explorer 2

A repository of uniformly processed RNA sequencing data

Mark Ziemann, Antony Kaspi, Assam El-Osta

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Background: RNA sequencing (RNA-seq) is an indispensable tool in the study of gene regulation. While the technology has brought with it better transcript coverage and quantification, there remain considerable barriers to entry for the computational biologist to analyse large data sets. There is a real need for a repository of uniformly processed RNA-seq data that is easy to use.

Findings: To address these obstacles, we developed Digital Expression Explorer 2 (DEE2), a web-based repository of RNA-seq data in the form of gene-level and transcript-level expression counts. DEE2 contains >5.3 trillion assigned reads from 580,000 RNA-seq data sets including species Escherichia coli, yeast, Arabidopsis, worm, fruit fly, zebrafish, rat, mouse, and human. Base-space sequence data downloaded from the National Center for Biotechnology Information Sequence Read Archive underwent quality control prior to transcriptome and genome mapping using open-source tools. Uniform data processing methods ensure consistency across experiments, facilitating fast and reproducible meta-analyses.

Conclusions: The web interface allows users to quickly identify data sets of interest using accession number and keyword searches. The data can also be accessed programmatically using a specifically designed R package. We demonstrate that DEE2 data are compatible with statistical packages such as edgeR or DESeq. Bulk data are also available for download. DEE2 can be found at http://dee2.io.

Original language: English
Number of pages: 13
Journal: GigaScience
Volume: 8
Issue number: 4
DOI: 10.1093/gigascience/giz022
Publication status: Published - 3 Apr 2019

Keywords

  • data reuse
  • gene expression
  • RNA-seq
  • transcriptome

Cite this

@article{29f9716ea0f9417aa052bae1db389fbf,
title = "Digital expression explorer 2: A repository of uniformly processed RNA sequencing data",
abstract = "Background: RNA sequencing (RNA-seq) is an indispensable tool in the study of gene regulation. While the technology has brought with it better transcript coverage and quantification, there remain considerable barriers to entry for the computational biologist to analyse large data sets. There is a real need for a repository of uniformly processed RNA-seq data that is easy to use. Findings: To address these obstacles, we developed Digital Expression Explorer 2 (DEE2), a web-based repository of RNA-seq data in the form of gene-level and transcript-level expression counts. DEE2 contains >5.3 trillion assigned reads from 580,000 RNA-seq data sets including species Escherichia coli, yeast, Arabidopsis, worm, fruit fly, zebrafish, rat, mouse, and human. Base-space sequence data downloaded from the National Center for Biotechnology Information Sequence Read Archive underwent quality control prior to transcriptome and genome mapping using open-source tools. Uniform data processing methods ensure consistency across experiments, facilitating fast and reproducible meta-analyses. Conclusions: The web interface allows users to quickly identify data sets of interest using accession number and keyword searches. The data can also be accessed programmatically using a specifically designed R package. We demonstrate that DEE2 data are compatible with statistical packages such as edgeR or DESeq. Bulk data are also available for download. DEE2 can be found at http://dee2.io.",
keywords = "data reuse, gene expression, RNA-seq, transcriptome",
author = "Mark Ziemann and Antony Kaspi and Assam El-Osta",
year = "2019",
month = "4",
day = "3",
doi = "10.1093/gigascience/giz022",
language = "English",
volume = "8",
journal = "GigaScience",
issn = "2047-217X",
publisher = "Oxford University Press",
number = "4",

}

Digital expression explorer 2: A repository of uniformly processed RNA sequencing data. / Ziemann, Mark; Kaspi, Antony; El-Osta, Assam.

In: GigaScience, Vol. 8, No. 4, 03.04.2019.

TY - JOUR

T1 - Digital expression explorer 2

T2 - A repository of uniformly processed RNA sequencing data

AU - Ziemann, Mark

AU - Kaspi, Antony

AU - El-Osta, Assam

PY - 2019/4/3

Y1 - 2019/4/3

AB - Background: RNA sequencing (RNA-seq) is an indispensable tool in the study of gene regulation. While the technology has brought with it better transcript coverage and quantification, there remain considerable barriers to entry for the computational biologist to analyse large data sets. There is a real need for a repository of uniformly processed RNA-seq data that is easy to use. Findings: To address these obstacles, we developed Digital Expression Explorer 2 (DEE2), a web-based repository of RNA-seq data in the form of gene-level and transcript-level expression counts. DEE2 contains >5.3 trillion assigned reads from 580,000 RNA-seq data sets including species Escherichia coli, yeast, Arabidopsis, worm, fruit fly, zebrafish, rat, mouse, and human. Base-space sequence data downloaded from the National Center for Biotechnology Information Sequence Read Archive underwent quality control prior to transcriptome and genome mapping using open-source tools. Uniform data processing methods ensure consistency across experiments, facilitating fast and reproducible meta-analyses. Conclusions: The web interface allows users to quickly identify data sets of interest using accession number and keyword searches. The data can also be accessed programmatically using a specifically designed R package. We demonstrate that DEE2 data are compatible with statistical packages such as edgeR or DESeq. Bulk data are also available for download. DEE2 can be found at http://dee2.io.

KW - data reuse

KW - gene expression

KW - RNA-seq

KW - transcriptome

UR - http://www.scopus.com/inward/record.url?scp=85064117016&partnerID=8YFLogxK

U2 - 10.1093/gigascience/giz022

DO - 10.1093/gigascience/giz022

M3 - Article

VL - 8

JO - GigaScience

JF - GigaScience

SN - 2047-217X

IS - 4

ER -