TY - JOUR
T1 - scPipe
T2 - An extended preprocessing pipeline for comprehensive single-cell ATAC-Seq data integration in R/Bioconductor
AU - Amarasinghe, Shanika L.
AU - Yang, Phil
AU - Voogd, Oliver
AU - Yang, Haoyu
AU - Du, Mei R.M.
AU - Su, Shian
AU - Brown, Daniel V.
AU - Jabbari, Jafar S.
AU - Bowden, Rory
AU - Ritchie, Matthew E.
N1 - Funding Information:
This work was supported by funding from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (Grant No. 2019-002443 to M.E.R.), Australian National Health and Medical Research Council (NHMRC) Investigator Grant (2017257 to M.E.R.), the Australian Research Council (Discovery Project No. 200102903 to M.E.R.), the Genomics Innovation Hub, Victorian State Government Operational Infrastructure Support, Australian Government NHMRC IRIISS and support from the Australian Cancer Research Foundation. The authors are grateful to Dr Saskia Freytag and Mr Reza Ghamsari for providing feedback on this manuscript.
Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
PY - 2023/12/1
Y1 - 2023/12/1
AB - scPipe is a flexible R/Bioconductor package originally developed to analyse platform-independent single-cell RNA-Seq data. To expand its preprocessing capability to accommodate new single-cell technologies, we further developed scPipe to handle single-cell ATAC-Seq and multi-modal (RNA-Seq and ATAC-Seq) data. After executing multiple data cleaning steps to remove duplicated reads, low abundance features and cells of poor quality, a SingleCellExperiment object is created that contains a sparse count matrix with features of interest in the rows and cells in the columns. Quality control information (e.g. counts per cell, features per cell, total number of fragments, fraction of fragments per peak) and any relevant feature annotations are stored as metadata. We demonstrate that scPipe can efficiently identify 'true' cells and provides flexibility for the user to fine-tune the quality control thresholds using various feature and cell-based metrics collected during data preprocessing. Researchers can then take advantage of various downstream single-cell tools available in Bioconductor for further analysis of scATAC-Seq data such as dimensionality reduction, clustering, motif enrichment, differential accessibility and cis-regulatory network analysis. The scPipe package enables a complete beginning-to-end pipeline for single-cell ATAC-Seq and RNA-Seq data analysis in R.
UR - https://www.scopus.com/pages/publications/85179497099
U2 - 10.1093/nargab/lqad105
DO - 10.1093/nargab/lqad105
M3 - Article
C2 - 38046273
AN - SCOPUS:85179497099
SN - 2631-9268
VL - 5
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 4
M1 - lqad105
ER -