Abstract
Motivation: Cell-type clustering is a crucial first step in single-cell RNA-seq data analysis. However, existing clustering methods often produce different cluster assignments depending on their data pre-processing, choice of distance metric, and feature extraction strategy, which limits their practical application.
Results: We propose the Cross-Tabulation Ensemble Clustering (CTEC) method, which formulates two re-clustering strategies (distribution-based and outlier-based) via cross-tabulation. Benchmarking experiments on five scRNA-seq datasets show that CTEC offers significant improvements over the individual clustering methods. Moreover, CTEC-DB outperforms the state-of-the-art ensemble methods for single-cell data clustering, with 45.4% and 17.1% improvement over the single-cell aggregated from ensemble clustering method (SAFE) and the single-cell aggregated clustering via mixture model ensemble method (SAME), respectively, on the two-method ensemble test.
Availability and implementation: The source code of the benchmark in this work is available at the GitHub repository https://github.com/LWCHN/CTEC.git.
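The cross-tabulation that the Results paragraph builds on can be illustrated with a minimal sketch. It only shows how a contingency table between two clustering outputs might be computed; it does not implement the distribution-based or outlier-based re-clustering strategies of CTEC, and it is not taken from the CTEC repository. The function name `cross_tabulate` and the label values are hypothetical and chosen for illustration.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count how many cells fall into each (cluster in A, cluster in B) pair.

    Hypothetical helper for illustration; not part of the CTEC code base.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same cells")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))  # cluster ids from method A
    cols = sorted(set(labels_b))  # cluster ids from method B
    # Counter returns 0 for pairs that never occur together.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Made-up labels for eight cells from two clustering methods (assumed data).
method_a = [0, 0, 0, 1, 1, 1, 2, 2]
method_b = ["x", "x", "y", "y", "y", "y", "z", "z"]

rows, cols, table = cross_tabulate(method_a, method_b)
for r, row in zip(rows, table):
    print(r, row)
```

In such a table, a row with a single non-zero entry marks a cluster on which the two methods agree, while a row whose counts are split across several columns marks cells on which they disagree. Those disagreeing cells are the natural candidates for any re-clustering step.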
Original language | English |
---|---|
Article number | btae130 |
Number of pages | 11 |
Journal | Bioinformatics |
Volume | 40 |
Issue number | 4 |
DOIs | |
Publication status | Published - 1 Apr 2024 |