Stacking and chaining of normalization methods in deep learning-based classification of colorectal cancer using gut microbiome data

Mwenge Mulenga, Sameem Abdul Kareem, Aznul Qalid Md Sabri, Manjeevan Seera

Research output: Contribution to journal › Article › Research › peer-review

10 Citations (Scopus)

Abstract

Machine learning (ML)-based detection of diseases from sequence-based gut microbiome data has attracted great interest within the artificial intelligence in medicine (AIM) community. For colorectal cancer (CRC), the approach offers a non-invasive alternative for detection based on stool samples. Given the limitations of existing CRC detection methods, medical research has turned to high-throughput data to identify the disease. Because conventional ML algorithms have several limitations of their own, deep learning (DL) methods are becoming more popular, owing to their outstanding performance in related fields. However, the performance of DL methods is affected by the high dimensionality, sparsity, and feature dominance inherent in microbiome data. This research proposes stacking and chaining of normalization methods to address these limitations. The stacking technique offers a robust, easy-to-use, and interpretable way to augment microbiome and other tabular data, while the chaining technique is an alternative approach to data normalization that dynamically adjusts the underlying properties of the data towards the normal distribution. The proposed techniques are combined with rank transformation and feature selection to further improve model performance, achieving area under the curve (AUC) values between 0.857 and 0.987 on publicly available datasets.
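
The two ideas named in the abstract can be illustrated with a minimal sketch. It assumes that stacking means concatenating the outputs of several normalization methods column-wise (so the feature set is augmented) and that chaining means applying normalization methods one after another. The abstract does not specify which methods are used or in what order, so the function names, the choice of log, z-score, and min-max transforms, and the toy count matrix are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Three common normalization methods, chosen here only for illustration.
def log1p_transform(X):
    # log(1 + x) compresses large counts and is defined at zero.
    return np.log1p(X)

def zscore(X):
    # Standardize each feature (column); constant columns are left at zero.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std

def minmax(X):
    # Rescale each feature (column) to the range [0, 1].
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    return (X - lo) / span

def stack_normalizations(X, methods):
    # Assumed reading of "stacking": apply each method to the same input
    # and concatenate the results column-wise, widening the feature set.
    return np.hstack([m(X) for m in methods])

def chain_normalizations(X, methods):
    # Assumed reading of "chaining": feed the output of each method
    # into the next one, in the order given.
    out = X
    for m in methods:
        out = m(out)
    return out

# Hypothetical toy abundance table: 6 samples x 4 taxa.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(6, 4)).astype(float)

stacked = stack_normalizations(counts, [log1p_transform, zscore, minmax])
chained = chain_normalizations(counts, [log1p_transform, zscore])
```

Under these assumptions, stacking triples the number of columns (one copy of each taxon per method) while chaining keeps the original shape and only changes the distribution of the values.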

Original language: English
Pages (from-to): 97296-97319
Number of pages: 24
Journal: IEEE Access
Volume: 9
Publication status: Published - 2021

Keywords

  • augmentation
  • chaining
  • colorectal cancer
  • deep neural network
  • microbiome
  • normalization
  • stacking
