The use of taxon-specific reference databases compromises metagenomic classification

Vanessa R. Marcelino, Edward C. Holmes, Tania C. Sorrell

Research output: Contribution to journal › Article › Research › peer-review

19 Citations (Scopus)

Abstract

A recent article in BMC Genomics describes a new bioinformatics tool, HumanMycobiomeScan, to classify fungal taxa in metagenomic samples. This tool was used to characterize the gut mycobiome of hunter-gatherers and Western populations, resulting in the identification of a range of fungal species in the vast majority of samples. In the HumanMycobiomeScan pipeline, sequence reads are mapped against a reference database containing fungal genome sequences only. We argue that using reference databases composed of a single taxonomic group leads to an unacceptably high number of false positives due to: (i) mapping to conserved genetic regions in reference genomes, and (ii) sequence contamination in the assembled reference genomes. To demonstrate this, we replaced HumanMycobiomeScan's fungal reference database with one containing genome sequences of amphibians and reptiles and re-analysed the original case study. The classification pipeline recovered all species present in the reference database, revealing turtles (Geoemydidae), bullfrogs (Pyxicephalidae) and snakes (Colubridae) as the most abundant herpetological taxa in the human gut. We also re-analysed the case study using a kingdom-agnostic pipeline. This revealed that while the gut of hunter-gatherers and Western subjects may be colonized by a range of microbial eukaryotes, only three fungal families were retrieved. These results highlight the pitfalls of using taxon-specific reference databases for metagenome classification, even when they are composed of curated whole genome data. We propose that databases containing all domains of life provide the most suitable option for metagenomic species profiling, especially when targeting microbial eukaryotes.
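The first failure mode named in the abstract, reads mapping to conserved regions, can be illustrated with a toy example. The sketch below is not the mapping method of HumanMycobiomeScan or of the authors' kingdom-agnostic pipeline; it is a deliberately simple k-mer matcher, and every sequence, name and parameter in it (`classify`, `toy_fungus`, `toy_bacterium`, `k=8`) is invented for illustration. It assumes a read is assigned to whichever reference shares the most k-mers with it.

```python
# Toy illustration only: all sequences and names below are made up.

def kmers(seq, k=8):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read, database, k=8):
    """Assign the read to the reference sharing the most k-mers.

    Returns None when no reference shares any k-mer with the read.
    """
    read_kmers = kmers(read, k)
    best_name, best_score = None, 0
    for name, genome in database.items():
        score = len(read_kmers & kmers(genome, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# A region assumed to be conserved across both toy genomes.
CONSERVED = "ACGTTGCAAGGCTTAC"

toy_fungus = "TTTTGGGGCCCCAAAA" + CONSERVED + "GATCGATCGGATCCAA"
toy_bacterium = "CATGCATGTTAACCGG" + CONSERVED + "TGCATGCAAGTCAGTC"

# A read that truly comes from the bacterium: part of its unique
# flank followed by the conserved region.
read = toy_bacterium[8:32]

fungal_only = {"toy_fungus": toy_fungus}
all_domains = {"toy_fungus": toy_fungus, "toy_bacterium": toy_bacterium}

print("fungal-only database:", classify(read, fungal_only))
print("all-domains database:", classify(read, all_domains))
```

With the fungal-only database the bacterial read has nowhere to go except the fungus, because the conserved region is the only match available. Once the true source is present in the database, the read matches it over its full length and the spurious fungal assignment disappears. A contaminated reference assembly (the second failure mode) would behave the same way in this sketch: foreign sequence embedded in a reference genome attracts reads that belong to another organism.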

Original language: English
Article number: 184
Number of pages: 5
Journal: BMC Genomics
Volume: 21
Issue number: 1
Publication status: Published - 27 Feb 2020
Externally published: Yes

Keywords

  • Assembly errors
  • Fungi
  • Metagenomic classifier
  • Microbiome
  • Misclassification
  • Reference database
