TY - JOUR
T1 - Accurate and complete genomes from metagenomes
AU - Chen, Lin-Xing
AU - Anantharaman, Karthik
AU - Shaiber, Alon
AU - Eren, A. Murat
AU - Banfield, Jillian F.
N1 - Funding Information:
We thank Brian C. Thomas, Matthew R. Olm, Christopher T. Brown, Alla Lapidus, Tom O. Delmont, and Christian M.K. Sieber for helpful discussions; and Steven Quake and Mark Kowarsky for providing access to unreleased sequences from their cell-free blood study. This work was supported by the Genome Canada Large-Scale Applied Research Program and Ontario Research Fund: Research Excellence grants to Lesley A. Warren; Lawrence Berkeley National Laboratory’s Watershed Function Scientific Focus Area funded by DOE contract DE-AC02-05CH11231; the Office of Science and Office of Biological and Environmental Research (Lawrence Berkeley National Laboratory; Operated by the University of California, Berkeley); and National Institutes of Health (NIH) under awards RAI092531A and R01-GM109454, Chan Zuckerberg Biohub and the UC Berkeley-based Innovative Genomics Institute.
Publisher Copyright:
© 2020 Chen et al. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
PY - 2020/3
Y1 - 2020/3
N2 - Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.
UR - http://www.scopus.com/inward/record.url?scp=85082528739&partnerID=8YFLogxK
U2 - 10.1101/gr.258640.119
DO - 10.1101/gr.258640.119
M3 - Review Article
C2 - 32188701
AN - SCOPUS:85082528739
SN - 1088-9051
VL - 30
SP - 315
EP - 333
JO - Genome Research
JF - Genome Research
IS - 3
ER -