Nanopore-only assemblies for genomic surveillance of the global priority drug-resistant pathogen, Klebsiella pneumoniae

Ebenezer Foster-Nyarko, Hugh Cottingham, Ryan R. Wick, Louise M. Judd, Margaret M.C. Lam, Kelly L. Wyres, Thomas D. Stanton, Kara K. Tsang, Sophia David, David M. Aanensen, Sylvain Brisse, Kathryn E. Holt

Research output: Contribution to journal › Article › Research › peer-review

19 Citations (Scopus)

Abstract

Oxford Nanopore Technologies (ONT) sequencing has rich potential for genomic epidemiology and public health investigations of bacterial pathogens, particularly in low-resource settings and at the point of care, due to its portability and affordability. However, low base-call accuracy has limited the reliability of ONT data for critical tasks such as antimicrobial resistance (AMR) and virulence gene detection and typing, serotype prediction, and cluster identification. Thus, Illumina sequencing remains the standard for genomic surveillance despite higher capital and running costs. We tested the accuracy of ONT-only assemblies for common applied bacterial genomics tasks (genotyping and cluster detection, implemented via Kleborate, Kaptive and Pathogenwatch), using data from 54 unique Klebsiella pneumoniae isolates. ONT reads generated via MinION with R9.4.1 flowcells were basecalled using three alternative models [Fast, High-accuracy (HAC) and Super-accuracy (SUP), available within ONT’s Guppy software], assembled with Flye and polished using Medaka. Accuracy of typing using ONT-only assemblies was compared with that of Illumina-only and hybrid ONT+Illumina assemblies, constructed from the same isolates as reference standards. The most resource-intensive ONT-assembly approach (SUP basecalling, with or without Medaka polishing) performed best, yielding reliable capsule (K) type calls for all strains (100 % exact or best matching locus), reliable multi-locus sequence type (MLST) assignment (98.3 % exact match or single-locus variants), and good detection of acquired AMR genes and mutations (88–100 % correct identification across the various drug classes). Distance-based trees generated from SUP+Medaka assemblies accurately reflected overall genetic relationships between isolates. The definition of outbreak clusters from ONT-only assemblies was problematic due to inflation of SNP counts by high base-call errors. However, ONT data could be reliably used to ‘rule out’ isolates of distinct lineages from suspected transmission clusters. HAC basecalling + Medaka polishing performed similarly to SUP basecalling without polishing. Therefore, we recommend investing compute resources into basecalling (SUP model), wherever compute resources and time allow, and note that polishing is also worthwhile for improved performance. Overall, our results show that MLST, K type and AMR determinants can be reliably identified with ONT-only R9.4.1 flowcell data. However, cluster detection remains challenging with this technology.

Original language: English
Article number: 000936
Number of pages: 16
Journal: Microbial Genomics
Volume: 9
Issue number: 2
Publication status: Published - 2023

Keywords

  • AMR
  • bacterial pathogens
  • basecalling
  • benchmarking
  • genomic surveillance
  • Klebsiella pneumoniae
  • MLST
  • Nanopore sequencing
  • phylogenetic clustering
  • serotyping
