Performance of neural network basecalling tools for Oxford Nanopore sequencing

Ryan R. Wick, Louise M. Judd, Kathryn E. Holt

Research output: Contribution to journal › Article › Research › peer-review

5 Citations (Scopus)

Abstract

Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.

Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy.

Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
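The abstract distinguishes accuracy within individual reads from accuracy of majority-rule consensus basecalls. As a rough illustration of that distinction only, the Python sketch below assumes reads that are already aligned to one another without insertions or deletions (real nanopore reads also contain indel errors, so this is a deliberate simplification), computes a column-wise majority-rule consensus, and scores each sequence by simple per-position identity against a known reference. The function names `majority_consensus` and `identity` and the toy sequences are hypothetical and are not taken from the study's evaluation pipeline.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the column-wise majority-rule consensus of pre-aligned reads.

    Assumption (hypothetical simplification): all reads are already aligned
    to the same coordinates with no indels, so they have equal length.
    Ties are broken in favour of the base seen first in the column, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def identity(seq, reference):
    """Fraction of positions where seq matches reference (gap-free comparison)."""
    if not seq or len(seq) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq, reference))
    return matches / len(seq)


if __name__ == "__main__":
    # Hypothetical toy data: two reads carry one substitution each,
    # at different positions.
    reference = "ACGTACGT"
    reads = ["ACGTACGT", "ACCTACGT", "ACGTACTT"]
    consensus = majority_consensus(reads)
    print(consensus)                                    # ACGTACGT
    print([identity(r, reference) for r in reads])      # [1.0, 0.875, 0.875]
    print(identity(consensus, reference))               # 1.0
```

Because the toy errors fall at different positions, the consensus recovers the reference exactly even though two reads do not, which is the intuition behind consensus accuracy exceeding read accuracy; the example says nothing about the magnitudes reported in the study.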

Original language: English
Article number: 129
Number of pages: 10
Journal: Genome Biology
Volume: 20
Issue number: 1
DOI: 10.1186/s13059-019-1727-y
Publication status: Published - 24 Jun 2019

Keywords

  • Basecalling
  • Long-read sequencing
  • Oxford Nanopore

Cite this

@article{3d35b7bdb23a49deb92d49c8999f4d4a,
title = "Performance of neural network basecalling tools for Oxford Nanopore sequencing",
abstract = "Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.",
keywords = "Basecalling, Long-read sequencing, Oxford Nanopore",
author = "Wick, {Ryan R.} and Judd, {Louise M.} and Holt, {Kathryn E.}",
year = "2019",
month = "6",
day = "24",
doi = "10.1186/s13059-019-1727-y",
language = "English",
volume = "20",
journal = "Genome Biology",
issn = "1474-760X",
publisher = "BioMed Central",
number = "1",

}

Performance of neural network basecalling tools for Oxford Nanopore sequencing. / Wick, Ryan R.; Judd, Louise M.; Holt, Kathryn E.

In: Genome Biology, Vol. 20, No. 1, 129, 24.06.2019.

TY  - JOUR
T1  - Performance of neural network basecalling tools for Oxford Nanopore sequencing
AU  - Wick, Ryan R.
AU  - Judd, Louise M.
AU  - Holt, Kathryn E.
PY  - 2019/6/24
Y1  - 2019/6/24
N2  - Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
AB  - Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish. Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy. Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
KW  - Basecalling
KW  - Long-read sequencing
KW  - Oxford Nanopore
UR  - http://www.scopus.com/inward/record.url?scp=85068121711&partnerID=8YFLogxK
U2  - 10.1186/s13059-019-1727-y
DO  - 10.1186/s13059-019-1727-y
M3  - Article
VL  - 20
JO  - Genome Biology
JF  - Genome Biology
SN  - 1474-760X
IS  - 1
M1  - 129
ER  - 