Musket: A multistage k-mer spectrum-based error corrector for Illumina sequence data

Yongchao Liu, Jan Schröder, Bertil Schmidt

Research output: Contribution to journal › Article › Research › peer-review

214 Citations (Scopus)

Abstract

Motivation: The imperfect sequence data produced by next-generation sequencing technologies have motivated the development of a number of short-read error correctors in recent years. The majority of methods focus on the correction of substitution errors, which are the dominant error source in data produced by Illumina sequencing technology. Existing tools score high in either recall or precision, but not consistently high in both measures. Results: In this article, we present Musket, an efficient multistage k-mer-based corrector for Illumina short-read data. We use the k-mer spectrum approach and introduce three correction techniques in a multistage workflow: two-sided conservative correction, one-sided aggressive correction and voting-based refinement. Our performance evaluation results, in terms of correction quality and de novo genome assembly measures, reveal that Musket is consistently one of the top-performing correctors. In addition, Musket is multithreaded using a master-slave model and demonstrates superior parallel scalability compared with all other evaluated correctors, as well as a highly competitive overall execution time.
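
To make the k-mer spectrum idea concrete, the following is a minimal sketch and not Musket's implementation. It counts k-mers across all reads, treats a k-mer as "trusted" when its count reaches a threshold, and changes a base only when exactly one substitution makes every k-mer covering that position trusted. The function names, the threshold `min_count`, and the acceptance rule are all assumptions made for illustration; the abstract does not specify the rules used in Musket's conservative, aggressive, or voting stages.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every k-mer occurring in the reads (the 'k-mer spectrum')."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_kmers(read, pos, k):
    """Return all k-mers of `read` that include position `pos`."""
    first = max(0, pos - k + 1)
    last = min(pos, len(read) - k)
    return [read[i:i + k] for i in range(first, last + 1)]


def all_trusted(read, pos, k, counts, min_count):
    """True if every k-mer covering `pos` occurs at least `min_count` times."""
    return all(counts[km] >= min_count for km in covering_kmers(read, pos, k))


def correct_conservatively(read, counts, k, min_count):
    """Single left-to-right pass (illustrative rule, assumed here):
    substitute a base only if exactly one alternative base makes all
    k-mers covering that position trusted."""
    current = read
    for pos in range(len(read)):
        if all_trusted(current, pos, k, counts, min_count):
            continue
        candidates = []
        for base in "ACGT":
            if base == current[pos]:
                continue
            trial = current[:pos] + base + current[pos + 1:]
            if all_trusted(trial, pos, k, counts, min_count):
                candidates.append(base)
        if len(candidates) == 1:  # ambiguous or hopeless cases are left alone
            current = current[:pos] + candidates[0] + current[pos + 1:]
    return current


# Toy data: five error-free copies of a short sequence plus one read
# with a single substitution error at position 10 (C -> T).
genome = "ACGTTGCATGCCGATAGCTA"
bad_read = "ACGTTGCATGTCGATAGCTA"
reads = [genome] * 5 + [bad_read]

spectrum = kmer_spectrum(reads, k=5)
fixed = correct_conservatively(bad_read, spectrum, k=5, min_count=3)
```

Requiring a unique alternative keeps this sketch cautious: positions where several substitutions look plausible, or none do, are left unchanged. In a multistage design such positions would be handed to later, more permissive stages.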

Original language: English
Pages (from-to): 308-315
Number of pages: 8
Journal: Bioinformatics
Volume: 29
Issue number: 3
Publication status: Published - 1 Feb 2013
Externally published: Yes