Abstract
Nanopore sequencing signals can be described as indirect, noisy observations of the instantaneous conductance of the nanopore channel as an analyte DNA molecule translocates through the pore in real time, with δ nucleotides (a δ-mer) blocking the pore at any instant. The sequence of overlapping δ-mers along the ssDNA molecule is thus indirectly observed as a sequence of conductance levels (i.e., a signature) that is used to characterize its DNA sequence. In this paper, we denoise piecewise constant nanopore signals drawn from the same Gaussian-output, left-to-right hidden Markov model (HMM) and recover the unknown signature that parameterizes the HMM. We place a Gaussian prior on the signature and use importance sampling to approximate the minimum mean-square error (MMSE) estimate of the signature given the signals. To circumvent the difficulty of sampling from the true posterior, we construct a proposal distribution from which the joint segmentation of the observed signals can be efficiently sampled in O(Mn²k) time, where M is the number of signals, n is the average duration of each signal, and k is the length of the signature. Finally, we evaluate the performance of the algorithm using both simulated and experimental nanopore signals generated by Oxford Nanopore Technologies’ (ONT) R10.4.1 nanopore. The proposed method can be effective in constructing accurate δ-mer tables used to fully characterize all 4^δ states of any nanopore sequencer.
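As a rough illustration of the importance-sampling idea mentioned in the abstract, the sketch below estimates a posterior mean (the MMSE estimate) for a deliberately simplified toy problem: a single unknown conductance level with a Gaussian prior, observed through Gaussian noise. This is not the paper's algorithm. The prior is used as the proposal here, whereas the paper constructs a proposal over joint segmentations of the signals. The function names, parameters, and sample count are all assumptions made for the example.

```python
import math
import random


def mmse_importance_sampling(ys, prior_mean, prior_std, noise_std,
                             num_samples=20000, seed=0):
    """Self-normalized importance-sampling estimate of E[x | ys].

    Toy model (an assumption for this sketch): x ~ N(prior_mean, prior_std^2)
    and each y in ys is drawn independently from N(x, noise_std^2).
    """
    rng = random.Random(seed)
    # Proposal = prior, so each importance weight is proportional
    # to the likelihood of the observations given the sampled level.
    samples = [rng.gauss(prior_mean, prior_std) for _ in range(num_samples)]
    log_w = [
        -sum((y - x) ** 2 for y in ys) / (2.0 * noise_std ** 2)
        for x in samples
    ]
    # Subtract the maximum log-weight before exponentiating for stability.
    max_log_w = max(log_w)
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, samples)) / total


def mmse_closed_form(ys, prior_mean, prior_std, noise_std):
    """Exact posterior mean for the same conjugate Gaussian toy model."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = len(ys) / noise_std ** 2
    weighted_sum = prior_precision * prior_mean + sum(ys) / noise_std ** 2
    return weighted_sum / (prior_precision + data_precision)
```

Because this toy model is conjugate, `mmse_closed_form` gives the exact answer and can be used to check the sampled estimate. In the setting described in the abstract no such closed form is available, since the segmentation of each signal is unknown, which is why a sampling approximation is needed there.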
| Original language | English |
|---|---|
| Pages (from-to) | 1993-2007 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 73 |
| Publication status | Published - 15 May 2025 |
Keywords
- hidden Markov models
- importance sampling
- minimum mean-square error estimation
- nanopore sequencers