TY - JOUR
T1 - Complete vertebrate mitogenomes reveal widespread repeats and gene duplications
AU - Formenti, Giulio
AU - Rhie, Arang
AU - Balacco, Jennifer
AU - Haase, Bettina
AU - Mountcastle, Jacquelyn
AU - Fedrigo, Olivier
AU - Brown, Samara
AU - Capodiferro, Marco Rosario
AU - Al-Ajli, Farooq O.
AU - Ambrosini, Roberto
AU - Houde, Peter
AU - Koren, Sergey
AU - Oliver, Karen
AU - Smith, Michelle
AU - Skelton, Jason
AU - Betteridge, Emma
AU - Dolucan, Jale
AU - Corton, Craig
AU - Bista, Iliana
AU - Torrance, James
AU - Tracey, Alan
AU - Wood, Jonathan
AU - Uliano-Silva, Marcela
AU - Howe, Kerstin
AU - McCarthy, Shane
AU - Winkler, Sylke
AU - Kwak, Woori
AU - Korlach, Jonas
AU - Fungtammasan, Arkarachai
AU - Fordham, Daniel
AU - Costa, Vania
AU - Mayes, Simon
AU - Chiara, Matteo
AU - Horner, David S.
AU - Myers, Eugene
AU - Durbin, Richard
AU - Achilli, Alessandro
AU - Braun, Edward L.
AU - Phillippy, Adam M.
AU - Jarvis, Erich D.
AU - Kirschel, Alexander N.G.
AU - Digby, Andrew
AU - Veale, Andrew
AU - Bronikowski, Anne
AU - Murphy, Bob
AU - Robertson, Bruce
AU - Baker, Clare
AU - Mazzoni, Camila
AU - Balakrishnan, Christopher
AU - Lee, Chul
AU - Mead, Daniel
AU - Teeling, Emma
AU - Aiden, Erez Lieberman
AU - Todd, Erica
AU - Eichler, Evan
AU - Naylor, Gavin J.P.
AU - Zhang, Guojie
AU - Smith, Jeramiah
AU - Wolf, Jochen
AU - Touchon, Justin
AU - Delmore, Kira
AU - Jakobsen, Kjetill
AU - Komoroske, Lisa
AU - Wilkinson, Mark
AU - Genner, Martin
AU - Pšenička, Martin
AU - Fuxjager, Matthew
AU - Stratton, Mike
AU - Liedvogel, Miriam
AU - Gemmell, Neil
AU - Minias, Piotr
AU - Dunn, Peter O.
AU - Sudmant, Peter
AU - Morin, Phil
AU - Ayub, Qasim
AU - Kraus, Robert
AU - Vernes, Sonja
AU - Smith, Steve
AU - Lama, Tanya
AU - Edwards, Taylor
AU - Smith, Tim
AU - Gilbert, Tom
AU - Marques-Bonet, Tomas
AU - Einfeldt, Tony
AU - Venkatesh, Byrappa
AU - Johnson, Warren
AU - Warren, Wes
AU - Bukhman, Yury
AU - The Vertebrate Genomes Project Consortium
N1 - Funding Information:
A. R., S. K., and A. M. P. were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. A. R. was also supported by the Korea Health Technology R&D Project through KHIDI, funded by the Ministry of Health & Welfare, Republic of Korea (HI17C2098). F. O. A. was supported by Al-Gannas Qatari Society and The Cultural Village Foundation-Katara, Doha, State of Qatar, and Monash University Malaysia. G. F. and E. D. J. were supported by Rockefeller University start-up funds and the Howard Hughes Medical Institute. A. A. and M. R. C. received support from the Fondazione Cariplo project no. 2018–2045 and the Italian Ministry of Education, University and Research (MIUR) for Progetti PRIN2017 20174BTC4R and Dipartimenti di Eccellenza Program (2018–2022). R. D. and S. M. received funding from Wellcome grant WT207492.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/4/29
Y1 - 2021/4/29
N2 - Background: Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. Results: As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Conclusions: Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
KW - Assembly
KW - Duplications
KW - Long reads
KW - Mitochondrial DNA
KW - Repeats
KW - Sequencing
KW - Vertebrate
UR - http://www.scopus.com/inward/record.url?scp=85105004099&partnerID=8YFLogxK
U2 - 10.1186/s13059-021-02336-9
DO - 10.1186/s13059-021-02336-9
M3 - Article
C2 - 33910595
AN - SCOPUS:85105004099
SN - 1474-760X
VL - 22
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 120
ER -