Abstract
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, totaling more than 400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers (MC3) project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for the PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration among a number of institutes and demonstrates how team science drives extremely large genomics projects. The MC3 is a variant-calling project covering over 10,000 cancer exome samples from 33 cancer types. Over three million somatic variants were detected using seven different methods developed at institutions across the United States. These variants formed the basis for the PanCan Atlas papers.
Original language | English |
---|---|
Pages (from-to) | 271-281.e7 |
Number of pages | 19 |
Journal | Cell Systems |
Volume | 6 |
Issue number | 3 |
DOIs | 10.1016/j.cels.2018.03.002 |
Publication status | Published - 28 Mar 2018 |
Externally published | Yes |
Keywords
- large-scale
- open science
- pan-cancer
- PanCanAtlas project
- reproducible computing
- somatic mutation calling
- TCGA
In: Cell Systems, Vol. 6, No. 3, 28.03.2018, p. 271-281.e7.
Research output: Contribution to journal › Article › Research › peer-review
TY - JOUR
T1 - Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines
AU - Ellrott, Kyle
AU - Bailey, Matthew H.
AU - Saksena, Gordon
AU - Covington, Kyle R.
AU - Kandoth, Cyriac
AU - Stewart, Chip
AU - Hess, Julian
AU - Ma, Singer
AU - Chiotti, Kami E.
AU - McLellan, Michael D.
AU - Sofia, Heidi J.
AU - Hutter, Carolyn M.
AU - Getz, Gad
AU - Wheeler, David A.
AU - Ding, Li
AU - Caesar-Johnson, Samantha J.
AU - Demchok, John A.
AU - Felau, Ina
AU - Kasapi, Melpomeni
AU - Ferguson, Martin L.
AU - Tarnuzzer, Roy
AU - Wang, Zhining
AU - Yang, Liming
AU - Zenklusen, Jean C.
AU - Zhang, Jiashan (Julia)
AU - Chudamani, Sudha
AU - Liu, Jia
AU - Lolla, Laxmi
AU - Naresh, Rashi
AU - Pihl, Todd
AU - Sun, Qiang
AU - Wan, Yunhu
AU - Wu, Ye
AU - Cho, Juok
AU - DeFreitas, Timothy
AU - Frazer, Scott
AU - Gehlenborg, Nils
AU - Heiman, David I.
AU - Kim, Jaegil
AU - Lawrence, Michael S.
AU - Lin, Pei
AU - Meier, Sam
AU - Noble, Michael S.
AU - Voet, Doug
AU - Zhang, Hailei
AU - Bernard, Brady
AU - Chambwe, Nyasha
AU - Dhankani, Varsha
AU - Knijnenburg, Theo
AU - Kramer, Roger
AU - Leinonen, Kalle
AU - Liu, Yuexin
AU - Miller, Michael
AU - Reynolds, Sheila
AU - Shmulevich, Ilya
AU - Thorsson, Vesteinn
AU - Zhang, Wei
AU - Akbani, Rehan
AU - Broom, Bradley M.
AU - Hegde, Apurva M.
AU - Ju, Zhenlin
AU - Kanchi, Rupa S.
AU - Korkut, Anil
AU - Li, Jun
AU - Liang, Han
AU - Ling, Shiyun
AU - Liu, Wenbin
AU - Lu, Yiling
AU - Mills, Gordon B.
AU - Ng, Kwok Shing
AU - Rao, Arvind
AU - Ryan, Michael
AU - Wang, Jing
AU - Weinstein, John N.
AU - Zhang, Jiexin
AU - Abeshouse, Adam
AU - Armenia, Joshua
AU - Chakravarty, Debyani
AU - Chatila, Walid K.
AU - de Bruijn, Ino
AU - Gao, Jianjiong
AU - Gross, Benjamin E.
AU - Heins, Zachary J.
AU - Kundra, Ritika
AU - La, Konnor
AU - Ladanyi, Marc
AU - Luna, Augustin
AU - Nissan, Moriah G.
AU - Ochoa, Angelica
AU - Phillips, Sarah M.
AU - Reznik, Ed
AU - Sanchez-Vega, Francisco
AU - Sander, Chris
AU - Schultz, Nikolaus
AU - Sheridan, Robert
AU - Sumer, S. Onur
AU - Sun, Yichao
AU - Taylor, Barry S.
AU - Wang, Jioajiao
AU - Zhang, Hongxin
AU - Anur, Pavana
AU - Peto, Myron
AU - Spellman, Paul
AU - Benz, Christopher
AU - Stuart, Joshua M.
AU - Wong, Christopher K.
AU - Yau, Christina
AU - Hayes, D. Neil
AU - Parker
AU - Ally, Adrian
AU - Balasundaram, Miruna
AU - Bowlby, Reanne
AU - Brooks, Denise
AU - Carlsen, Rebecca
AU - Chuah, Eric
AU - Dhalla, Noreen
AU - Holt, Robert
AU - Jones, Steven J.M.
AU - Kasaian, Katayoon
AU - Lee, Darlene
AU - Ma, Yussanne
AU - Marra, Marco A.
AU - Mayo, Michael
AU - Moore, Richard A.
AU - Mungall, Andrew J.
AU - Mungall, Karen
AU - Robertson, A. Gordon
AU - Sadeghi, Sara
AU - Schein, Jacqueline E.
AU - Sipahimalani, Payal
AU - Tam, Angela
AU - Thiessen, Nina
AU - Tse, Kane
AU - Wong, Tina
AU - Berger, Ashton C.
AU - Beroukhim, Rameen
AU - Cherniack, Andrew D.
AU - Cibulskis, Carrie
AU - Gabriel, Stacey B.
AU - Gao, Galen F.
AU - Ha, Gavin
AU - Meyerson, Matthew
AU - Schumacher, Steven E.
AU - Shih, Juliann
AU - Kucherlapati, Melanie H.
AU - Kucherlapati, Raju S.
AU - Baylin, Stephen
AU - Cope, Leslie
AU - Danilova, Ludmila
AU - Bootwalla, Moiz S.
AU - Lai, Phillip H.
AU - Maglinte, Dennis T.
AU - Van Den Berg, David J.
AU - Weisenberger, Daniel J.
AU - Auman, J. Todd
AU - Balu, Saianand
AU - Bodenheimer, Tom
AU - Fan, Cheng
AU - Hoadley, Katherine A.
AU - Hoyle, Alan P.
AU - Jefferys, Stuart R.
AU - Jones, Corbin D.
AU - Meng, Shaowu
AU - Mieczkowski, Piotr A.
AU - Mose, Lisle E.
AU - Perou, Amy H.
AU - Perou, Charles M.
AU - Roach, Jeffrey
AU - Shi, Yan
AU - Simons, Janae V.
AU - Skelly, Tara
AU - Soloway, Matthew G.
AU - Tan, Donghui
AU - Veluvolu, Umadevi
AU - Fan, Huihui
AU - Hinoue, Toshinori
AU - Laird, Peter W.
AU - Shen, Hui
AU - Zhou, Wanding
AU - Bellair, Michelle
AU - Chang, Kyle
AU - Creighton, Chad J.
AU - Dinh, Huyen
AU - Doddapaneni, Harsha Vardhan
AU - Donehower, Lawrence A.
AU - Drummond, Jennifer
AU - Gibbs, Richard A.
AU - Glenn, Robert
AU - Hale, Walker
AU - Han, Yi
AU - Hu, Jianhong
AU - Korchina, Viktoriya
AU - Lee, Sandra
AU - Lewis, Lora
AU - Li, Wei
AU - Liu, Xiuping
AU - Morgan, Margaret
AU - Morton, Donna
AU - Muzny, Donna
AU - Santibanez, Jireh
AU - Sheth, Margi
AU - Shinbrot, Eve
AU - Wang, Linghua
AU - Wang, Min
AU - Xi, Liu
AU - Zhao, Fengmei
AU - Appelbaum, Elizabeth L.
AU - Cordes, Matthew G.
AU - Fronick, Catrina C.
AU - Fulton, Lucinda A.
AU - Fulton, Robert S.
AU - Mardis, Elaine R.
AU - Miller, Christopher A.
AU - Schmidt, Heather K.
AU - Wilson, Richard K.
AU - Crain, Daniel
AU - Curley, Erin
AU - Gardner, Johanna
AU - Lau, Kevin
AU - Mallery, David
AU - Morris, Scott
AU - Paulauskis, Joseph
AU - Penny, Robert
AU - Shelton, Candace
AU - Shelton, Troy
AU - Sherman, Mark
AU - Thompson, Eric
AU - Yena, Peggy
AU - Bowen, Jay
AU - Gastier-Foster, Julie M.
AU - Gerken, Mark
AU - Leraas, Kristen M.
AU - Lichtenberg, Tara M.
AU - Ramirez, Nilsa C.
AU - Wise, Lisa
AU - Zmuda, Erik
AU - Corcoran, Niall
AU - Costello, Tony
AU - Hovens, Christopher
AU - Carvalho, Andre L.
AU - de Carvalho, Ana C.
AU - Fregnani, José H.
AU - Longatto-Filho, Adhemar
AU - Reis, Rui M.
AU - Scapulatempo-Neto, Cristovam
AU - Silveira, Henrique C.S.
AU - Vidal, Daniel O.
AU - Burnette, Andrew
AU - Eschbacher, Jennifer
AU - Boussioutas, Alex
AU - the MC3 Working Group
AU - The Cancer Genome Atlas Research Network
N1 - Publisher Copyright: © 2018 The Authors
PY - 2018/3/28
Y1 - 2018/3/28
KW - large-scale
KW - open science
KW - pan-cancer
KW - PanCanAtlas project
KW - reproducible computing
KW - somatic mutation calling
KW - TCGA
UR - http://www.scopus.com/inward/record.url?scp=85044569292&partnerID=8YFLogxK
U2 - 10.1016/j.cels.2018.03.002
DO - 10.1016/j.cels.2018.03.002
M3 - Article
C2 - 29596782
AN - SCOPUS:85044569292
SN - 2405-4712
VL - 6
SP - 271-281.e7
JO - Cell Systems
JF - Cell Systems
IS - 3
ER -