TY - JOUR
T1 - Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) dataset with cloud computing reveals abundant post-translational modifications and protein sequence variants
AU - Prakash, Amol
AU - Taylor, Lorne
AU - Varkey, Manu
AU - Hoxie, Nate
AU - Mohammed, Yassene
AU - Goo, Young Ah
AU - Peterman, Scott
AU - Moghekar, Abhay
AU - Yuan, Yuting
AU - Glaros, Trevor
AU - Steele, Joel R.
AU - Faridi, Pouya
AU - Parihari, Shashwati
AU - Srivastava, Sanjeeva
AU - Otto, Joseph J.
AU - Nyalwidhe, Julius O.
AU - Semmes, O. John
AU - Moran, Michael F.
AU - Madugundu, Anil
AU - Mun, Dong Gi
AU - Pandey, Akhilesh
AU - Mahoney, Keira E.
AU - Shabanowitz, Jeffrey
AU - Saxena, Satya
AU - Orsburn, Benjamin C.
N1 - Funding Information:
The application of the Bolt search engine toward the reanalysis presented here was funded by NCI contract 75N91020C00011.
Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/10/2
Y1 - 2021/10/2
AB - The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has provided some of the most in-depth analyses of the phenotypes of human tumors ever constructed. Today, the majority of proteomic data analysis is still performed using software housed on desktop computers, which limits the number of sequence variants and post-translational modifications (PTMs) that can be considered. The original CPTAC studies limited the search for PTMs to samples that were chemically enriched for those modified peptides. Similarly, the only sequence variants considered were those with strong evidence at the exon or transcript level. In this multi-institutional collaborative reanalysis, we utilized unbiased protein databases containing millions of human sequence variants in conjunction with hundreds of common PTMs. Using these tools, we identified tens of thousands of high-confidence PTMs and sequence variants. We identified 4132 phosphorylated peptides in nonenriched samples, 93% of which were confirmed in the samples that were chemically enriched for phosphopeptides. In addition, our results cover 90% of the high-confidence variants reported by the original proteogenomics study, without the need for sample-specific next-generation sequencing. Finally, we report fivefold more somatic and germline variants that have independent evidence at the peptide level, including mutations in ERBB2 and BCAS1. In this reanalysis of CPTAC proteomic data with cloud computing, we present an openly available and searchable web resource of the highest-coverage proteomic profiling of human tumors described to date.
KW - Cancer
KW - Cloud computing
KW - CPTAC
KW - Post-translational modifications
KW - Proteogenomics
KW - Proteomics
KW - Tumor proteomics
UR - http://www.scopus.com/inward/record.url?scp=85116825450&partnerID=8YFLogxK
U2 - 10.3390/cancers13205034
DO - 10.3390/cancers13205034
M3 - Article
C2 - 34680183
AN - SCOPUS:85116825450
SN - 2072-6694
VL - 13
JO - Cancers
JF - Cancers
IS - 20
M1 - 5034
ER -