Analyzing tumor heterogeneity by incorporating long-range mutational influences and multiple sample data into heterogeneity factorial Hidden Markov model

Mohammad S. Rahman, Gholamreza Haffari

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Cancer arises from successive rounds of mutation, producing subpopulations of tumor cells, known as clones, that carry different somatic mutations. Drug responsiveness and the choice of therapy depend on accurately detecting the clones in a tumor sample. Recent research has inferred the clonal composition of a tumor sample with computational models applied to the short-read data generated from the sample by next-generation sequencing (NGS). Short reads (short DNA fragments from different tumor cells) are noisy, so inferring the clones and their mutations from the data is difficult. Existing methods for inferring clones from noisy NGS data do not consider long-range mutational influences. We therefore develop a new model, extended multiple sample tumor heterogeneity prediction by factorial Hidden Markov model (emHetFHMM), which builds on factorial hidden Markov models to infer clones and their proportions while capturing long-range mutational influences. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains gives rise to the observed data. We use Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong baselines from previous work (PyClone, PhyloSub, and HetFHMM) on both synthetic data and real cancer data from acute myeloid leukemia. Empirical results show that emHetFHMM infers the clonal composition of a tumor sample more accurately than these previous methods.
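The abstract names exponentiated gradient as the method for estimating mixing proportions but gives no details. The following is only a minimal sketch of that general technique, not the authors' implementation. It assumes a simple squared-error objective between the observed per-site signal and a weighted mixture of fixed clone chains. The objective, the function names (`exponentiated_gradient_step`, `squared_error_gradient`, `fit_mixing_proportions`), the learning rate, and the toy data are all hypothetical. In the actual model the hidden chains would be sampled (for example by Gibbs sampling) rather than held fixed.

```python
import math


def exponentiated_gradient_step(weights, grad, learning_rate=0.2):
    """One exponentiated-gradient update: multiply each weight by
    exp(-learning_rate * gradient) and renormalize, so the weights
    stay non-negative and sum to one."""
    shift = min(grad)  # shifting by a constant cancels after normalization
    unnormalized = [
        w * math.exp(-learning_rate * (g - shift)) for w, g in zip(weights, grad)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def squared_error_gradient(weights, chains, observed):
    """Gradient of 0.5 * sum_t (sum_k w_k * chains[k][t] - observed[t])**2
    with respect to the weights (an assumed, illustrative objective)."""
    residual = [
        sum(w * chain[t] for w, chain in zip(weights, chains)) - observed[t]
        for t in range(len(observed))
    ]
    return [sum(r * chain[t] for t, r in enumerate(residual)) for chain in chains]


def fit_mixing_proportions(chains, observed, n_iters=2000, learning_rate=0.2):
    """Estimate mixing proportions for fixed chains, starting from uniform."""
    k = len(chains)
    weights = [1.0 / k] * k
    for _ in range(n_iters):
        grad = squared_error_gradient(weights, chains, observed)
        weights = exponentiated_gradient_step(weights, grad, learning_rate)
    return weights


if __name__ == "__main__":
    # Toy data: three hypothetical clones over five sites (1 = mutated).
    chains = [[1, 0, 1, 0, 1], [0, 1, 1, 0, 0], [0, 0, 0, 1, 1]]
    true_weights = [0.6, 0.3, 0.1]
    observed = [
        sum(w * chain[t] for w, chain in zip(true_weights, chains))
        for t in range(5)
    ]
    print(fit_mixing_proportions(chains, observed))
```

The multiplicative update is a natural fit for proportions because it never leaves the probability simplex, so no separate projection step is needed.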

Original language: English
Pages (from-to): 985-1002
Number of pages: 18
Journal: Journal of Computational Biology
Volume: 26
Issue number: 9
DOI: 10.1089/cmb.2018.0242
Publication status: Published - 5 Sep 2019

Keywords

  • AML
  • BRCA
  • clone
  • heterogeneity
  • long-range mutational influence
  • tumor

Cite this

@article{0e3691ce19e2450fb2b184e0b3099d67,
title = "Analyzing tumor heterogeneity by incorporating long-range mutational influences and multiple sample data into heterogeneity factorial Hidden Markov model",
abstract = "Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of the clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on the short read data of the sample generated using the next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. Existing methods to infer clones from noisy NGS data do not consider the presence of long-range mutational influences. Therefore, we develop a new model, called extended multiple sample tumor heterogeneity prediction by factorial Hidden Markov model (emHetFHMM), based on factorial hidden Markov models to infer clones and their proportions by capturing the long-range mutational influences. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from the previous work (PyClone, PhyloSub, and HetFHMM) based on both synthetic data and real cancer data from acute myeloid leukemia. Empirical results confirm that emHetFHMM infers clonal composition of a tumor sample more accurately than previous studies.",
keywords = "AML, BRCA, clone, heterogeneity, long-range mutational influence, tumor",
author = "Rahman, {Mohammad S.} and Gholamreza Haffari",
year = "2019",
month = "9",
day = "5",
doi = "10.1089/cmb.2018.0242",
language = "English",
volume = "26",
pages = "985--1002",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc",
number = "9",
}

Analyzing tumor heterogeneity by incorporating long-range mutational influences and multiple sample data into heterogeneity factorial Hidden Markov model. / Rahman, Mohammad S.; Haffari, Gholamreza.

In: Journal of Computational Biology, Vol. 26, No. 9, 05.09.2019, p. 985-1002.


TY - JOUR

T1 - Analyzing tumor heterogeneity by incorporating long-range mutational influences and multiple sample data into heterogeneity factorial Hidden Markov model

AU - Rahman, Mohammad S.

AU - Haffari, Gholamreza

PY - 2019/9/5

Y1 - 2019/9/5

N2 - Cancer arises from successive rounds of mutations, resulting in tumor cells with different somatic mutations known as clones. Drug responsiveness and therapeutics of cancer depend on the accurate detection of the clones in a tumor sample. Recent research has considered inferring clonal composition of a tumor sample using computational models based on the short read data of the sample generated using the next-generation sequencing (NGS) technology. Short reads (segmented DNA parts of different tumor cells) are noisy; therefore, inferring the clones and their mutations from the data is a difficult and complex problem. Existing methods to infer clones from noisy NGS data do not consider the presence of long-range mutational influences. Therefore, we develop a new model, called extended multiple sample tumor heterogeneity prediction by factorial Hidden Markov model (emHetFHMM), based on factorial hidden Markov models to infer clones and their proportions by capturing the long-range mutational influences. In our model, each hidden chain represents the genomic signature of a clone, and a mixture of chains results in the observed data. We make use of Gibbs sampling and exponentiated gradient algorithms to infer the hidden variables and mixing proportions. We compare our model with strong models from the previous work (PyClone, PhyloSub, and HetFHMM) based on both synthetic data and real cancer data from acute myeloid leukemia. Empirical results confirm that emHetFHMM infers clonal composition of a tumor sample more accurately than previous studies.


KW - AML

KW - BRCA

KW - clone

KW - heterogeneity

KW - long-range mutational influence

KW - tumor

UR - http://www.scopus.com/inward/record.url?scp=85072057814&partnerID=8YFLogxK

U2 - 10.1089/cmb.2018.0242

DO - 10.1089/cmb.2018.0242

M3 - Article

VL - 26

SP - 985

EP - 1002

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 9

ER -