Robust modelling of solubility in supercritical carbon dioxide using Bayesian methods

Anna Tarasova, Frank Burden, Johann Gasteiger, David A Winkler

Research output: Contribution to journal › Article › Research › peer-review

28 Citations (Scopus)


Two sparse Bayesian methods were used to derive predictive models of the solubility of organic dyes and polycyclic aromatic compounds in supercritical carbon dioxide (scCO2) over a wide range of temperatures (285.9-423.2 K) and pressures (60-1400 bar): multiple linear regression employing an expectation maximization algorithm and a sparse prior (MLREM), and a non-linear Bayesian regularized artificial neural network with a Laplacian prior (BRANNLP). A randomly selected test set was used to estimate the predictive ability of the models. The MLREM method produced a model of similar predictivity to the less sparse MLR method, while the non-linear BRANNLP method produced models of substantially better predictivity than either the MLREM- or MLR-based models. The BRANNLP method simultaneously generated context-relevant subsets of descriptors and a robust, non-linear quantitative structure-property relationship (QSPR) model for compound solubility in scCO2. The differences between linear and non-linear descriptor selection methods are discussed.
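The idea of a sparse prior driving descriptor selection can be illustrated with a small sketch. It is not the authors' MLREM or BRANNLP implementation: it assumes a Laplacian prior on the weights of a linear model and computes the maximum a posteriori estimate, which is equivalent to L1-penalised least squares, by coordinate descent rather than expectation maximization. The data are synthetic (five hypothetical descriptors, of which only two influence the response), and all names (`fit_sparse_linear`, `lam`, `true_w`) are illustrative assumptions.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_sparse_linear(X, y, lam, n_sweeps=100):
    """Minimise 0.5 * sum((y - Xw)^2) + lam * sum(|w|) by coordinate descent.

    This is the MAP estimate of a linear model under a Laplacian prior on w
    (no intercept; the synthetic data below are zero-mean by construction).
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    w = [0.0] * n_features
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual excluding j's own term.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual))
            new_wj = soft_threshold(rho, lam) / norm_sq
            delta = new_wj - w[j]
            residual = [r - c * delta for c, r in zip(col, residual)]
            w[j] = new_wj
    return w


def predict(X, w):
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


# Synthetic data: descriptors 0 and 3 matter, the rest are irrelevant (assumed).
rng = random.Random(0)
true_w = [2.0, 0.0, 0.0, -3.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(100)]
y = [sum(xi * wi for xi, wi in zip(row, true_w)) + rng.gauss(0.0, 0.1)
     for row in X]

# Randomly selected test set (20 of 100 samples), as in the abstract's protocol.
indices = list(range(len(X)))
rng.shuffle(indices)
test_idx, train_idx = indices[:20], indices[20:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

w = fit_sparse_linear(X_train, y_train, lam=10.0)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(predict(X_test, w), y_test))
                 / len(y_test))
print("weights:", [round(wj, 3) for wj in w])
print("selected descriptors:", selected)
print("test RMSE:", round(rmse, 3))
```

The penalty strength `lam` plays the role of the prior's scale: larger values prune more descriptors at the cost of shrinking the retained weights. A non-linear counterpart such as BRANNLP would apply the same kind of prior to neural network weights, which this linear sketch does not attempt.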

Original language: English
Pages (from-to): 593-597
Number of pages: 5
Journal: Journal of Molecular Graphics and Modelling
Issue number: 7
Publication status: Published - Apr 2010


  • Bayesian methods
  • Dyes
  • Solubility
  • Structure-property relationships
  • Supercritical carbon dioxide
