Variable selection methods for multiple regressions influence the parsimony of risk prediction models for cardiac surgery

Research output: Contribution to journal › Article › Research › peer-review

7 Citations (Scopus)


Objective: To compare the impact of different variable selection methods in multiple regression on the development of a parsimonious model for predicting postoperative outcomes of patients undergoing cardiac surgery. Methods: Data from 84,135 patients in the Australian and New Zealand Society of Cardiac and Thoracic Surgeons registry between 2001 and 2014 were analyzed. The primary outcome was 30-day mortality. Mixed-effects logistic regression was used to build the model. Missing values were imputed using multiple imputation. The following 5 variable selection methods were compared: bootstrap receiver operating characteristic (ROC), bootstrap Akaike information criterion, bootstrap Bayesian information criterion, and stepwise forward and stepwise backward methods. The final model's prediction performance was evaluated using Frank Harrell's calibration curve and a multifold cross-validation approach. Results: The stepwise forward and backward methods selected the same set of 21 variables, giving an area under the ROC curve (AUC) of 0.8490. The bootstrap ROC method selected 13 variables with an AUC of 0.8450. The bootstrap Bayesian information criterion and bootstrap Akaike information criterion methods selected 16 (AUC: 0.8470) and 23 (AUC: 0.8491) variables, respectively. The bootstrap ROC model was selected as the final model and showed very good discrimination and calibration. Conclusions: Clinical suitability in terms of parsimony and prediction performance can largely be achieved by using the bootstrap ROC method for the development of risk prediction models.
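The quantities compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `aic`, `bic`, and `auc` and the toy risk scores are assumptions made for illustration, and the standard textbook definitions are used (the bootstrap resampling and mixed-effects fitting steps are not reproduced here). It shows why a Bayesian information criterion tends to favor fewer variables than an Akaike information criterion at this sample size, and how an AUC is computed from predicted risks.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: each of the k parameters costs 2."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: each of the k parameters costs ln(n)."""
    return k * math.log(n) - 2 * log_likelihood


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair count.

    Returns the fraction of (event, non-event) pairs in which the event
    received the higher predicted risk; ties count as half a pair.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > m else 0.5 if e == m else 0.0
        for e in events
        for m in non_events
    )
    return wins / (len(events) * len(non_events))


# Per-variable penalty at the registry sample size quoted in the abstract.
# Assumption: n is taken as the number of patients; for mixed-effects
# models the appropriate n is debatable.
n_patients = 84135
aic_penalty_per_variable = 2.0
bic_penalty_per_variable = math.log(n_patients)

# Toy predicted risks and outcomes (hypothetical, not registry data).
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

With the same log-likelihood, each extra variable costs 2 under the Akaike criterion but ln(84,135), a little over 11, under the Bayesian criterion, which is consistent with the Bayesian variant retaining fewer variables (16 versus 23) in the results above.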

Original language: English
Pages (from-to): 1128-1135.e3
Number of pages: 11
Journal: Journal of Thoracic and Cardiovascular Surgery
Issue number: 5
Publication status: Published - 1 May 2017


  • Automated model
  • Bootstrap resampling
  • Cardiac surgery
  • Risk prediction model
