Four hundred or more participants needed for stable contingency table estimates of clinical prediction rule performance

Peter Kent, Eleanor Boyle, Jennifer L. Keating, Hanne B Albert, Jan Hartvigsen

Research output: Contribution to journal › Article › Research › peer-review



Objectives: To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules.

Study Design and Setting: An analysis of three pre-existing sets of large cohort data (n = 4,062–8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated.

Results: There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400–600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters.

Conclusion: To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance.
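The repeated-sampling design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use their cohort data: the cohort here is synthetic, and the prevalence, sensitivity, and specificity values, as well as the function names `make_cohort`, `table_estimates`, and `resampling_spread`, are assumptions chosen only for illustration. The sketch tracks sensitivity and specificity; the other statistics named in the abstract could be derived from the same 2x2 table in a similar way.

```python
import random


def make_cohort(n, prevalence, sensitivity, specificity, rng):
    """Build a synthetic cohort of (test_positive, has_outcome) pairs.

    All parameter values are illustrative assumptions, not the study's data.
    """
    cohort = []
    for _ in range(n):
        has_outcome = rng.random() < prevalence
        p_positive = sensitivity if has_outcome else 1.0 - specificity
        cohort.append((rng.random() < p_positive, has_outcome))
    return cohort


def table_estimates(sample):
    """Return (sensitivity, specificity) from the 2x2 table of a sample."""
    tp = sum(1 for test, outcome in sample if test and outcome)
    fn = sum(1 for test, outcome in sample if not test and outcome)
    fp = sum(1 for test, outcome in sample if test and not outcome)
    tn = sum(1 for test, outcome in sample if not test and not outcome)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("sample lacks outcome-positive or outcome-negative cases")
    return tp / (tp + fn), tn / (tn + fp)


def resampling_spread(cohort, sample_sizes, repeats, rng):
    """For each sample size, draw `repeats` random samples without replacement
    and report the range (max - min) of each estimate across the draws."""
    spread = {}
    for n in sample_sizes:
        estimates = [table_estimates(rng.sample(cohort, n)) for _ in range(repeats)]
        sens_values = [sens for sens, _ in estimates]
        spec_values = [spec for _, spec in estimates]
        spread[n] = {
            "sensitivity_range": max(sens_values) - min(sens_values),
            "specificity_range": max(spec_values) - min(spec_values),
        }
    return spread


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    cohort = make_cohort(5000, prevalence=0.3, sensitivity=0.8, specificity=0.7, rng=rng)
    sizes = [100, 200, 400, 600, 1000, 2000]
    for n, ranges in resampling_spread(cohort, sizes, 100, rng).items():
        print(n, round(ranges["sensitivity_range"], 3), round(ranges["specificity_range"], 3))
```

Running the script prints, for each sample size, how far the 100 sensitivity and specificity estimates spread apart. In a synthetic cohort like this one the spread should narrow as the sample size grows, which is the kind of pattern the abstract reports; the specific sample size at which it stabilizes depends on the assumed parameters and should not be read as a reproduction of the study's findings.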

Original language: English
Pages (from-to): 137-148
Number of pages: 12
Journal: Journal of Clinical Epidemiology
Publication status: Published - 1 Feb 2017


  • Clinical prediction rule
  • Decision support techniques
  • Epidemiologic research design
  • Predictive value of tests
  • Reproducibility of results
  • Sample size
