TY - JOUR
T1 - Four hundred or more participants needed for stable contingency table estimates of clinical prediction rule performance
AU - Kent, Peter
AU - Boyle, Eleanor
AU - Keating, Jennifer L.
AU - Albert, Hanne B.
AU - Hartvigsen, Jan
PY - 2017/2/1
Y1 - 2017/2/1
N2 - Objectives: To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. Study Design and Setting: An analysis of three pre-existing sets of large cohort data (n = 4,062–8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. Results: There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400–600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters. Conclusion: To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance.
AB - Objectives: To quantify variability in the results of statistical analyses based on contingency tables and discuss the implications for the choice of sample size for studies that derive clinical prediction rules. Study Design and Setting: An analysis of three pre-existing sets of large cohort data (n = 4,062–8,674) was performed. In each data set, repeated random sampling of various sample sizes, from n = 100 up to n = 2,000, was performed 100 times at each sample size and the variability in estimates of sensitivity, specificity, positive and negative likelihood ratios, posttest probabilities, odds ratios, and risk/prevalence ratios for each sample size was calculated. Results: There were very wide, and statistically significant, differences in estimates derived from contingency tables from the same data set when calculated in sample sizes below 400 people, and typically, this variability stabilized in samples of 400–600 people. Although estimates of prevalence also varied significantly in samples below 600 people, that relationship only explains a small component of the variability in these statistical parameters. Conclusion: To reduce sample-specific variability, contingency tables should consist of 400 participants or more when used to derive clinical prediction rules or test their performance.
KW - Clinical prediction rule
KW - Decision support techniques
KW - Epidemiologic research design
KW - Predictive value of tests
KW - Reproducibility of results
KW - Sample size
UR - http://www.scopus.com/inward/record.url?scp=85009834147&partnerID=8YFLogxK
U2 - 10.1016/j.jclinepi.2016.10.004
DO - 10.1016/j.jclinepi.2016.10.004
M3 - Article
AN - SCOPUS:85009834147
SN - 0895-4356
VL - 82
SP - 137
EP - 148
JO - Journal of Clinical Epidemiology
JF - Journal of Clinical Epidemiology
ER -