Background
Preoperative risk prediction is important for guiding clinical decision-making and resource allocation. Clinicians frequently rely solely on their own clinical judgement for risk prediction rather than objective measures. We aimed to compare the accuracy of freely available objective surgical risk tools with subjective clinical assessment in predicting 30-day mortality.

Methods and findings
We conducted a prospective observational study in 274 hospitals in the United Kingdom (UK), Australia, and New Zealand. For 1 week in 2017, prospective risk, surgical, and outcome data were collected on all adults aged 18 years and over undergoing surgery requiring at least a 1-night stay in hospital. Recruitment bias was avoided through an ethical waiver to patient consent; a mixture of rural, urban, district, and university hospitals participated. We compared subjective assessment with 3 previously published, open-access objective risk tools for predicting 30-day mortality: the Portsmouth-Physiology and Operative Severity Score for the enUmeration of Mortality (P-POSSUM), Surgical Risk Scale (SRS), and Surgical Outcome Risk Tool (SORT). We then developed a logistic regression model combining subjective assessment and the best objective tool and compared its performance to each constituent method alone. We included 22,631 patients in the study: 52.8% were female, median age was 62 years (interquartile range [IQR] 46 to 73 years), median postoperative length of stay was 3 days (IQR 1 to 6), and inpatient 30-day mortality was 1.4%. Clinicians used subjective assessment alone in 88.7% of cases. All methods overpredicted risk, but visual inspection of plots showed the SORT to have the best calibration. The SORT demonstrated the best discrimination of the objective tools (SORT Area Under Receiver Operating Characteristic curve [AUROC] = 0.90, 95% confidence interval [CI]: 0.88–0.92; P-POSSUM = 0.89, 95% CI: 0.88–0.91; SRS = 0.85, 95% CI: 0.82–0.87). Subjective assessment demonstrated good discrimination (AUROC = 0.89, 95% CI: 0.86–0.91) that did not differ significantly from that of the SORT (p = 0.309). Combining subjective assessment and the SORT improved discrimination (bootstrap optimism-corrected AUROC = 0.92, 95% CI: 0.90–0.94) and demonstrated a continuous Net Reclassification Improvement (NRI = 0.13, 95% CI: 0.06–0.20, p < 0.001) compared with subjective assessment alone. Decision-curve analysis (DCA) confirmed the superiority of the SORT over other previously published models, and the SORT–clinical judgement model again performed best overall. Our study is limited by the low mortality rate, by the lack of blinding in the ‘subjective’ risk assessments, and by its comparison of clinical risk scores only, rather than other prediction tools such as exercise testing or frailty assessment.

Conclusions
In this study, we observed that the combination of subjective assessment with a parsimonious risk model improved perioperative risk estimation. This may be of value in helping clinicians allocate finite resources such as critical care and to support patient involvement in clinical decision-making.
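
For readers unfamiliar with the three performance measures named above (AUROC, continuous NRI, and the net benefit underlying decision-curve analysis), the following is a minimal illustrative sketch in Python using their standard textbook definitions. It is not the study's analysis code. The function names and the eight-patient toy data are hypothetical and invented purely for illustration, and the sketch omits the bootstrap optimism correction and confidence intervals reported in the abstract.

```python
from typing import Sequence


def auroc(risks: Sequence[float], died: Sequence[int]) -> float:
    """Rank-based AUROC: the probability that a randomly chosen patient who
    died was given a higher predicted risk than a randomly chosen survivor
    (ties count as one half)."""
    pos = [r for r, d in zip(risks, died) if d]
    neg = [r for r, d in zip(risks, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def continuous_nri(old: Sequence[float], new: Sequence[float],
                   died: Sequence[int]) -> float:
    """Category-free NRI of `new` versus `old`:
    [P(up | death) - P(down | death)] + [P(down | survivor) - P(up | survivor)],
    where up/down means the new predicted risk is higher/lower than the old."""
    ev_up = ev_down = ne_up = ne_down = n_ev = n_ne = 0
    for o, n, d in zip(old, new, died):
        if d:
            n_ev += 1
            ev_up += n > o
            ev_down += n < o
        else:
            n_ne += 1
            ne_up += n > o
            ne_down += n < o
    return (ev_up - ev_down) / n_ev + (ne_down - ne_up) / n_ne


def net_benefit(risks: Sequence[float], died: Sequence[int],
                threshold: float) -> float:
    """Net benefit at one risk threshold pt (one point on a decision curve):
    TP/n - (FP/n) * pt / (1 - pt), treating risk >= pt as 'high risk'."""
    n = len(risks)
    tp = sum(1 for r, d in zip(risks, died) if r >= threshold and d)
    fp = sum(1 for r, d in zip(risks, died) if r >= threshold and not d)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Invented toy data (NOT from the study): 1 = died within 30 days.
died = [0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predicted risks from subjective assessment alone ...
subjective = [0.01, 0.02, 0.05, 0.10, 0.02, 0.20, 0.10, 0.30]
# ... and from a hypothetical combined (objective tool + subjective) model.
combined = [0.01, 0.01, 0.03, 0.05, 0.02, 0.10, 0.20, 0.40]

print("AUROC subjective:", auroc(subjective, died))
print("AUROC combined:  ", auroc(combined, died))
print("Continuous NRI:  ", continuous_nri(subjective, combined, died))
for pt in (0.05, 0.10, 0.20):
    print(f"Net benefit at {pt:.2f}:",
          net_benefit(subjective, died, pt),
          net_benefit(combined, died, pt))
```

Evaluating net benefit over a range of thresholds and plotting it for each model gives a decision curve; the pairwise AUROC loop is quadratic in sample size, so a cohort of the size reported here would call for a sort-based implementation from a statistics library instead.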