This paper demonstrates how Bayesian hierarchical modelling can be used to evaluate hospital performance. We estimate a three-level random intercept probit model that attributes unexplained variation in hospital-acquired complications to hospital effects, hospital-specialty effects and remaining random variation, controlling for observable patient complexity. Together, the posterior means and densities of the latent hospital and specialty effects can be used to assess the need and scope for improvements in patient safety at different organizational levels. Posterior densities are not conventionally presented in performance assessment, but they provide valuable additional information to policy makers on which poorly performing hospitals and specialties may be prioritized for policy action. We use administrative data on surgical patients for 2005/2006, covering 16 specialties in 35 public hospitals in Victoria, Australia. We use posterior means of the latent hospital and specialty effects to compare hospital performance in patient safety, and we compare posterior densities and variances across specialties to identify the clinical areas with the greatest scope for improvement. We also show that the same hospital may rank markedly differently across specialties.
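
For orientation, a three-level random intercept probit of the kind described above can be sketched as follows. The notation is illustrative and assumed here, not taken from the paper: $i$ indexes patients, $j$ specialties and $k$ hospitals; $y_{ijk}$ indicates a hospital-acquired complication; $x_{ijk}$ collects the observable patient complexity controls; $u_k$ and $v_{jk}$ denote the latent hospital and hospital-specialty effects. The normal distributions for the random effects are a common default and are likewise an assumption.

\[
\begin{aligned}
y^{*}_{ijk} &= x_{ijk}'\beta + u_k + v_{jk} + \varepsilon_{ijk}, \qquad y_{ijk} = \mathbf{1}\{y^{*}_{ijk} > 0\},\\
u_k &\sim N(0, \sigma_u^2), \qquad v_{jk} \sim N(0, \sigma_v^2), \qquad \varepsilon_{ijk} \sim N(0, 1),\\
\Pr(y_{ijk} = 1 \mid x_{ijk}, u_k, v_{jk}) &= \Phi\!\left(x_{ijk}'\beta + u_k + v_{jk}\right).
\end{aligned}
\]

Under this sketch, and assuming the three error components are mutually independent, the share of unexplained latent variation attributable to hospitals is $\sigma_u^2 / (\sigma_u^2 + \sigma_v^2 + 1)$ and the share attributable to specialties within hospitals is $\sigma_v^2 / (\sigma_u^2 + \sigma_v^2 + 1)$, where the $1$ is the patient-level variance fixed by the probit normalisation. In a Bayesian fit, each $u_k$ and $v_{jk}$ has a full posterior density, so both its mean and its spread are available for performance comparison.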