Recent advances in social science surveys include collection of biological samples. Although biomarkers offer a large potential for social science and economic research, they impose a number of statistical challenges, often being distributed asymmetrically with heavy tails. Using data from the UK Household Panel Survey, we illustrate the comparative performance of a set of flexible parametric distributions, which allow for a wide range of skewness and kurtosis: the four-parameter generalized beta of the second kind (GB2), the three-parameter generalized gamma, and their three-, two-, or one-parameter nested and limiting cases. Commonly used blood-based biomarkers for inflammation, diabetes, cholesterol, and stress-related hormones are modelled. Although some of the three-parameter distributions nested within the GB2 outperform the latter for most of the biomarkers considered, the GB2 can be used as a guide for choosing among competing parametric distributions for biomarkers. Going “beyond the mean” to estimate tail probabilities, we find that GB2 performs fairly well with some disparities at the very high levels of glycated hemoglobin and fibrinogen. Commonly used linear models are shown to perform worse than almost all the flexible distributions.
- generalized beta of second kind
- heavy tails
- tail probabilities