Quantifying sentence acceptability measures: reliability, bias, and variability

Steven Langsford, Amy Perfors, Andrew T. Hendrickson, Lauren A. Kennedy, Danielle J. Navarro

Research output: Contribution to journal, Article, Research, peer-review


Understanding and measuring sentence acceptability is of fundamental importance for linguists. Although many measures for doing so have been developed, relatively little is known about some of their psychometric properties. In this paper we evaluate within- and between-participant test-retest reliability on a wide range of measures of sentence acceptability. Doing so allows us to estimate how much of the variability within each measure is due to factors including participant-level individual differences, sample size, response styles, and item effects. The measures examined include Likert scales, two versions of forced-choice judgments, magnitude estimation, and a novel measure based on Thurstonian approaches in psychophysics. We reproduce previous findings of high between-participant reliability within and across measures, and extend these results by showing generally high reliability within individual items and individual people. Our results indicate that Likert scales and the Thurstonian approach produce the most stable and reliable acceptability measures, and do so with smaller sample sizes than the other measures. Moreover, their agreement with each other suggests that restricting responses to a discrete Likert scale does not impose a significant degree of structure on the resulting acceptability judgments.
Original language: English
Pages (from-to): 1-34
Number of pages: 34
Journal: Glossa: A Journal of General Linguistics
Issue number: 1
Publication status: Published - 2018
Externally published: Yes


  • syntax
  • acceptability
  • measurement
  • representation
  • reliability
  • power
  • sample size
