Assessing annotated corpora as research output

Nick Thieberger, Anna Margetts, Stephen Morey, Simon Musgrave

Research output: Contribution to journal › Article › Research › peer-review

19 Citations (Scopus)


The increasing importance of language documentation as a paradigm in linguistic research means that many linguists now spend substantial amounts of time preparing digital corpora of language data for long-term access. Benefits of this development include: (i) making analyses accountable to the primary material on which they are based; (ii) providing future researchers with a body of linguistic material to analyse in ways not foreseen by the original collector of the data; and, equally importantly, (iii) acknowledging the responsibility of the linguist to create records that can be accessed by the speakers of the language and by their descendants. Preparing such data collections requires substantial scholarly effort, and in order to make this approach sustainable, those who undertake it need to receive appropriate academic recognition of their effort in relevant institutional contexts. Such recognition is especially important for early-career scholars so that they can devote efforts to the compilation of annotated corpora and to making them accessible without damaging their careers in the long term by impacting negatively on their publication record. Preliminary discussions between the Australian Linguistic Society (ALS) and the Australian Research Council (ARC) made it clear that the ARC accepts that curated corpora can legitimately be seen as research output, but that it is the responsibility of the ALS (and the scholarly community more generally) to establish conventions to accord scholarly credibility to such research products. This paper reports on the activities of the authors in exploring this issue on behalf of the ALS and it discusses issues in two areas: (a) what sort of process is appropriate in according acknowledgment and validation to curated corpora as research output; and (b) what are the appropriate criteria against which such validation should be judged?
While the discussion focuses on the Australian linguistic context, it is also more broadly applicable, as we show in this article. © 2015 The Australian Linguistic Society.
Original language: English
Pages (from-to): 1-21
Number of pages: 21
Journal: Australian Journal of Linguistics
Issue number: 1
Publication status: Published - 2016


  • primary data curation
  • citation of data
  • valuing collections
