Assessing annotated corpora as research output

Nick Thieberger, Anna Margetts, Stephen Morey, Simon Musgrave

Research output: Contribution to journal > Article > Research > peer-review

6 Citations (Scopus)

Abstract

The increasing importance of language documentation as a paradigm in linguistic research means that many linguists now spend substantial amounts of time preparing digital corpora of language data for long-term access. Benefits of this development include: (i) making analyses accountable to the primary material on which they are based; (ii) providing future researchers with a body of linguistic material to analyse in ways not foreseen by the original collector of the data; and, equally importantly, (iii) acknowledging the responsibility of the linguist to create records that can be accessed by the speakers of the language and by their descendants. Preparing such data collections requires substantial scholarly effort, and in order to make this approach sustainable, those who undertake it need to receive appropriate academic recognition of their effort in relevant institutional contexts. Such recognition is especially important for early-career scholars so that they can devote efforts to the compilation of annotated corpora and to making them accessible without damaging their careers in the long-term by impacting negatively on their publication record. Preliminary discussions between the Australian Linguistic Society (ALS) and the Australian Research Council (ARC) made it clear that the ARC accepts that curated corpora can legitimately be seen as research output, but that it is the responsibility of the ALS (and the scholarly community more generally) to establish conventions to accord scholarly credibility to such research products. This paper reports on the activities of the authors in exploring this issue on behalf of the ALS and it discusses issues in two areas: (a) what sort of process is appropriate in according acknowledgment and validation to curated corpora as research output; and (b) what are the appropriate criteria against which such validation should be judged? 
While the discussion focuses on the Australian linguistic context, it is also more broadly applicable as we will present in this article. © 2015 The Australian Linguistic Society.
Original language: English
Pages (from-to): 1-21
Number of pages: 21
Journal: Australian Journal of Linguistics
Volume: 36
Issue number: 1
DOI: 10.1080/07268602.2016.1109428
Publication status: Published - 2016

Keywords

  • primary data curation
  • citation of data
  • valuing collections

Cite this

Thieberger, Nick ; Margetts, Anna ; Morey, Stephen ; Musgrave, Simon. / Assessing annotated corpora as research output. In: Australian Journal of Linguistics. 2016 ; Vol. 36, No. 1. pp. 1 - 21.
@article{75d693b3f5c1481e8eadc3a47462e144,
title = "Assessing annotated corpora as research output",
abstract = "The increasing importance of language documentation as a paradigm in linguistic research means that many linguists now spend substantial amounts of time preparing digital corpora of language data for long-term access. Benefits of this development include: (i) making analyses accountable to the primary material on which they are based; (ii) providing future researchers with a body of linguistic material to analyse in ways not foreseen by the original collector of the data; and, equally importantly, (iii) acknowledging the responsibility of the linguist to create records that can be accessed by the speakers of the language and by their descendants. Preparing such data collections requires substantial scholarly effort, and in order to make this approach sustainable, those who undertake it need to receive appropriate academic recognition of their effort in relevant institutional contexts. Such recognition is especially important for early-career scholars so that they can devote efforts to the compilation of annotated corpora and to making them accessible without damaging their careers in the long-term by impacting negatively on their publication record. Preliminary discussions between the Australian Linguistic Society (ALS) and the Australian Research Council (ARC) made it clear that the ARC accepts that curated corpora can legitimately be seen as research output, but that it is the responsibility of the ALS (and the scholarly community more generally) to establish conventions to accord scholarly credibility to such research products. This paper reports on the activities of the authors in exploring this issue on behalf of the ALS and it discusses issues in two areas: (a) what sort of process is appropriate in according acknowledgment and validation to curated corpora as research output; and (b) what are the appropriate criteria against which such validation should be judged? 
While the discussion focuses on the Australian linguistic context, it is also more broadly applicable as we will present in this article. (c) 2015 The Australian Linguistic Society.",
keywords = "primary data curation, citation of data, valuing collections",
author = "Nick Thieberger and Anna Margetts and Stephen Morey and Simon Musgrave",
year = "2016",
doi = "10.1080/07268602.2016.1109428",
language = "English",
volume = "36",
pages = "1--21",
journal = "Australian Journal of Linguistics",
issn = "0726-8602",
publisher = "Taylor \& Francis",
number = "1",
}

Assessing annotated corpora as research output. / Thieberger, Nick; Margetts, Anna; Morey, Stephen; Musgrave, Simon.

In: Australian Journal of Linguistics, Vol. 36, No. 1, 2016, p. 1 - 21.

TY - JOUR

T1 - Assessing annotated corpora as research output

AU - Thieberger, Nick

AU - Margetts, Anna

AU - Morey, Stephen

AU - Musgrave, Simon

PY - 2016

Y1 - 2016

N2 - The increasing importance of language documentation as a paradigm in linguistic research means that many linguists now spend substantial amounts of time preparing digital corpora of language data for long-term access. Benefits of this development include: (i) making analyses accountable to the primary material on which they are based; (ii) providing future researchers with a body of linguistic material to analyse in ways not foreseen by the original collector of the data; and, equally importantly, (iii) acknowledging the responsibility of the linguist to create records that can be accessed by the speakers of the language and by their descendants. Preparing such data collections requires substantial scholarly effort, and in order to make this approach sustainable, those who undertake it need to receive appropriate academic recognition of their effort in relevant institutional contexts. Such recognition is especially important for early-career scholars so that they can devote efforts to the compilation of annotated corpora and to making them accessible without damaging their careers in the long-term by impacting negatively on their publication record. Preliminary discussions between the Australian Linguistic Society (ALS) and the Australian Research Council (ARC) made it clear that the ARC accepts that curated corpora can legitimately be seen as research output, but that it is the responsibility of the ALS (and the scholarly community more generally) to establish conventions to accord scholarly credibility to such research products. This paper reports on the activities of the authors in exploring this issue on behalf of the ALS and it discusses issues in two areas: (a) what sort of process is appropriate in according acknowledgment and validation to curated corpora as research output; and (b) what are the appropriate criteria against which such validation should be judged? 
While the discussion focuses on the Australian linguistic context, it is also more broadly applicable as we will present in this article. (c) 2015 The Australian Linguistic Society.

KW - primary data curation

KW - citation of data

KW - valuing collections

UR - http://www.tandfonline.com/doi/pdf/10.1080/07268602.2016.1109428

U2 - 10.1080/07268602.2016.1109428

DO - 10.1080/07268602.2016.1109428

M3 - Article

VL - 36

SP - 1

EP - 21

JO - Australian Journal of Linguistics

JF - Australian Journal of Linguistics

SN - 0726-8602

IS - 1

ER -