Analysing discussion forum data

a replication study avoiding data contamination

Elaine Farrow, Johanna Moore, Dragan Gašević

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

Abstract

The widespread use of online discussion forums in educational settings provides a rich source of data for researchers interested in how collaboration and interaction can foster effective learning. Such online behaviour can be understood through the Community of Inquiry framework, and the cognitive presence construct in particular can be used to characterise the depth of a student's critical engagement with course material. Automated methods have been developed to support this task, but many studies used small data sets, and there have been few replication studies. In this work, we present findings related to the robustness and generalisability of automated classification methods for detecting cognitive presence in discussion forum transcripts. We closely examined one published state-of-the-art model, comparing different approaches to managing unbalanced classes in the data. By demonstrating how commonly-used data preprocessing practices can lead to over-optimistic results, we contribute to the development of the field so that the results of automated content analysis can be used with confidence.

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19)
Subtitle of host publication: Learning Analytics to Promote Inclusion and Success
Editors: Christopher Brooks, Rebecca Ferguson, Ulrich Hoppe
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 170-179
Number of pages: 10
ISBN (Electronic): 9781450362566
DOI: https://doi.org/10.1145/3303772.3303779
Publication status: Published - 2019
Event: International Learning Analytics & Knowledge Conference 2019 - Arizona State University, Tempe, United States of America
Duration: 4 Mar 2019 to 8 Mar 2019
Conference number: 9th
https://lak19.solaresearch.org/

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: International Learning Analytics & Knowledge Conference 2019
Abbreviated title: LAK 2019
Country: United States of America
City: Tempe
Period: 4/03/19 to 8/03/19
Internet address: https://lak19.solaresearch.org/

Keywords

  • Cognitive presence
  • Community of inquiry
  • Data contamination
  • Replication

Cite this

Farrow, E., Moore, J., & Gašević, D. (2019). Analysing discussion forum data: a replication study avoiding data contamination. In C. Brooks, R. Ferguson, & U. Hoppe (Eds.), Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19): Learning Analytics to Promote Inclusion and Success (pp. 170-179). (ACM International Conference Proceeding Series). New York NY USA: Association for Computing Machinery (ACM). https://doi.org/10.1145/3303772.3303779

Farrow, Elaine ; Moore, Johanna ; Gašević, Dragan. / Analysing discussion forum data : a replication study avoiding data contamination. Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19): Learning Analytics to Promote Inclusion and Success. editor / Christopher Brooks ; Rebecca Ferguson ; Ulrich Hoppe. New York NY USA : Association for Computing Machinery (ACM), 2019. pp. 170-179 (ACM International Conference Proceeding Series).

@inproceedings{4067e67e5e0c47fc9cf75b3d005d8f1a,
title = "Analysing discussion forum data: a replication study avoiding data contamination",
abstract = "The widespread use of online discussion forums in educational settings provides a rich source of data for researchers interested in how collaboration and interaction can foster effective learning. Such online behaviour can be understood through the Community of Inquiry framework, and the cognitive presence construct in particular can be used to characterise the depth of a student's critical engagement with course material. Automated methods have been developed to support this task, but many studies used small data sets, and there have been few replication studies. In this work, we present findings related to the robustness and generalisability of automated classification methods for detecting cognitive presence in discussion forum transcripts. We closely examined one published state-of-the-art model, comparing different approaches to managing unbalanced classes in the data. By demonstrating how commonly-used data preprocessing practices can lead to over-optimistic results, we contribute to the development of the field so that the results of automated content analysis can be used with confidence.",
keywords = "Cognitive presence, Community of inquiry, Data contamination, Replication",
author = "Elaine Farrow and Johanna Moore and Dragan Gašević",
year = "2019",
doi = "10.1145/3303772.3303779",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery (ACM)",
pages = "170--179",
editor = "Christopher Brooks and Rebecca Ferguson and Ulrich Hoppe",
booktitle = "Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19)",
address = "United States of America",

}

Farrow, E, Moore, J & Gašević, D 2019, Analysing discussion forum data: a replication study avoiding data contamination. in C Brooks, R Ferguson & U Hoppe (eds), Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19): Learning Analytics to Promote Inclusion and Success. ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), New York NY USA, pp. 170-179, International Learning Analytics & Knowledge Conference 2019, Tempe, United States of America, 4/03/19. https://doi.org/10.1145/3303772.3303779

Analysing discussion forum data : a replication study avoiding data contamination. / Farrow, Elaine; Moore, Johanna; Gašević, Dragan.

Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19): Learning Analytics to Promote Inclusion and Success. ed. / Christopher Brooks; Rebecca Ferguson; Ulrich Hoppe. New York NY USA : Association for Computing Machinery (ACM), 2019. p. 170-179 (ACM International Conference Proceeding Series).

TY - GEN

T1 - Analysing discussion forum data

T2 - a replication study avoiding data contamination

AU - Farrow, Elaine

AU - Moore, Johanna

AU - Gašević, Dragan

PY - 2019

Y1 - 2019

N2 - The widespread use of online discussion forums in educational settings provides a rich source of data for researchers interested in how collaboration and interaction can foster effective learning. Such online behaviour can be understood through the Community of Inquiry framework, and the cognitive presence construct in particular can be used to characterise the depth of a student's critical engagement with course material. Automated methods have been developed to support this task, but many studies used small data sets, and there have been few replication studies. In this work, we present findings related to the robustness and generalisability of automated classification methods for detecting cognitive presence in discussion forum transcripts. We closely examined one published state-of-the-art model, comparing different approaches to managing unbalanced classes in the data. By demonstrating how commonly-used data preprocessing practices can lead to over-optimistic results, we contribute to the development of the field so that the results of automated content analysis can be used with confidence.

KW - Cognitive presence

KW - Community of inquiry

KW - Data contamination

KW - Replication

UR - http://www.scopus.com/inward/record.url?scp=85062777789&partnerID=8YFLogxK

U2 - 10.1145/3303772.3303779

DO - 10.1145/3303772.3303779

M3 - Conference Paper

T3 - ACM International Conference Proceeding Series

SP - 170

EP - 179

BT - Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19)

A2 - Brooks, Christopher

A2 - Ferguson, Rebecca

A2 - Hoppe, Ulrich

PB - Association for Computing Machinery (ACM)

CY - New York NY USA

ER -

Farrow E, Moore J, Gašević D. Analysing discussion forum data: a replication study avoiding data contamination. In Brooks C, Ferguson R, Hoppe U, editors, Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19): Learning Analytics to Promote Inclusion and Success. New York NY USA: Association for Computing Machinery (ACM). 2019. p. 170-179. (ACM International Conference Proceeding Series). https://doi.org/10.1145/3303772.3303779