Analysing discussion forum data: a replication study avoiding data contamination

Elaine Farrow, Johanna Moore, Dragan Gašević

    Research output: Chapter in Book/Report/Conference proceeding, Conference paper, Research, peer-review

    18 Citations (Scopus)


    The widespread use of online discussion forums in educational settings provides a rich source of data for researchers interested in how collaboration and interaction can foster effective learning. Such online behaviour can be understood through the Community of Inquiry framework, and the cognitive presence construct in particular can be used to characterise the depth of a student's critical engagement with course material. Automated methods have been developed to support this task, but many studies have used small data sets, and there have been few replication studies. In this work, we present findings on the robustness and generalisability of automated classification methods for detecting cognitive presence in discussion forum transcripts. We closely examined one published state-of-the-art model, comparing different approaches to managing unbalanced classes in the data. By demonstrating how commonly used data preprocessing practices can lead to over-optimistic results, we contribute to the development of the field so that the results of automated content analysis can be used with confidence.
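    The abstract does not name the specific preprocessing practices it examines. One widely discussed example, assumed here purely for illustration, is balancing the classes by oversampling before the data are split into training and test sets, so that copies of the same minority-class message can land on both sides of the split. The minimal sketch below, using only the Python standard library, shows that effect on a toy data set. The helper names (`random_oversample`, `split`, `leaked_test_rows`) and the toy data are hypothetical, and simple random duplication stands in for synthetic oversampling methods such as SMOTE; this is not the authors' pipeline.

```python
import random


def random_oversample(rows, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one.

    Each row is a (features, label) tuple. Random duplication is an illustrative
    stand-in for synthetic oversampling methods such as SMOTE (an assumption).
    """
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Add randomly chosen copies until this class reaches the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def split(rows, test_percent=30, seed=0):
    """Shuffle the rows and hold out test_percent of them as a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) - len(shuffled) * test_percent // 100
    return shuffled[:cut], shuffled[cut:]


def leaked_test_rows(train, test):
    """Count test rows whose exact copy also appears in the training set."""
    train_set = set(train)
    return sum(1 for row in test if row in train_set)


# Hypothetical unbalanced toy data: 90 "low" and 10 "high" messages, each with a unique id feature.
data = [((i,), "low") for i in range(90)] + [((i,), "high") for i in range(90, 100)]

# Contaminated pipeline: balance the whole data set first, then split it.
train_bad, test_bad = split(random_oversample(data))

# Clean pipeline: split first, then balance the training portion only.
train_ok, test_ok = split(data)
train_ok = random_oversample(train_ok)

print("leaked (oversample then split):", leaked_test_rows(train_bad, test_bad))
print("leaked (split then oversample):", leaked_test_rows(train_ok, test_ok))
```

    In this sketch the split-then-oversample pipeline always reports zero leaked test rows, because every original message is unique and only training rows are duplicated. The oversample-then-split pipeline instead places exact copies of training messages in the test set, which is the kind of overlap that can inflate test scores. Under cross-validation the same rule would apply per fold: resample only the training folds.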

    Original language: English
    Title of host publication: Proceedings of the 9th International Conference on Learning Analytics and Knowledge (LAK'19)
    Subtitle of host publication: Learning Analytics to Promote Inclusion and Success
    Editors: Christopher Brooks, Rebecca Ferguson, Ulrich Hoppe
    Place of publication: New York, NY, USA
    Publisher: Association for Computing Machinery (ACM)
    Number of pages: 10
    ISBN (Electronic): 9781450362566
    Publication status: Published - 2019
    Event: International Learning Analytics & Knowledge Conference 2019 - Arizona State University, Tempe, United States of America
    Duration: 4 Mar 2019 to 8 Mar 2019
    Conference number: 9th

    Publication series

    Name: ACM International Conference Proceeding Series


    Conference: International Learning Analytics & Knowledge Conference 2019
    Abbreviated title: LAK 2019
    Country/Territory: United States of America


    • Cognitive presence
    • Community of inquiry
    • Data contamination
    • Replication
