Towards automatic cross-language classification of cognitive presence in online discussions

Gian Barbosa, Raissa Camelo, Anderson Pinheiro Cavalcanti, Péricles Miranda, Rafael Ferreira Mello, Vitomir Kovanović, Dragan Gašević

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

4 Citations (Scopus)


This paper presents a study of automated cross-language classification of online discussion messages by level of cognitive presence, a key construct of the widely used Community of Inquiry (CoI) model of online learning. Specifically, we examined the classification of 1,500 Portuguese-language discussion messages using a classifier trained on a corpus of 1,747 English-language discussion messages. A random forest classifier was developed using a small set of 108 validated indicators of psychological processes, linguistic coherence, and online discussion structure. The classifier obtained 67% accuracy and a Cohen's κ of 0.32, showing a moderate level of inter-rater agreement above chance and the general viability of the proposed approach. Most importantly, the findings suggest that certain aspects of the cognitive presence construct are highly generalizable and transfer across languages. Finally, the paper presents a novel method for addressing the class imbalance problem using a genetic algorithm heuristic technique, which provided substantial improvements over the use of the imbalanced dataset. Results and practical implications are further discussed.
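The abstract reports accuracy and Cohen's κ together. The sketch below shows how the two relate. It is not the authors' code: the labels are invented toy data and the helper name `cohen_kappa` is hypothetical. κ takes the observed agreement (the accuracy) and subtracts the agreement expected by chance from the label frequencies, so a classifier that leans on the majority class can score well on accuracy and much lower on κ.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of items on which both labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the two marginal frequencies, summed per class.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both labelings use one identical class; kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


# Toy, made-up labels (assumption): an imbalanced set of 10 messages where the
# "classifier" mostly predicts the majority class.
y_true = ["exploration"] * 6 + ["integration"] * 3 + ["other"] * 1
y_pred = ["exploration"] * 8 + ["integration"] * 1 + ["exploration"] * 1

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

On this toy data the observed agreement is 0.70 and the chance agreement is 0.57, giving κ = 0.13 / 0.43, or roughly 0.30. The accuracy and κ reported in the abstract show the same kind of gap.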

Original language: English
Title of host publication: LAK 2020 Conference Proceedings
Editors: Vitomir Kovanović, Maren Scheffel, Niels Pinkwart, Verbert Verbert
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 10
ISBN (Electronic): 9781450377126
Publication status: Published - 2020
Event: International Conference on Learning Analytics and Knowledge 2020 - Frankfurt, Germany
Duration: 23 Mar 2020 → 27 Mar 2020
Conference number: 10th


Conference: International Conference on Learning Analytics and Knowledge 2020
Abbreviated title: LAK 2020


  • Community of Inquiry Model
  • Content Analytics
  • Cross-Language Classification
  • Online Discussion
  • Optimization
