Towards automatic content analysis of social presence in transcripts of online discussions

Máverick Ferreira, Vitor Rolim, Rafael Ferreira Mello, Rafael Dueire Lins, Guanliang Chen, Dragan Gašević

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

29 Citations (Scopus)


This paper presents an approach to automatically labeling the content of messages in online discussions according to the categories of social presence. The approach combines traditional text mining features with word counts extracted using established linguistic frameworks (i.e., LIWC and Coh-Metrix). The best-performing classifier achieved an accuracy of 0.95 and a Cohen's kappa of 0.88. The paper also provides theoretical insights into the nature of social presence by examining the classification features that were most relevant for distinguishing between the categories. Finally, the study used epistemic network analysis to investigate the structural construct validity of the automatic classification approach: the epistemic networks built from manually coded messages and from automatically coded messages were nearly identical, which provides evidence of the structural validity of the automatic approach.
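To make the two reported metrics concrete, the following minimal sketch computes accuracy and Cohen's kappa for a manual coding and an automatic coding of the same messages. It is not the authors' pipeline: the function names, the category labels (taken from the usual Community of Inquiry scheme) and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of messages on which the two codings agree."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohens_kappa(gold, pred):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(gold)
    p_o = accuracy(gold, pred)  # observed agreement
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Expected agreement if both coders labeled independently
    # with their own marginal label frequencies.
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both codings use one identical label;
        # this sketch treats that case as perfect agreement (a convention).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data, not taken from the paper.
manual = ["affective", "interactive", "cohesive", "interactive",
          "affective", "cohesive", "interactive", "affective"]
automatic = ["affective", "interactive", "cohesive", "interactive",
             "affective", "interactive", "interactive", "affective"]

print(f"accuracy={accuracy(manual, automatic):.3f} "
      f"kappa={cohens_kappa(manual, automatic):.3f}")
# prints: accuracy=0.875 kappa=0.805
```

On this toy data the two codings disagree on one of eight messages, so accuracy is 0.875, while kappa is lower (about 0.805) because part of that agreement would be expected by chance. The same relationship holds between the 0.95 and 0.88 reported above.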

Original language: English
Title of host publication: LAK20 Conference Proceedings
Editors: Maren Scheffel, Vitomir Kovanović, Niels Pinkwart, Katrien Verbert
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 10
ISBN (Electronic): 9781450377126
Publication status: Published - 2020
Event: International Learning Analytics & Knowledge Conference 2020 - Frankfurt, Germany
Duration: 23 Mar 2020 to 27 Mar 2020
Conference number: 10th


Conference: International Learning Analytics & Knowledge Conference 2020
Abbreviated title: LAK 2020


  • Community of Inquiry Model
  • Content Analytics
  • Epistemic Network Analysis
  • Online Discussion
  • Text Classification
