Reducing the size of training datasets in the classification of online discussions

Vitor Rolim, Rafael Ferreira Mello, Andre Nascimento, Rafael Dueire Lins, Dragan Gasevic

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

3 Citations (Scopus)

Abstract

Supervised machine learning models have been widely used to classify messages in online discussions. Supervised learning algorithms require a large set of annotated data to build an accurate predictive model. However, data annotation is a complex task for three reasons: (i) it depends on specialists to label the data accurately; (ii) it is often time-consuming and labour-intensive; and (iii) in educational settings, it is not always easy to collect the substantial volume of data that machine learning algorithms require. This paper presents an active learning-based approach that can reduce the amount of annotated data required to build machine learning models for the classification of educational data. The results show that, with only 20% of the annotated data, the proposed approach achieved results similar to those reported in previous works that used the complete datasets to train the machine learning model.
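
The abstract does not state which query strategy or classifier the paper uses, so the following is only a minimal sketch of generic pool-based active learning with margin-based uncertainty sampling, not the authors' method. It assumes messages have already been turned into numeric feature vectors. The nearest-centroid classifier, the synthetic two-class pool, the `oracle` callback standing in for a human annotator, and every function name are hypothetical choices made for illustration. The labelling budget of 40 out of 200 items simply mirrors the 20% figure quoted above.

```python
import random


def fit_centroids(labeled):
    """Nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def predict(centroids, features):
    """Assign the class whose centroid is closest."""
    return min(centroids, key=lambda label: distance(features, centroids[label]))


def margin(centroids, features):
    """Gap between the two closest centroids; a small gap means an uncertain prediction."""
    dists = sorted(distance(features, c) for c in centroids.values())
    return dists[1] - dists[0] if len(dists) > 1 else 0.0


def active_learning(pool, oracle, seed_indices, budget):
    """Pool-based loop: repeatedly ask the oracle to label the most uncertain item."""
    seen = set(seed_indices)
    labeled = [(pool[i], oracle(i)) for i in seed_indices]
    unlabeled = [i for i in range(len(pool)) if i not in seen]
    while len(labeled) < budget and unlabeled:
        centroids = fit_centroids(labeled)
        query = min(unlabeled, key=lambda i: margin(centroids, pool[i]))
        unlabeled.remove(query)
        labeled.append((pool[query], oracle(query)))
    return fit_centroids(labeled), len(labeled)


# Synthetic stand-in for a pool of vectorised messages (assumption: two classes).
rng = random.Random(0)
pool = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
pool += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(100)]
truth = [0] * 100 + [1] * 100

# Label only 40 of the 200 items (20%), starting from one seed example per class.
model, n_labels = active_learning(
    pool, oracle=lambda i: truth[i], seed_indices=[0, 100], budget=40
)
accuracy = sum(predict(model, x) == y for x, y in zip(pool, truth)) / len(pool)
print(f"labels used: {n_labels}, accuracy on the pool: {accuracy:.2f}")
```

In a real annotation workflow the `oracle` call would be replaced by a request to a human coder, and the classifier and uncertainty measure would be whichever ones suit the data.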

Original language: English
Title of host publication: Proceedings - IEEE 21st International Conference on Advanced Learning Technologies, ICALT 2021
Editors: Maiga Chang, Nian-Shing Chen, Demetrios G Sampson, Ahmed Tlili
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 179-183
Number of pages: 5
ISBN (Electronic): 9781665441063
ISBN (Print): 9781665431163
Publication status: Published - 2021
Event: IEEE International Conference on Advanced Learning Technologies 2021 - Online, Malaysia
Duration: 12 Jul 2021 to 15 Jul 2021
Conference number: 21st
https://ieeexplore-ieee-org.ezproxy.lib.monash.edu.au/xpl/conhome/9499715/proceeding (Proceedings)

Publication series

Name: Proceedings - IEEE 21st International Conference on Advanced Learning Technologies, ICALT 2021
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISSN (Print): 2161-3761
ISSN (Electronic): 2161-377X

Conference

Conference: IEEE International Conference on Advanced Learning Technologies 2021
Abbreviated title: ICALT 2021
Country/Territory: Malaysia
City: Online
Period: 12/07/21 to 15/07/21

Keywords

  • Active Learning
  • Community of Inquiry
  • Online Discussions
  • Text Classification
