Abstract
Brazilian universities have included essay writing assignments in the entrance examination procedure to select prospective students. The essay scorers manually look for the presence of required Rhetorical Structure Theory (RST) categories and evaluate essay coherence. However, identifying RST categories is a time-consuming task. The literature reports several attempts to automate the identification of RST categories in essays with machine learning. Still, previous studies have focused on machine learning algorithms trained on content-dependent features, which can diminish classification performance, lead to over-fitting, and hinder model generalisability. Therefore, this paper proposes: (i) an analysis of state-of-the-art classifiers and content-independent features for the task of identifying RST rhetorical moves; (ii) a new approach that considers the sequence of the text to extract features, i.e. sequential content-independent features; (iii) an empirical study of the generalisability of the machine learning models and sequential content-independent features in this context; (iv) the identification of the most predictive features for automated identification of RST categories in essays written in Portuguese. The best performing classifier, XGBoost, based on sequential content-independent features, outperformed the classifiers used in the literature, which are based on traditional content-dependent features. The XGBoost classifier based on sequential content-independent features also reached promising accuracy when tested for generalisability.
Original language | English |
---|---|
Title of host publication | LAK22 Conference Proceedings |
Editors | Alyssa Friend Wise, Roberto Martinez-Maldonado, Isabel Hilliger |
Place of Publication | New York NY USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 404-414 |
Number of pages | 11 |
ISBN (Electronic) | 9781450395731 |
DOIs | |
Publication status | Published - 2022 |
Event | International Conference on Learning Analytics and Knowledge 2022: Learning Analytics for Transition, Disruption and Social Change - Online, United States of America. Duration: 21 Mar 2022 → 25 Mar 2022. Conference number: 12th. https://dl.acm.org/doi/proceedings/10.1145/3506860 (Proceedings) |
Conference
Conference | International Conference on Learning Analytics and Knowledge 2022 |
---|---|
Abbreviated title | LAK 2022 |
Country/Territory | United States of America |
Period | 21/03/22 → 25/03/22 |
Internet address | |
Keywords
- content analytics
- context analysis
- essay analysis
- natural language processing
- rhetoric structure