Towards explainable prediction of essay cohesion in Portuguese and English

Hilario Oliveira, Rafael Ferreira Mello, Bruno Alexandre Barreiros Rosa, Mladen Rakovic, Pericles Miranda, Thiago Cordeiro, Seiji Isotani, Ig Bittencourt, Dragan Gasevic

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review



Textual cohesion is an essential aspect of a formally written text, related to linguistic mechanisms that connect elements such as words, sentences, and paragraphs. Several studies have proposed approaches to estimate textual cohesion in essays automatically. There is limited research that aims to study the extent to which the use of machine learning approaches can predict the textual cohesion of essays written in different languages (not just English). This paper reports on the findings of a study that aimed to propose and evaluate approaches that automatically estimate the cohesion of essays in Portuguese and English. The study proposed regression-based models grounded in conventional feature-based machine learning methods and deep learning-based pre-trained language models. The study also examined the explainability of automated approaches to scrutinize their predictions. We analyzed two datasets composed of 4,570 (Portuguese) and 7,101 (English) essays. The results demonstrate that a deep learning-based model achieved the best performance on both datasets with a moderate Pearson correlation with human-rated cohesion scores. However, the explainability of the automatic cohesion estimations based on conventional machine learning models offered a stronger potential than that of the deep learning model.
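The abstract reports model quality as a Pearson correlation between predicted and human-rated cohesion scores. As a minimal sketch of how such a correlation can be computed, the snippet below implements the standard Pearson formula in plain Python. The function name `pearson_r` and the toy scores are hypothetical illustrations and are not taken from the paper or its datasets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (NOT from the paper): human ratings vs. model output.
human_scores = [1, 2, 3, 4, 5]
model_scores = [1.5, 1.8, 3.2, 3.9, 4.4]
r = pearson_r(human_scores, model_scores)
print(f"Pearson r = {r:.3f}")
```

A value near 1 would indicate that the predictions rise and fall with the human ratings; the study itself describes its best result as a moderate correlation.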

Original language: English
Title of host publication: LAK 2023 Conference Proceedings - Towards Trustworthy Learning Analytics - The Thirteenth International Conference on Learning Analytics & Knowledge
Editors: Isabel Hilliger, Hassan Khosravi, Bart Rienties, Shane Dawson
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 11
ISBN (Electronic): 9781450398657
Publication status: Published - 2023
Event: International Conference on Learning Analytics and Knowledge 2023 - Arlington, United States of America
Duration: 13 Mar 2023 to 17 Mar 2023
Conference number: 13th


Conference: International Conference on Learning Analytics and Knowledge 2023
Abbreviated title: LAK 2023
Country/Territory: United States of America


  • Essay analysis
  • explainable artificial intelligence
  • regression models
  • textual cohesion
