LearningQ: a large-scale dataset for educational question generation

Guanliang Chen, Jie Yang, Claudia Hauff, Geert Jan Houben

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

69 Citations (Scopus)

Abstract

We present LearningQ, a challenging educational question generation dataset containing over 230K document-question pairs. It includes 7K instructor-designed questions assessing the knowledge concepts being taught and 223K learner-generated questions seeking in-depth understanding of those concepts. We show that, compared to existing datasets that can be used to generate educational questions, LearningQ (i) covers a wide range of educational topics and (ii) contains long and cognitively demanding documents for which question generation requires reasoning over the relationships between sentences and paragraphs. As a result, a significant percentage of LearningQ questions (∼30%) require higher-order cognitive skills to solve (such as applying and analyzing), in contrast to existing question generation datasets, which are designed mostly for the lowest cognitive skill level (i.e., remembering). To understand how effective existing question generation methods are at producing educational questions, we evaluate both rule-based and deep neural network based methods on LearningQ. Extensive experiments show that state-of-the-art methods that perform well on existing datasets cannot generate useful educational questions. This implies that LearningQ is a challenging test bed for the generation of high-quality educational questions and merits further investigation. We open-source the dataset and our code at https://dataverse.mpi-sws.org/dataverse/icwsm18.

Original language: English
Title of host publication: Proceedings of the Twelfth International Conference on Web and Social Media
Editors: Kate Starbird, Ingmar Weber
Place of publication: Marina del Rey, CA, USA
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Pages: 481-490
Number of pages: 10
ISBN (Electronic): 9781577357988
Publication status: Published - 2018
Externally published: Yes
Event: International AAAI Conference on Web and Social Media 2018 - Palo Alto, United States of America
Duration: 25 Jun 2018 to 28 Jun 2018
Conference number: 12th
https://icwsm.org/2018/

Conference

Conference: International AAAI Conference on Web and Social Media 2018
Abbreviated title: ICWSM 2018
Country/Territory: United States of America
City: Palo Alto
Period: 25/06/18 to 28/06/18

Keywords

  • Automatic Question Generation
  • Deep Neural Network
  • Human Learning
  • Bloom's Revised Taxonomy
