CANarEx: Contextually Aware Narrative Extraction for Semantically Rich Text-as-data applications

Nandini Anantharama, Simon D. Angus, Lachlan O'Neill

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review


Narrative modelling is an area of active research, motivated by the acknowledgement of narratives as drivers of societal decision making. These research efforts conceptualize narratives as connected entity chains, and modeling typically focuses on the identification of entities and their connections within a text. An emerging approach to narrative modelling is the use of semantic role labeling (SRL) to extract Entity-Verb-Entity (E-V-Es) tuples from a text, followed by dimensionality reduction to reduce the space of entities and connections separately. This process penalises the semantic richness of narratives and discards much contextual information along the way. Here, we propose an alternate narrative extraction approach - CANarEx, incorporating a pipeline of common contextual constructs through co-reference resolution, micro-narrative generation and clustering of these narratives through sentence embeddings. We evaluate our approach through testing the recovery of “narrative time-series clusters”, mimicking a desirable text-as-data task. The evaluation framework leverages synthetic data generated using a GPT-3 model. The GPT-3 model is trained to generate similar sentences using a large dataset of news articles. The synthetic data maps to three topics in the news dataset. We then generate narrative time-series document cluster representations by mapping the synthetic data to three distinct signals synthetically injected into the testing corpus. Evaluation results demonstrate the superior ability of CANarEx to recover narrative time-series through reduced MSE and improved precision/recall relative to existing methods. The validity is further reinforced through ablation studies and qualitative analysis.
Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics
Subtitle of host publicationEMNLP 2022
EditorsYoav Goldberg, Zornitsa Kozareva, Yue Zhang
Place of PublicationAbu Dhabi United Arab Emirates
PublisherAssociation for Computational Linguistics (ACL)
Number of pages14
Publication statusPublished - 2022

Cite this