CrudeOilNews: an annotated Crude Oil News corpus for event extraction

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

6 Citations (Scopus)

Abstract

In this paper, we present CrudeOilNews, a corpus of English crude oil news articles for event extraction. It is the first of its kind for commodity news and contributes to resource building for economic and financial text mining. This paper describes the data collection process, the annotation methodology, and the event typology used in producing the corpus. First, a seed set of 175 news articles was manually annotated, of which a subset of 25 articles was used as the adjudicated reference test set for inter-annotator and system evaluation. Inter-annotator agreement was generally substantial, and annotator performance was adequate, indicating that the annotation scheme produces consistent, high-quality event annotations. Subsequently, the dataset was expanded through (1) data augmentation and (2) human-in-the-loop active learning. The resulting corpus has 425 news articles with approximately 11k annotated events. As part of the active learning process, the corpus was used to train basic event extraction models for machine labeling; these models also serve as a pilot study that validates the use of the corpus for machine learning purposes. The annotated corpus is made available for academic research purposes at https://github.com/meisin/CrudeOilNews-Corpus.
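The abstract describes inter-annotator agreement as "generally substantial", wording that matches the Landis and Koch (1977) scale commonly applied to Cohen's kappa. This record does not say which agreement measure the paper reports, so the following is only a minimal sketch, assuming pairwise Cohen's kappa over single event-type labels. The functions `cohens_kappa` and `landis_koch`, the label names `rise`/`fall`/`other`, and the ten example items are all hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:  # both annotators used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal band for a kappa value on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, band in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return band
    return "almost perfect"


# Hypothetical event-type labels from two annotators on ten triggers.
annotator_1 = ["rise", "rise", "fall", "fall", "rise",
               "other", "other", "fall", "rise", "other"]
annotator_2 = ["rise", "rise", "fall", "rise", "rise",
               "other", "fall", "fall", "rise", "other"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(f"{kappa:.3f} {landis_koch(kappa)}")  # → 0.692 substantial
```

Here the annotators agree on 8 of 10 items (p_o = 0.80) against a chance agreement of p_e = 0.35, giving kappa of about 0.69, which falls in the "substantial" band. Kappa fits single-label decisions such as event type; agreement on trigger and argument spans is often reported as F1 between annotators instead. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.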

Original language: English
Title of host publication: Language Resources and Evaluation Conference, LREC 2022
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Jan Odijk, Stelios Piperidis
Place of Publication: Paris, France
Publisher: European Language Resources Association (ELRA)
Pages: 465-479
Number of pages: 15
ISBN (Electronic): 9791095546726
Publication status: Published - 2022
Event: International Conference on Language Resources and Evaluation Conference 2022 - Marseille, France
Duration: 20 Jun 2022 to 25 Jun 2022
Conference number: 13th
https://aclanthology.org/volumes/2022.lrec-1/ (Proceedings)

Conference

Conference: International Conference on Language Resources and Evaluation Conference 2022
Abbreviated title: LREC 2022
Country/Territory: France
City: Marseille
Period: 20/06/22 to 25/06/22

Keywords

  • Annotated Dataset
  • Crude Oil News
  • English corpus
  • Event Extraction
  • Financial Information Extraction
