Expanding Malaysian English Dataset with Human-in-the-Loop Annotation for Entity and Relation Recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

Abstract

Malaysian English, being a low-resource creole language, presents unique challenges for natural language processing tasks such as Named Entity Recognition (NER) and Relation Extraction (RE). In this paper, we propose a methodology utilizing Human-in-the-Loop (HITL) Annotation to address these challenges and enhance the annotation process for NER and RE tasks in Malaysian English. By implementing this methodology, we effectively expanded the MEN Dataset from 6,061 entities to 12,456 entities and from 4,095 relation instances to 7,794 relation instances. This promising outcome serves as an encouragement to expand resources for any low-resource language by implementing the discussed methodology.

Original language: English
Title of host publication: Proceedings of 2025 International Conference on Asian Language Processing, IALP 2025
Editors: Lei Wang, Rong Tong, Sarah Flora Samson Juan, Yanfeng Lu, Ping Ping Tan, Suhaila Saee, Minghui Dong
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 165-169
Number of pages: 5
ISBN (Electronic): 9798331589790
ISBN (Print): 9798331589806
Publication status: Published - 2025
Event: International Conference on Asian Language Processing (IALP) 2025 - Sarawak, Malaysia
Duration: 4 Aug 2025 to 6 Aug 2025
Conference number: 29th
https://ieeexplore.ieee.org/xpl/conhome/11156192/proceeding (Proceedings)
https://www.colips.org/conferences/ialp2025/wp/ (Website)

Conference

Conference: International Conference on Asian Language Processing (IALP) 2025
Abbreviated title: IALP 2025
Country/Territory: Malaysia
City: Sarawak
Period: 4/08/25 to 6/08/25

Keywords

  • Human-in-the-Loop Annotation
  • Low-Resource Language
  • Malaysian English
  • Named Entity Recognition
  • Relation Extraction
