A neural machine translation approach for translating Malay parliament Hansard to English text

Yu Zane Low, Lay-Ki Soon, Shageenderan Sapai

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

6 Citations (Scopus)

Abstract

Parliament Hansard is one of the most precious texts made available to the public. In Malaysia, the parliament Hansard records debates and discussions in the Malay language. Topic modelling, sentiment analysis, relation extraction, trend prediction, and temporal analysis are frequently applied to parliament Hansard to discover interesting patterns. However, most of the mature tools for such processing tasks work on English text. As such, before the Malaysian parliament Hansard can be processed further, it is essential to translate the Malay text into English. Several machine translation approaches are surveyed in this paper. According to the literature review, neural machine translation, particularly the Transformer model, has been shown to provide promising results in translating between different languages. In this paper, we present our implementation of neural machine translation for Malay-to-English text. The experiments show that, with a good parallel corpus and minimal fine-tuning, neural MT can achieve a BLEU score as high as 35.42.
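For readers unfamiliar with the BLEU metric quoted in the abstract, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is not the authors' evaluation script: the function names `ngrams` and `corpus_bleu` are hypothetical, tokenization is plain whitespace splitting, one reference per hypothesis is assumed, and no smoothing is applied. Published scores are normally computed with a standard toolkit, and tokenization choices can shift the number noticeably.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis (assumption).

    Combines clipped n-gram precisions for n = 1..max_n with a uniform
    geometric mean and the brevity penalty. Without smoothing, the score
    is 0.0 whenever some n-gram order has no matches at all.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # whitespace tokenization
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Uniformly weighted mean of the log precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize system output that is shorter than the reference overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(system_outputs, reference_translations)` with two equal-length lists of sentences returns a single corpus score. Match counts are pooled over the whole corpus before the precisions are taken, which is why BLEU is usually reported per test set and not averaged per sentence.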

Original language: English
Title of host publication: 2020 International Conference on Asian Language Processing (IALP)
Editors: Yanfeng Lu, Minghui Dong, Lay-Ki Soon, Keng Hoon Gan
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 316-320
Number of pages: 5
ISBN (Electronic): 9781728176895
ISBN (Print): 9781728176901
Publication status: Published - 2020
Event: International Conference on Asian Language Processing (IALP) 2020 - Kuala Lumpur, Malaysia
Duration: 4 Dec 2020 to 6 Dec 2020
https://ieeexplore.ieee.org/xpl/conhome/9310452/proceeding (Proceedings)

Conference

Conference: International Conference on Asian Language Processing (IALP) 2020
Abbreviated title: IALP 2020
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 4/12/20 to 6/12/20

Keywords

  • Machine translation
  • neural machine translation
  • parliament hansard
