Multilingual neural machine translation with soft decoupled encoding

Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Multilingual training of neural machine translation (NMT) systems has led to impressive accuracy improvements on low-resource languages. However, there are still significant challenges in efficiently learning word representations in the face of a paucity of data. In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-level information intelligently without requiring heuristic preprocessing such as pre-segmenting the data. SDE represents a word by its spelling through a character encoding, and by its semantic meaning through a latent embedding space shared by all languages. Experiments on a standard dataset of four low-resource languages show consistent improvements over strong multilingual NMT baselines, with gains of up to 2 BLEU on one of the tested languages, achieving a new state of the art on all four language pairs.
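The abstract's two-part decomposition can be illustrated with a minimal sketch: a word's lexical form is encoded from its character n-grams, and its semantics come from attention over a latent embedding table shared by all languages. All names, sizes, and the residual combination below are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import math
import random
import zlib

random.seed(0)

# Hypothetical sizes for illustration only (not taken from the paper)
D = 16         # embedding dimension
N_LATENT = 4   # rows of the language-shared latent semantic table

# Latent semantic embeddings shared by every language (the "decoupled" semantic part)
latent = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N_LATENT)]

def char_ngram_embedding(word):
    """Toy lexical (spelling) encoding: hash character n-grams of the
    boundary-marked word into a D-dim bag-of-n-grams vector, L2-normalized."""
    vec = [0.0] * D
    padded = f"<{word}>"
    for n in (1, 2, 3):
        for i in range(len(padded) - n + 1):
            vec[zlib.crc32(padded[i:i + n].encode()) % D] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sde_encode(word):
    """Combine the spelling encoding with an attention-weighted mixture of
    the shared latent embeddings (softmax over dot-product scores)."""
    c = char_ngram_embedding(word)
    scores = [dot(row, c) for row in latent]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    mix = [sum(a * row[j] for a, row in zip(attn, latent)) for j in range(D)]
    return [ci + mi for ci, mi in zip(c, mix)]  # residual combination: an assumption

# Related spellings yield similar lexical encodings without pre-segmentation
print(dot(char_ngram_embedding("cat"), char_ngram_embedding("cats")))
```

Because the lexical encoding needs no segmentation heuristics and the latent table is shared, cognates or loanwords spelled similarly across related languages can map to overlapping representations, which is the intuition behind the lexical-level sharing described above.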

Original language: English
Title of host publication: International Conference on Learning Representations 2019
Editors: Alexander Rush
Place of Publication: La Jolla CA USA
Publisher: International Conference on Learning Representations (ICLR)
Number of pages: 13
ISBN (Print): 9783800743629
Publication status: Published - 2019
Event: International Conference on Learning Representations 2019 - New Orleans, United States of America
Duration: 6 May 2019 - 9 May 2019
https://iclr.cc/

Conference

Conference: International Conference on Learning Representations 2019
Abbreviated title: ICLR 2019
Country: United States of America
City: New Orleans
Period: 6/05/19 - 9/05/19
Internet address: https://iclr.cc/
