Abstract
Traditional active learning (AL) methods for machine translation (MT) rely on heuristics. However, these heuristics are limited when the characteristics of the MT problem change, for example with the language pair or the amount of initial bitext. In this paper, we present a framework to learn sentence selection strategies for neural MT. We train the AL query strategy on a high-resource language pair using AL simulations, and then transfer it to the low-resource language pair of interest. The learned query strategy capitalizes on the shared characteristics between the language pairs to make effective use of the AL budget. Our experiments on three language pairs confirm that our method is more effective than strong heuristic-based methods in various conditions, including cold-start and warm-start as well as small and extremely small data conditions.
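To make the setting concrete, the following is a minimal sketch of a pool-based AL loop in which the query strategy is a pluggable scoring function. It is not the paper's implementation: every name here (`length_heuristic`, `make_linear_policy`, `run_al_loop`), the two toy features, and the hand-set weights are assumptions for illustration. In the framework described above, the policy's parameters would be learned from AL simulations on a high-resource language pair and then reused on the low-resource pair.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = Tuple[str, ...]            # a tokenized source sentence
Scorer = Callable[[Sentence], float]  # higher score = query this sentence sooner


def length_heuristic(sentence: Sentence) -> float:
    """A fixed heuristic: prefer longer sentences."""
    return float(len(sentence))


def make_linear_policy(weights: Sequence[float]) -> Scorer:
    """Build a scorer from weights over two toy features (hypothetical).

    In a learned strategy these weights would be fit in simulation on a
    high-resource pair; here they are simply passed in by hand.
    """
    def score(sentence: Sentence) -> float:
        features = (float(len(sentence)), float(len(set(sentence))))
        return sum(w * f for w, f in zip(weights, features))
    return score


def run_al_loop(pool: Sequence[Sentence], scorer: Scorer,
                budget: int, batch_size: int) -> List[Sentence]:
    """Select sentences for annotation until the budget (in sentences) is spent."""
    remaining = list(pool)
    selected: List[Sentence] = []
    while budget > 0 and remaining:
        k = min(batch_size, budget)
        # Rank the remaining pool by the query strategy and take the top k.
        batch = sorted(remaining, key=scorer, reverse=True)[:k]
        for sentence in batch:
            remaining.remove(sentence)
        selected.extend(batch)
        budget -= len(batch)
        # A real system would retrain the NMT model on `selected` here.
    return selected
```

Swapping `length_heuristic` for a scorer built by `make_linear_policy` changes which sentences are queried without touching the loop, which is the sense in which a learned strategy can replace a hand-written heuristic.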
Original language | English |
---|---|
Title of host publication | CoNLL 2018 - The 22nd Conference on Computational Natural Language Learning - Proceedings of the Conference |
Editors | Miikka Silfverberg |
Place of Publication | Stroudsburg PA USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 334-344 |
Number of pages | 11 |
ISBN (Electronic) | 9781948087728 |
Publication status | Published - 2018 |
Event | Conference on Natural Language Learning 2018, Brussels, Belgium; Duration: 31 Oct 2018 → 1 Nov 2018; Conference number: 22nd; https://www.conll.org/2018; https://www.aclweb.org/anthology/volumes/K18-1/ (Proceedings) |
Publication series
Name | CoNLL 2018 - 22nd Conference on Computational Natural Language Learning, Proceedings |
---|
Conference
Conference | Conference on Natural Language Learning 2018 |
---|---|
Abbreviated title | CoNLL 2018 |
Country | Belgium |
City | Brussels |
Period | 31/10/18 → 1/11/18 |