Abstract
Neural Machine Translation (NMT) is notorious for its need for large amounts of bilingual data. An effective approach to compensate for this requirement is Multi-Task Learning (MTL), which leverages different linguistic resources as a source of inductive bias. Current MTL architectures are based on SEQ2SEQ transduction and (partially) share different components of the models among the tasks. However, this MTL approach often suffers from task interference and is not able to fully capture commonalities among subsets of tasks. We address this issue by extending the recurrent units with multiple blocks along with a trainable routing network. The routing network enables adaptive collaboration by dynamically sharing blocks conditioned on the task at hand, the input, and the model state. Empirical evaluation on two low-resource translation tasks, English to Vietnamese and English to Farsi, shows +1 BLEU score improvements compared to strong baselines.
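The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each block is a plain tanh recurrent cell and that the router is a single linear layer followed by a softmax over blocks, fed with a task embedding, the current input, and the previous state. The class name `RoutedRecurrentUnit`, its parameters, and the dimensions are all hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RoutedRecurrentUnit:
    """Hypothetical recurrent unit whose blocks are mixed by a router."""

    def __init__(self, input_dim, hidden_dim, num_blocks, num_tasks,
                 task_dim=4, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Assumption: one plain tanh cell per block over [input; previous state].
        self.blocks = [random_matrix(hidden_dim, input_dim + hidden_dim, rng)
                       for _ in range(num_blocks)]
        # Assumption: one learned embedding per task.
        self.task_embeddings = random_matrix(num_tasks, task_dim, rng)
        # Router scores each block from [task embedding; input; previous state].
        self.router = random_matrix(
            num_blocks, task_dim + input_dim + hidden_dim, rng)

    def step(self, task_id, x, h_prev):
        """Return (new state, routing weights) for one time step."""
        # List concatenation builds the router's conditioning vector.
        router_input = self.task_embeddings[task_id] + x + h_prev
        weights = softmax(matvec(self.router, router_input))
        cell_input = x + h_prev
        block_states = [[math.tanh(a) for a in matvec(block, cell_input)]
                        for block in self.blocks]
        # New state is the routing-weighted mixture of the block outputs.
        h_new = [sum(w * state[i] for w, state in zip(weights, block_states))
                 for i in range(self.hidden_dim)]
        return h_new, weights


# Usage: run two time steps for task 0 with untrained, random weights.
unit = RoutedRecurrentUnit(input_dim=3, hidden_dim=5, num_blocks=4, num_tasks=2)
h = [0.0] * 5
for x in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.4]):
    h, weights = unit.step(task_id=0, x=x, h_prev=h)
```

Because the weights are recomputed at every step from the task, input, and state, two tasks can lean on the same blocks where they have something in common and on different blocks where they would otherwise interfere. In a trained model the router parameters would be learned jointly with the blocks; here they are random placeholders.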
Original language | English |
---|---|
Title of host publication | ACL 2018 - The 56th Annual Meeting of the Association for Computational Linguistics |
Subtitle of host publication | Proceedings of the Conference, Vol. 2 (Short Papers) |
Editors | Iryna Gurevych, Yusuke Miyao |
Place of Publication | Stroudsburg PA USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 656-661 |
Number of pages | 6 |
ISBN (Print) | 9781948087346 |
Publication status | Published - 2018 |
Event | Annual Meeting of the Association of Computational Linguistics 2018 - Melbourne, Australia; Duration: 15 Jul 2018 → 20 Jul 2018; Conference number: 56th; https://aclanthology.info/events/acl-2018 |
Conference
Conference | Annual Meeting of the Association of Computational Linguistics 2018 |
---|---|
Abbreviated title | ACL 2018 |
Country/Territory | Australia |
City | Melbourne |
Period | 15/07/18 → 20/07/18 |