Abstract
In recent years, neural language models (LMs) have achieved state-of-the-art performance on several benchmark datasets. While the reasons for their success and their computational demands are well documented, a comparison between neural models and more recent developments in n-gram models has been largely neglected. In this paper, we examine the recent progress in the n-gram literature, running experiments on 50 languages covering all morphological language families. Experimental results show that a simple extension of Modified Kneser-Ney outperforms an LSTM language model on 42 languages, while a word-level Bayesian n-gram LM (Shareghi et al., 2017) outperforms the character-aware neural model (Kim et al., 2016) on average across all languages, and outperforms that model's extension, which explicitly injects linguistic knowledge (Gerz et al., 2018a), on 8 languages. Further experiments on the larger Europarl datasets for 3 languages indicate that neural architectures can outperform n-gram models there, although the n-gram models are computationally much cheaper: n-gram training is up to 15,000× faster. Our experiments illustrate that standalone n-gram models lend themselves as natural choices for resource-lean or morphologically rich languages, and that recent progress has significantly improved their accuracy.
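The abstract takes Modified Kneser-Ney (MKN) as its n-gram baseline without spelling it out. For orientation only, here is a minimal sketch of textbook interpolated MKN for a bigram model. It uses the standard count-of-counts discount estimates D1, D2, D3+ (Chen and Goodman) and a continuation-count lower-order distribution. It is not the authors' extension of MKN, and it is not the Bayesian model cited above. The bigram order, the `<s>`/`</s>` boundary markers, the toy corpus, and the names `mkn_discounts` and `BigramMKN` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def mkn_discounts(bigram_counts):
    """Modified Kneser-Ney discounts (D1, D2, D3+) from count-of-counts.

    n_k is the number of distinct bigrams observed exactly k times.
    """
    n = Counter(c for c in bigram_counts.values() if c <= 4)
    n1, n2, n3, n4 = n[1], n[2], n[3], n[4]

    def div(a, b):
        # Empty count-of-count bins are common on toy corpora; avoid dividing by zero.
        return a / b if b else 0.0

    y = div(n1, n1 + 2 * n2)
    d1 = 1 - 2 * y * div(n2, n1)
    d2 = 2 - 3 * y * div(n3, n2)
    d3 = 3 - 4 * y * div(n4, n3)
    # Clamp D_k to [0, k] so every discounted count c - D(c) stays non-negative.
    return (min(max(d1, 0.0), 1.0),
            min(max(d2, 0.0), 2.0),
            min(max(d3, 0.0), 3.0))


class BigramMKN:
    """Interpolated modified Kneser-Ney bigram LM (hypothetical sketch)."""

    def __init__(self, sentences):
        self.bigrams = Counter()
        for sent in sentences:
            toks = ["<s>"] + list(sent) + ["</s>"]  # assumed boundary markers
            self.bigrams.update(zip(toks, toks[1:]))
        self.d = mkn_discounts(self.bigrams)

        self.context_total = Counter()                        # c(v)
        self.context_types = defaultdict(lambda: [0, 0, 0])   # N1, N2, N3+ of v
        continuation = Counter()                              # N1+(. w)
        for (v, w), c in self.bigrams.items():
            self.context_total[v] += c
            self.context_types[v][min(c, 3) - 1] += 1
            continuation[w] += 1
        n_types = len(self.bigrams)                           # N1+(. .)
        # Lower-order distribution: continuation counts, not raw frequencies.
        self.p_cont = {w: k / n_types for w, k in continuation.items()}

    def discount(self, c):
        return 0.0 if c == 0 else self.d[min(c, 3) - 1]

    def prob(self, w, v):
        """P(w | v): discounted bigram estimate plus back-off mass gamma(v)."""
        p_low = self.p_cont.get(w, 0.0)
        c_v = self.context_total[v]
        if c_v == 0:
            return p_low  # unseen context: use the continuation distribution alone
        c_vw = self.bigrams[(v, w)]
        n1, n2, n3 = self.context_types[v]
        d1, d2, d3 = self.d
        gamma = (d1 * n1 + d2 * n2 + d3 * n3) / c_v
        return (c_vw - self.discount(c_vw)) / c_v + gamma * p_low


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"],
              ["the", "cat", "ran"], ["a", "cat", "sat"]]
    lm = BigramMKN(corpus)
    print(lm.d, lm.prob("cat", "the"), lm.prob("dog", "the"))
```

Every context's distribution sums to one because the mass removed by discounting is exactly gamma(v). On this toy corpus no bigram occurs exactly four times, so D3+ clamps to its maximum. On realistic data the bins are populated and the discounts fall strictly inside [0, k]. Higher orders repeat the same recursion, with raw counts at the top order and continuation counts below it.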
Original language | English |
---|---|
Title of host publication | NAACL 2019, The 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies |
Subtitle of host publication | Proceedings of the Conference Vol. 1 (Long and Short Papers), June 2 - June 7, 2019 |
Editors | Christy Doran, Thamar Solorio |
Place of Publication | Stroudsburg PA USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4113-4118 |
Number of pages | 6 |
Volume | 1 |
ISBN (Electronic) | 9781950737130 |
Publication status | Published - Jun 2019 |
Event | North American Association for Computational Linguistics 2019 - Minneapolis, United States of America. Duration: 2 Jun 2019 → 7 Jun 2019. https://naacl2019.org/ |
Conference
Conference | North American Association for Computational Linguistics 2019 |
---|---|
Abbreviated title | NAACL HLT 2019 |
Country/Territory | United States of America |
City | Minneapolis |
Period | 2/06/19 → 7/06/19 |
Internet address | https://naacl2019.org/ |