Abstract
Word embeddings, distributed word representations that can be learned from unlabelled data, have been shown to have high utility in many natural language processing applications. In this paper, we perform an extrinsic evaluation of four popular word embedding methods in the context of four sequence labelling tasks: part-of-speech tagging, syntactic chunking, named entity recognition, and multiword expression identification. A particular focus of the paper is analysing the effects of task-based updating of word representations. We show that when using word embeddings as features, as few as several hundred training instances are sufficient to achieve competitive results, and that word embeddings lead to improvements on out-of-vocabulary words and on out-of-domain data. Perhaps more surprisingly, our results indicate there is little difference between the different word embedding methods, and that simple Brown clusters are often competitive with word embeddings across all tasks we consider.
Original language | English |
---|---|
Title of host publication | CoNLL 2015 - The 19th Conference on Computational Natural Language Learning - Proceedings of the Conference |
Editors | Afra Alishahi, Alessandro Moschitti |
Place of Publication | Taberg Sweden |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 83-93 |
Number of pages | 11 |
ISBN (Electronic) | 9781941643778 |
Publication status | Published - 2015 |
Externally published | Yes |
Event | Conference on Natural Language Learning 2015 - Beijing, China; Duration: 30 Jul 2015 → 31 Jul 2015; Conference number: 19th; https://www.conll.org/2015; https://www.aclweb.org/anthology/volumes/K15-1/ (Proceedings) |
Conference
Conference | Conference on Natural Language Learning 2015 |
---|---|
Abbreviated title | CoNLL 2015 |
Country/Territory | China |
City | Beijing |
Period | 30/07/15 → 31/07/15 |