Effective unsupervised domain adaptation with adversarially trained language models

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

12 Citations (Scopus)

Abstract

Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of randomly masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over subsets of tokens, which we tackle efficiently through relaxation to a variational lower-bound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.
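As a rough illustration of the contrast drawn in the abstract, the sketch below compares random masking with a greedy "mask the hardest tokens" rule. This is a simplified stand-in, not the paper's method: the abstract describes a relaxation to a variational lower bound solved with dynamic programming, whereas this example just takes the top-k positions by loss. The function names, the 15% masking rate, and the per-token loss values are all hypothetical.

```python
import random


def random_mask(num_tokens, rate=0.15, seed=0):
    """Baseline: pick a random subset of positions to mask."""
    k = max(1, round(num_tokens * rate))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_tokens), k))


def hardest_token_mask(token_losses, rate=0.15):
    """Greedy simplification: mask the k positions with the highest
    per-token reconstruction loss (assumed to come from the MLM)."""
    k = max(1, round(len(token_losses) * rate))
    ranked = sorted(range(len(token_losses)),
                    key=lambda i: token_losses[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-token losses for a 10-token sentence.
losses = [0.1, 2.3, 0.4, 0.2, 3.1, 0.3, 0.1, 1.9, 0.2, 0.5]
hard_positions = hardest_token_mask(losses, rate=0.3)
rand_positions = random_mask(len(losses), rate=0.3)
```

Under these assumptions, the greedy rule concentrates the training signal on the positions the model currently reconstructs worst, while the random baseline spreads it uniformly regardless of difficulty.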

Original language: English
Title of host publication: EMNLP 2020, 2020 Conference on Empirical Methods in Natural Language Processing
Subtitle of host publication: Proceedings of the Conference
Editors: Trevor Cohn, Yulan He, Yang Liu
Place of Publication: Stroudsburg PA USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 6163-6173
Number of pages: 11
ISBN (Electronic): 9781952148606
Publication status: Published - 2020
Event: Empirical Methods in Natural Language Processing 2020 - Virtual, Punta Cana, Dominican Republic
Duration: 16 Nov 2020 → 20 Nov 2020
https://2020.emnlp.org/ (Website)
http://www.aclweb.org/anthology/volumes/2020.emnlp-main/ (Proceedings)
https://aclanthology.org/volumes/2020.findings-emnlp/

Conference

Conference: Empirical Methods in Natural Language Processing 2020
Abbreviated title: EMNLP 2020
Country/Territory: Dominican Republic
City: Punta Cana
Period: 16/11/20 → 20/11/20