Diversity enhanced active learning with strictly proper scoring rules

David Tan, Lan Du, Wray Buntine

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

Abstract

We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log probability or negative mean square error, which we call Bayesian Estimate of Mean Proper Scores (BEMPS). We also prove convergence results borrowing techniques used with MOCU. In order to allow better experimentation with the new acquisition functions, we develop a complementary batch AL algorithm, which encourages diversity in the vector of expected changes in scores for unlabelled data. To allow high performance text classifiers, we combine ensembling and dynamic validation set construction on pretrained language models. Extensive experimental evaluation then explores how these different acquisition functions perform. The results show that the use of mean square error and log probability with BEMPS yields robust acquisition functions, which consistently outperform the others tested.
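
As a brief illustration of the "strictly proper" property the abstract relies on, the sketch below (not taken from the paper) scores a categorical prediction with the two rules the abstract names, log probability and negative squared error, and checks that reporting the true label distribution gives a higher expected score than two alternative reports. The function names and the example distributions are hypothetical, chosen only for this illustration; this is not the BEMPS acquisition function itself.

```python
import math

def log_score(probs, label):
    """Log probability assigned to the observed label (higher is better)."""
    return math.log(probs[label])

def neg_squared_error(probs, label):
    """Negative squared error against the one-hot label (negative Brier score)."""
    return -sum((p - (1.0 if k == label else 0.0)) ** 2
                for k, p in enumerate(probs))

def expected_score(score, reported, truth):
    """Expected score of reporting `reported` when labels are drawn from `truth`."""
    return sum(t * score(reported, k) for k, t in enumerate(truth))

# Hypothetical 3-class example: the true label distribution and two other reports.
truth = [0.7, 0.2, 0.1]
overconfident = [0.9, 0.05, 0.05]
uniform = [1 / 3, 1 / 3, 1 / 3]

for score in (log_score, neg_squared_error):
    honest = expected_score(score, truth, truth)
    for name, report in (("overconfident", overconfident), ("uniform", uniform)):
        other = expected_score(score, report, truth)
        print(f"{score.__name__}: honest={honest:.4f} {name}={other:.4f}")
```

In this example the honest report scores highest under both rules, which is the behaviour a strictly proper scoring rule guarantees in expectation.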
Original language: English
Title of host publication: Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Editors: M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, J. Wortman Vaughan
Place of Publication: San Diego CA USA
Publisher: Neural Information Processing Systems (NIPS)
Number of pages: 13
ISBN (Electronic): 9781713845393
Publication status: Published - 2021
Event: Advances in Neural Information Processing Systems 2021 - Online, United States of America
Duration: 7 Dec 2021 to 10 Dec 2021
Conference number: 35th
https://papers.nips.cc/paper/2021 (Proceedings)
https://nips.cc/Conferences/2021 (Website)

Conference

Conference: Advances in Neural Information Processing Systems 2021
Abbreviated title: NeurIPS 2021
Country/Territory: United States of America
City: Online
Period: 7/12/21 to 10/12/21