Efficient benchmarking of NLP APIs using multi-armed bandits

Gholamreza Haffari, Tuan Tran, Mark Carman

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

    2 Citations (Scopus)


    Comparing NLP systems to select the best one for a task of interest, such as named entity recognition, is critical for practitioners and researchers. A rigorous approach involves setting up a hypothesis testing scenario using the performance of the systems on query documents. However, the hypothesis testing approach often needs to send a large number of document queries to the systems, which can be problematic. In this paper, we present an effective alternative based on the multi-armed bandit (MAB). We propose a hierarchical generative model to represent the uncertainty in the performance measures of the competing systems, to be used by Thompson Sampling to solve the resulting MAB. Experimental results on both synthetic and real data show that our approach requires significantly fewer queries than the standard benchmarking technique to identify the best system according to F-measure.
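    The paper's hierarchical generative model over F-measures is not reproduced here, but the core idea of using Thompson Sampling to spend queries on the most promising system can be illustrated with a simplified sketch. The version below is an assumption-laden toy: it models each system's per-query outcome as a Bernoulli "correct/incorrect" draw with a Beta posterior, rather than the paper's F-measure model, and the simulated systems and their accuracies are hypothetical.

    ```python
    import random

    def thompson_sampling(systems, n_queries, seed=0):
        """Beta-Bernoulli Thompson Sampling over competing systems (arms).

        `systems` is a list of callables; calling one simulates sending a
        single query document to that system and returns 1 (correct
        output) or 0 (incorrect). This Bernoulli reward is a simplifying
        assumption standing in for the paper's F-measure model.
        """
        rng = random.Random(seed)
        # Beta(1, 1) prior per arm, tracked as pseudo-counts.
        wins = [1] * len(systems)
        losses = [1] * len(systems)
        for _ in range(n_queries):
            # Sample a plausible accuracy from each arm's posterior...
            samples = [rng.betavariate(wins[i], losses[i])
                       for i in range(len(systems))]
            # ...and send the next query only to the arm that currently
            # looks best, so weak systems receive few queries.
            i = max(range(len(systems)), key=lambda j: samples[j])
            if systems[i]():
                wins[i] += 1
            else:
                losses[i] += 1
        # Report the arm with the highest posterior-mean accuracy.
        means = [wins[i] / (wins[i] + losses[i]) for i in range(len(systems))]
        return max(range(len(systems)), key=lambda j: means[j])

    # Two hypothetical simulated systems with accuracies 0.6 and 0.8; the
    # bandit should concentrate its queries on the better one.
    sim_rng = random.Random(42)
    systems = [lambda: int(sim_rng.random() < 0.6),
               lambda: int(sim_rng.random() < 0.8)]
    best = thompson_sampling(systems, n_queries=500)
    ```

    The design choice that saves queries is visible in the loop: unlike a fixed benchmark that queries every system equally, the posterior sampling step quickly stops allocating queries to systems whose sampled accuracy rarely wins.
    
    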

    Original language: English
    Title of host publication: 15th Conference of the European Chapter of the Association for Computational Linguistics: Proceedings of Conference, Volume 1: Long Papers
    Editors: Phil Blunsom, Alexander Koller
    Place of Publication: PA, USA
    Publisher: Association for Computational Linguistics (ACL)
    Number of pages: 9
    ISBN (Electronic): 9781510838604
    ISBN (Print): 9781945626340
    Publication status: Published - 2017
    Event: European Association of Computational Linguistics Conference 2017 - Valencia, Spain
    Duration: 3 Apr 2017 - 7 Apr 2017
    Conference number: 15th


    Conference: European Association of Computational Linguistics Conference 2017
    Abbreviated title: EACL 2017