Crowdsourcing has been widely established as a means to enable human computation at large-scale, in particular for tasks that require manual labelling of large sets of data items. Answers obtained from heterogeneous crowd workers are aggregated to obtain a robust result. However, existing methods for answer aggregation are designed for discrete tasks, where answers are given as a single label per item. In this paper, we consider partial-agreement tasks that are common in many applications such as image tagging and document annotation, where items are assigned sets of labels. Common approaches for the aggregation of partial-agreement answers either (i) reduce the problemto several instances of an aggregation problem for discrete tasks or (ii) consider each label independently. Going beyond the state-of-the-art, we propose a novel Bayesian nonparametric model to aggregate the partial-agreement answers in a generic way. This model enables us to compute the consensus of partially-sound and partially-complete worker answers, while taking into account mutual relationships in labels and different answer sets. We also show how this model is instantiated for incremental learning, incorporating new answers from crowd workers as they arrive. An evaluation of our method using real-world datasets reveals that it consistently outperforms the state-of-the-art in terms of precision, recall, and robustness against faulty workers and data sparsity.
|Number of pages||14|
|Journal||IEEE Transactions on Knowledge and Data Engineering|
|Publication status||Published - Jan 2018|
- Answer aggregation
- Bayesian models
- Nonparametric models