Abstract
Crowdsourcing services provide a fast and cheap way to obtain large amounts of labeled data by employing crowd workers on the Internet. In crowdsourcing learning, two-stage methods are widely used: they first infer an integrated label for each instance and then build a learning model from the instances and their integrated labels. However, existing two-stage methods focus mainly on inferring more accurate integrated labels, and most of them then directly treat the integrated labels as class labels when building the learning model. This discards the detailed worker labeling information contained in the multiple noisy labels and results in sub-optimal model accuracy. To solve this problem, we take the multiple noisy labels of each instance as its attribute value vector to construct a second view in addition to the original attribute view, and propose a novel two-stage method called dual-view learning from crowds (DVLFC). In DVLFC, we first pick out workers with a sufficient number of labels and augment the multiple noisy label set of each instance; we then build a supervised learning model in each view; finally, we fuse their class-membership probabilities to obtain the final classification result. Extensive experiments on both real-world and artificial crowdsourced datasets demonstrate the effectiveness of DVLFC.
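The dual-view idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: majority voting as the label integration step, a hypothetical `min_labels` threshold for worker selection, one-hot encoding of worker labels as the label view, a toy centroid-based classifier in each view, and plain averaging as the fusion rule. The label augmentation step is omitted because the abstract does not describe it.

```python
from collections import Counter
import math

MISSING = -1  # assumed marker: the worker did not label this instance

def majority_vote(noisy_labels):
    """Integrated label: most frequent non-missing label (ties go to the smallest label)."""
    counts = Counter(l for l in noisy_labels if l != MISSING)
    best = max(counts.values())
    return min(l for l, c in counts.items() if c == best)

def select_workers(label_matrix, min_labels):
    """Indices of workers (columns) who provided at least min_labels labels."""
    n_workers = len(label_matrix[0])
    return [w for w in range(n_workers)
            if sum(row[w] != MISSING for row in label_matrix) >= min_labels]

def label_view(label_matrix, workers, n_classes):
    """Second view: one-hot encode each kept worker's label; missing becomes all zeros."""
    view = []
    for row in label_matrix:
        vec = []
        for w in workers:
            vec.extend(1.0 if row[w] == c else 0.0 for c in range(n_classes))
        view.append(vec)
    return view

class CentroidClassifier:
    """Toy stand-in for any supervised model that outputs class-membership probabilities."""

    def fit(self, X, y, n_classes):
        # Assumes every class has at least one training instance.
        self.centroids = []
        for c in range(n_classes):
            rows = [x for x, t in zip(X, y) if t == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        # Softmax over negative squared distances to the class centroids.
        scores = [-sum((a - b) ** 2 for a, b in zip(x, cen)) for cen in self.centroids]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

def fuse(p_a, p_b):
    """Assumed fusion rule: unweighted average of the two probability vectors."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]

# Toy data: 4 instances, 2 original attributes, 4 workers, 2 classes.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
L = [[0, 0, 1, MISSING], [0, 0, 0, MISSING], [1, 1, 0, 1], [1, 0, 1, MISSING]]

kept = select_workers(L, min_labels=2)      # worker 3 labeled only one instance
y = [majority_vote(row) for row in L]       # integrated labels used as class labels
V = label_view(L, kept, n_classes=2)        # label view built from kept workers

attr_model = CentroidClassifier().fit(X, y, n_classes=2)
label_model = CentroidClassifier().fit(V, y, n_classes=2)

fused = [fuse(attr_model.predict_proba(x), label_model.predict_proba(v))
         for x, v in zip(X, V)]
predictions = [max(range(2), key=lambda c: p[c]) for p in fused]
```

In this sketch both models are trained on the same integrated labels but see different attribute vectors, so the label view retains per-worker information that majority voting alone would discard. Any classifier that exposes class-membership probabilities could replace `CentroidClassifier`, and a weighted fusion could replace the plain average.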
| Original language | English |
|---|---|
| Article number | 61 |
| Number of pages | 21 |
| Journal | ACM Transactions on Knowledge Discovery from Data |
| Volume | 19 |
| Issue number | 3 |
| Publication status | Published - 20 Feb 2025 |
Keywords
- crowdsourcing
- dual-view learning
- label integration
- model accuracy