Leveraging label category relationships in multi-class crowdsourcing

Yuan Jin, Lan Du, Ye Zhu, Mark Carman

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

Abstract

Current quality control methods for crowdsourcing largely account for variations in worker responses to items by interactions between item difficulty and worker expertise. Few have taken into account the semantic relationships that can exist between the response label categories. When the number of the label categories is large, these relationships are naturally indicative of how crowd-workers respond to items, with expert workers tending to respond with more semantically related categories to the categories of true labels, and with difficult items tending to see greater spread in the responded labels. Based on these observations, we propose a new statistical model which contains a latent real-valued matrix for capturing the relatedness of response categories alongside variables for worker expertise, item difficulty and item true labels. The model can be easily extended to incorporate prior knowledge about the semantic relationships between response labels in the form of a hierarchy over them. Experiments show that compared with numerous state-of-the-art baselines, our model (both with and without the prior knowledge) yields superior true label prediction performance on four new crowdsourcing datasets featuring large sets of label categories.
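
The abstract's two qualitative observations can be illustrated with a toy response model. This is only a sketch under stated assumptions, not the model proposed in the paper: the softmax form, the `expertise / difficulty` scaling, the function names, and the numbers in `RELATEDNESS` are all hypothetical, chosen to show how a real-valued relatedness matrix could interact with worker expertise and item difficulty.

```python
import math

def response_probs(relatedness, true_label, expertise, difficulty):
    """Toy response distribution for one worker on one item.

    Assumption (not from the paper): the worker's response follows a softmax
    over the true label's row of the relatedness matrix, sharpened by
    expertise and flattened by difficulty.
    """
    scale = expertise / difficulty
    logits = [scale * r for r in relatedness[true_label]]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats; larger values mean more spread-out responses."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical relatedness scores for three label categories:
# categories 0 and 1 are semantically close, category 2 is unrelated to both.
RELATEDNESS = [
    [2.0, 1.0, -1.0],
    [1.0, 2.0, -1.0],
    [-1.0, -1.0, 2.0],
]

expert = response_probs(RELATEDNESS, 0, expertise=3.0, difficulty=1.0)
novice = response_probs(RELATEDNESS, 0, expertise=0.5, difficulty=1.0)
hard_item = response_probs(RELATEDNESS, 0, expertise=3.0, difficulty=4.0)
```

In this toy setting the expert's mistakes fall mostly on the related category 1 rather than the unrelated category 2, and the same expert's responses spread out more on the harder item, which mirrors the behaviour the abstract describes.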
Language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 22nd Pacific-Asia Conference, PAKDD 2018 Melbourne, VIC, Australia, June 3–6, 2018 Proceedings, Part II
Editors: Dinh Phung, Vincent S. Tseng, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi
Place of Publication: Cham Switzerland
Publisher: Springer
Pages: 128-140
Number of pages: 13
ISBN (Electronic): 9783319930374
ISBN (Print): 9783319930367
DOI: https://doi.org/10.1007/978-3-319-93037-4_11
Publication status: Published - 2018
Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018 - Grand Hyatt, Melbourne, Australia
Duration: 3 Jun 2018 to 6 Jun 2018
Conference number: 22nd
http://prada-research.net/pakdd18/
http://pakdd2018.medmeeting.org/Content/92892

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer
Volume: 10938
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018
Abbreviated title: PAKDD 2018
Country: Australia
City: Melbourne
Period: 3/06/18 to 6/06/18

Cite this

Jin, Y., Du, L., Zhu, Y., & Carman, M. (2018). Leveraging label category relationships in multi-class crowdsourcing. In D. Phung, V. S. Tseng, G. I. Webb, B. Ho, M. Ganji, & L. Rashidi (Eds.), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018 Melbourne, VIC, Australia, June 3–6, 2018 Proceedings, Part II (pp. 128-140). (Lecture Notes in Artificial Intelligence; Vol. 10938). Cham Switzerland: Springer. https://doi.org/10.1007/978-3-319-93037-4_11
Jin, Yuan ; Du, Lan ; Zhu, Ye ; Carman, Mark. / Leveraging label category relationships in multi-class crowdsourcing. Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018 Melbourne, VIC, Australia, June 3–6, 2018 Proceedings, Part II. editor / Dinh Phung ; Vincent S. Tseng ; Geoffrey I. Webb ; Bao Ho ; Mohadeseh Ganji ; Lida Rashidi. Cham Switzerland : Springer, 2018. pp. 128-140 (Lecture Notes in Artificial Intelligence).
@inproceedings{aad29bf98ee94190b1c0d0fd27678b11,
title = "Leveraging label category relationships in multi-class crowdsourcing",
abstract = "Current quality control methods for crowdsourcing largely account for variations in worker responses to items by interactions between item difficulty and worker expertise. Few have taken into account the semantic relationships that can exist between the response label categories. When the number of the label categories is large, these relationships are naturally indicative of how crowd-workers respond to items, with expert workers tending to respond with more semantically related categories to the categories of true labels, and with difficult items tending to see greater spread in the responded labels. Based on these observations, we propose a new statistical model which contains a latent real-valued matrix for capturing the relatedness of response categories alongside variables for worker expertise, item difficulty and item true labels. The model can be easily extended to incorporate prior knowledge about the semantic relationships between response labels in the form of a hierarchy over them. Experiments show that compared with numerous state-of-the-art baselines, our model (both with and without the prior knowledge) yields superior true label prediction performance on four new crowdsourcing datasets featuring large sets of label categories.",
author = "Yuan Jin and Lan Du and Ye Zhu and Mark Carman",
year = "2018",
doi = "10.1007/978-3-319-93037-4_11",
language = "English",
isbn = "9783319930367",
series = "Lecture Notes in Artificial Intelligence",
publisher = "Springer",
pages = "128--140",
editor = "Dinh Phung and Tseng, {Vincent S.} and Webb, {Geoffrey I.} and Bao Ho and Mohadeseh Ganji and Lida Rashidi",
booktitle = "Advances in Knowledge Discovery and Data Mining",
}

Jin, Y, Du, L, Zhu, Y & Carman, M 2018, Leveraging label category relationships in multi-class crowdsourcing. in D Phung, VS Tseng, GI Webb, B Ho, M Ganji & L Rashidi (eds), Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018 Melbourne, VIC, Australia, June 3–6, 2018 Proceedings, Part II. Lecture Notes in Artificial Intelligence, vol. 10938, Springer, Cham Switzerland, pp. 128-140, Pacific-Asia Conference on Knowledge Discovery and Data Mining 2018, Melbourne, Australia, 3/06/18. https://doi.org/10.1007/978-3-319-93037-4_11

Leveraging label category relationships in multi-class crowdsourcing. / Jin, Yuan; Du, Lan; Zhu, Ye; Carman, Mark.

Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018 Melbourne, VIC, Australia, June 3–6, 2018 Proceedings, Part II. ed. / Dinh Phung; Vincent S. Tseng; Geoffrey I. Webb; Bao Ho; Mohadeseh Ganji; Lida Rashidi. Cham Switzerland : Springer, 2018. p. 128-140 (Lecture Notes in Artificial Intelligence; Vol. 10938).


TY - GEN
T1 - Leveraging label category relationships in multi-class crowdsourcing
AU - Jin, Yuan
AU - Du, Lan
AU - Zhu, Ye
AU - Carman, Mark
PY - 2018
Y1 - 2018
AB - Current quality control methods for crowdsourcing largely account for variations in worker responses to items by interactions between item difficulty and worker expertise. Few have taken into account the semantic relationships that can exist between the response label categories. When the number of the label categories is large, these relationships are naturally indicative of how crowd-workers respond to items, with expert workers tending to respond with more semantically related categories to the categories of true labels, and with difficult items tending to see greater spread in the responded labels. Based on these observations, we propose a new statistical model which contains a latent real-valued matrix for capturing the relatedness of response categories alongside variables for worker expertise, item difficulty and item true labels. The model can be easily extended to incorporate prior knowledge about the semantic relationships between response labels in the form of a hierarchy over them. Experiments show that compared with numerous state-of-the-art baselines, our model (both with and without the prior knowledge) yields superior true label prediction performance on four new crowdsourcing datasets featuring large sets of label categories.

U2 - 10.1007/978-3-319-93037-4_11
DO - 10.1007/978-3-319-93037-4_11
M3 - Conference Paper
SN - 9783319930367
T3 - Lecture Notes in Artificial Intelligence
SP - 128
EP - 140
BT - Advances in Knowledge Discovery and Data Mining
A2 - Phung, Dinh
A2 - Tseng, Vincent S.
A2 - Webb, Geoffrey I.
A2 - Ho, Bao
A2 - Ganji, Mohadeseh
A2 - Rashidi, Lida
PB - Springer
CY - Cham Switzerland
ER -

Jin Y, Du L, Zhu Y, Carman M. Leveraging label category relationships in multi-class crowdsourcing. In Phung D, Tseng VS, Webb GI, Ho B, Ganji M, Rashidi L, editors, Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018 Melbourne, VIC, Australia, June 3–6, 2018 Proceedings, Part II. Cham Switzerland: Springer. 2018. p. 128-140. (Lecture Notes in Artificial Intelligence). https://doi.org/10.1007/978-3-319-93037-4_11