Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy

Haipei Sun, Boxiang Dong, Hui Wendy Wang, Ting Yu, Zhan Qin

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Crowdsourcing is a new problem-solving paradigm for tasks that are difficult for computers but easy for humans. Since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting user privacy under local differential privacy (LDP), where individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to ensure high accuracy of the inferred true answers (i.e., truth) from the perturbed data. One of the challenges of LDP perturbation is the sparsity of worker answers (i.e., each worker only answers a small number of tasks). Simple extension of existing approaches (e.g., Laplace perturbation and randomized response) may incur large errors in truth inference on sparse data. Thus we design a new matrix factorization (MF) algorithm under LDP that addresses the trade-off between privacy and utility (i.e., accuracy of truth inference). We prove that our MF algorithm can provide both LDP guarantee and small error of truth inference, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets, and demonstrate that the MF algorithm performs better than the existing LDP algorithms on sparse crowdsourcing data.

Original languageEnglish
Title of host publicationProceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
EditorsYang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages488-497
Number of pages10
ISBN (Electronic)9781538650356
DOIs
StatePublished - 22 Jan 2019
Event2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: 10 Dec 201813 Dec 2018

Publication series

NameProceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference2018 IEEE International Conference on Big Data, Big Data 2018
CountryUnited States
CitySeattle
Period10/12/1813/12/18

Fingerprint

Factorization
Experiments

Cite this

Sun, H., Dong, B., Wang, H. W., Yu, T., & Qin, Z. (2019). Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy. In Y. Song, B. Liu, K. Lee, N. Abe, C. Pu, M. Qiao, N. Ahmed, D. Kossmann, J. Saltz, J. Tang, J. He, H. Liu, ... X. Hu (Eds.), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 488-497). [8622635] (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/BigData.2018.8622635
Sun, Haipei ; Dong, Boxiang ; Wang, Hui Wendy ; Yu, Ting ; Qin, Zhan. / Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy. Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. editor / Yang Song ; Bing Liu ; Kisung Lee ; Naoki Abe ; Calton Pu ; Mu Qiao ; Nesreen Ahmed ; Donald Kossmann ; Jeffrey Saltz ; Jiliang Tang ; Jingrui He ; Huan Liu ; Xiaohua Hu. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 488-497 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018).
@inproceedings{f7a7efb36ae449b2a365bcd97572a4a4,
title = "Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy",
abstract = "Crowdsourcing is a new problem-solving paradigm for tasks that are difficult for computers but easy for humans. Since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting user privacy under local differential privacy (LDP), where individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to ensure high accuracy of the inferred true answers (i.e., truth) from the perturbed data. One of the challenges of LDP perturbation is the sparsity of worker answers (i.e., each worker only answers a small number of tasks). Simple extension of existing approaches (e.g., Laplace perturbation and randomized response) may incur large errors in truth inference on sparse data. Thus we design a new matrix factorization (MF) algorithm under LDP that addresses the trade-off between privacy and utility (i.e., accuracy of truth inference). We prove that our MF algorithm can provide both LDP guarantee and small error of truth inference, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets, and demonstrate that the MF algorithm performs better than the existing LDP algorithms on sparse crowdsourcing data.",
author = "Haipei Sun and Boxiang Dong and Wang, {Hui Wendy} and Ting Yu and Zhan Qin",
year = "2019",
month = "1",
day = "22",
doi = "10.1109/BigData.2018.8622635",
language = "English",
series = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "488--497",
editor = "Yang Song and Bing Liu and Kisung Lee and Naoki Abe and Calton Pu and Mu Qiao and Nesreen Ahmed and Donald Kossmann and Jeffrey Saltz and Jiliang Tang and Jingrui He and Huan Liu and Xiaohua Hu",
booktitle = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",

}

Sun, H, Dong, B, Wang, HW, Yu, T & Qin, Z 2019, Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy. in Y Song, B Liu, K Lee, N Abe, C Pu, M Qiao, N Ahmed, D Kossmann, J Saltz, J Tang, J He, H Liu & X Hu (eds), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018., 8622635, Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, Institute of Electrical and Electronics Engineers Inc., pp. 488-497, 2018 IEEE International Conference on Big Data, Big Data 2018, Seattle, United States, 10/12/18. https://doi.org/10.1109/BigData.2018.8622635

Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy. / Sun, Haipei; Dong, Boxiang; Wang, Hui Wendy; Yu, Ting; Qin, Zhan.

Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. ed. / Yang Song; Bing Liu; Kisung Lee; Naoki Abe; Calton Pu; Mu Qiao; Nesreen Ahmed; Donald Kossmann; Jeffrey Saltz; Jiliang Tang; Jingrui He; Huan Liu; Xiaohua Hu. Institute of Electrical and Electronics Engineers Inc., 2019. p. 488-497 8622635 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy

AU - Sun, Haipei

AU - Dong, Boxiang

AU - Wang, Hui Wendy

AU - Yu, Ting

AU - Qin, Zhan

PY - 2019/1/22

Y1 - 2019/1/22

N2 - Crowdsourcing is a new problem-solving paradigm for tasks that are difficult for computers but easy for humans. Since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting user privacy under local differential privacy (LDP), where individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to ensure high accuracy of the inferred true answers (i.e., truth) from the perturbed data. One of the challenges of LDP perturbation is the sparsity of worker answers (i.e., each worker only answers a small number of tasks). Simple extension of existing approaches (e.g., Laplace perturbation and randomized response) may incur large errors in truth inference on sparse data. Thus we design a new matrix factorization (MF) algorithm under LDP that addresses the trade-off between privacy and utility (i.e., accuracy of truth inference). We prove that our MF algorithm can provide both LDP guarantee and small error of truth inference, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets, and demonstrate that the MF algorithm performs better than the existing LDP algorithms on sparse crowdsourcing data.

AB - Crowdsourcing is a new problem-solving paradigm for tasks that are difficult for computers but easy for humans. Since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting user privacy under local differential privacy (LDP), where individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to ensure high accuracy of the inferred true answers (i.e., truth) from the perturbed data. One of the challenges of LDP perturbation is the sparsity of worker answers (i.e., each worker only answers a small number of tasks). Simple extension of existing approaches (e.g., Laplace perturbation and randomized response) may incur large errors in truth inference on sparse data. Thus we design a new matrix factorization (MF) algorithm under LDP that addresses the trade-off between privacy and utility (i.e., accuracy of truth inference). We prove that our MF algorithm can provide both LDP guarantee and small error of truth inference, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets, and demonstrate that the MF algorithm performs better than the existing LDP algorithms on sparse crowdsourcing data.

UR - http://www.scopus.com/inward/record.url?scp=85062613704&partnerID=8YFLogxK

U2 - 10.1109/BigData.2018.8622635

DO - 10.1109/BigData.2018.8622635

M3 - Conference contribution

AN - SCOPUS:85062613704

T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

SP - 488

EP - 497

BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

A2 - Song, Yang

A2 - Liu, Bing

A2 - Lee, Kisung

A2 - Abe, Naoki

A2 - Pu, Calton

A2 - Qiao, Mu

A2 - Ahmed, Nesreen

A2 - Kossmann, Donald

A2 - Saltz, Jeffrey

A2 - Tang, Jiliang

A2 - He, Jingrui

A2 - Liu, Huan

A2 - Hu, Xiaohua

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Sun H, Dong B, Wang HW, Yu T, Qin Z. Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy. In Song Y, Liu B, Lee K, Abe N, Pu C, Qiao M, Ahmed N, Kossmann D, Saltz J, Tang J, He J, Liu H, Hu X, editors, Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. Institute of Electrical and Electronics Engineers Inc. 2019. p. 488-497. 8622635. (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). https://doi.org/10.1109/BigData.2018.8622635