DAG

a general model for privacy-preserving data mining

Sin G. Teo, Jianneng Cao, Vincent C.S. Lee

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Secure multi-party computation (SMC) allows parties to jointly compute a function over their inputs while keeping every input confidential. It has been extensively applied in tasks with privacy requirements, such as privacy-preserving data mining (PPDM), to learn the task output while protecting the privacy of the input data. However, existing SMC-based solutions are ad hoc: they are proposed for specific applications and thus cannot be applied directly to other applications. To address this issue, we propose a privacy model, DAG (Directed Acyclic Graph), that consists of a set of fundamental secure operators (e.g., +, -, ×, /, and power). Our model is general: its operators, if pipelined together, can implement various functions, even complicated ones like the Naïve Bayes classifier. It is also extendable: new secure operators can be defined to expand the functions that the model supports. As a case study, we have applied our DAG model to two data mining tasks: kernel regression and Naïve Bayes. Experimental results show that DAG generates outputs that are almost the same as those in the non-private setting, where multiple parties simply disclose their data. The experimental results also show that our DAG model runs in acceptable time; e.g., in kernel regression with a training data size of 683,093, one prediction takes 5.93 sec in the non-private setting and 12.38 sec with our DAG model.
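
The idea of pipelining operators in a directed acyclic graph can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain arithmetic stands in for the secure operators, and the names `OPS`, `evaluate`, and the example graph are hypothetical. It only shows how operator nodes could be chained and evaluated in dependency order.

```python
from graphlib import TopologicalSorter  # Python 3.9+
import operator

# Assumption: plain arithmetic stands in for the secure operators.
# A real system would replace each entry with a secure multi-party protocol.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "pow": operator.pow,
}


def evaluate(nodes, inputs, output):
    """Evaluate an operator DAG.

    nodes:  {name: (op_symbol, left_operand, right_operand)}
    inputs: {name: value} for the source nodes
    output: name of the node whose value is returned
    """
    # Map each operator node to the nodes it depends on.
    deps = {name: {left, right} for name, (_, left, right) in nodes.items()}
    values = dict(inputs)
    # static_order() yields every node after all of its predecessors
    # and raises CycleError if the graph is not acyclic.
    for name in TopologicalSorter(deps).static_order():
        if name in values:
            continue  # source node: value already known
        op, left, right = nodes[name]
        values[name] = OPS[op](values[left], values[right])
    return values[output]


# Hypothetical example: compute (x + y)^2 / (x - y).
nodes = {
    "sum": ("+", "x", "y"),
    "diff": ("-", "x", "y"),
    "sq": ("pow", "sum", "two"),
    "out": ("/", "sq", "diff"),
}
result = evaluate(nodes, {"x": 5, "y": 3, "two": 2}, "out")
print(result)
```

Adding an entry to `OPS` mirrors the extendability described in the abstract: a new operator becomes available to every graph without changing the evaluator.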

Original language: English
Pages (from-to): 40-53
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 32
Issue number: 1
DOI: 10.1109/TKDE.2018.2880743
Publication status: Published - 1 Jan 2020

Keywords

  • Computational modeling
  • Cryptography
  • Data mining
  • Data models
  • Protocols
  • Task analysis

Cite this

@article{239c625e37494920b99e62ffc5855051,
title = "DAG: a general model for privacy-preserving data mining",
abstract = "Secure multi-party computation (SMC) allows parties to jointly compute a function over their inputs while keeping every input confidential. It has been extensively applied in tasks with privacy requirements, such as privacy-preserving data mining (PPDM), to learn the task output while protecting the privacy of the input data. However, existing SMC-based solutions are ad hoc: they are proposed for specific applications and thus cannot be applied directly to other applications. To address this issue, we propose a privacy model, DAG (Directed Acyclic Graph), that consists of a set of fundamental secure operators (e.g., +, -, ×, /, and power). Our model is general: its operators, if pipelined together, can implement various functions, even complicated ones like the Naïve Bayes classifier. It is also extendable: new secure operators can be defined to expand the functions that the model supports. As a case study, we have applied our DAG model to two data mining tasks: kernel regression and Naïve Bayes. Experimental results show that DAG generates outputs that are almost the same as those in the non-private setting, where multiple parties simply disclose their data. The experimental results also show that our DAG model runs in acceptable time; e.g., in kernel regression with a training data size of 683,093, one prediction takes 5.93 sec in the non-private setting and 12.38 sec with our DAG model.",
keywords = "Computational modeling, Cryptography, Data mining, Data models, Protocols, Task analysis",
author = "Teo, {Sin G.} and Cao, Jianneng and Lee, {Vincent C.S.}",
year = "2020",
month = "1",
day = "1",
doi = "10.1109/TKDE.2018.2880743",
language = "English",
volume = "32",
pages = "40--53",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "1",

}

DAG: a general model for privacy-preserving data mining. / Teo, Sin G.; Cao, Jianneng; Lee, Vincent C.S.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 32, No. 1, 01.01.2020, p. 40-53.

TY - JOUR

T1 - DAG

T2 - a general model for privacy-preserving data mining

AU - Teo, Sin G.

AU - Cao, Jianneng

AU - Lee, Vincent C.S.

PY - 2020/1/1

Y1 - 2020/1/1

N2 - Secure multi-party computation (SMC) allows parties to jointly compute a function over their inputs while keeping every input confidential. It has been extensively applied in tasks with privacy requirements, such as privacy-preserving data mining (PPDM), to learn the task output while protecting the privacy of the input data. However, existing SMC-based solutions are ad hoc: they are proposed for specific applications and thus cannot be applied directly to other applications. To address this issue, we propose a privacy model, DAG (Directed Acyclic Graph), that consists of a set of fundamental secure operators (e.g., +, -, ×, /, and power). Our model is general: its operators, if pipelined together, can implement various functions, even complicated ones like the Naïve Bayes classifier. It is also extendable: new secure operators can be defined to expand the functions that the model supports. As a case study, we have applied our DAG model to two data mining tasks: kernel regression and Naïve Bayes. Experimental results show that DAG generates outputs that are almost the same as those in the non-private setting, where multiple parties simply disclose their data. The experimental results also show that our DAG model runs in acceptable time; e.g., in kernel regression with a training data size of 683,093, one prediction takes 5.93 sec in the non-private setting and 12.38 sec with our DAG model.

KW - Computational modeling

KW - Cryptography

KW - Data mining

KW - Data models

KW - Protocols

KW - Task analysis

UR - http://www.scopus.com/inward/record.url?scp=85056328662&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2018.2880743

DO - 10.1109/TKDE.2018.2880743

M3 - Article

VL - 32

SP - 40

EP - 53

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 1

ER -