
Privacy Preserving Learning of Decision Trees

Benny Pinkas, HP Labs. Joint work with Yehuda Lindell (done while at the Weizmann Institute).


Presentation Transcript


  1. Privacy Preserving Learning of Decision Trees
     Benny Pinkas, HP Labs. Joint work with Yehuda Lindell (done while at the Weizmann Institute).

  2. Cryptographic methods vs. perturbation methods
     • Cryptographic methods: overhead
     • Perturbation methods: inaccuracy, lack of privacy
     • This work…

  3. A story
     • “We’re experiencing a lot of fraud lately…” “Here too..”
     • “I can’t find a pattern to recognize fraud in advance..” “Neither can I..”
     • “Maybe we should share information..”
     • “But what about patients’ privacy? Business secrets?”
     • “Have you heard of ‘Secure function evaluation’?”
     • “This is all ‘theory’. It can’t be efficient.”

  4. Privacy preserving data mining
     • P1 holds a confidential database D1; P2 holds a confidential database D2 (both huge).
     • They wish to “mine” D1 ∪ D2 without revealing any more information.
     • Examples:
       • Medical databases protected by law
       • Competing businesses
       • Government agencies (privacy, “need to know”)

  5. Secure Function Evaluation [Yao ‘86]
     • Input: one party holds x, the other holds y.
     • Output: one party learns C(x,y) and nothing else; the other learns nothing.
     • F(x,y): a public function, represented as a Boolean circuit C(x,y).
     • Implementation:
       • Two passes
       • O(|X|) “oblivious transfers”. O(|C|) communication.
       • One exp per log “OT”s [NP]
     • Pretty efficient for small circuits!

  6. Our Contribution
     • An efficient sub-linear protocol for secure computation of a complex, well-known data-mining algorithm (ID3), for “semi-honest” parties.
     • A different approach offered by the data-mining community [AS’00]:
       • Perturb each entry (add random noise).
       • Analyze the accuracy of using perturbed data as input to data-mining algorithms.
       • How much privacy?

  7. The classification problem

  8. Classification using Decision Trees
     • ID3: choose the attribute A that minimizes the conditional entropy of the class attribute.
     • [Figure: example decision tree that tests history ([0,4] years, [4,9] years, > 10 years), Age > 30 and Claim > $500, with Yes/No leaves]
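The selection rule on this slide can be sketched in the clear, with no privacy at all, as follows. This is only an illustration: the record layout (a list of dicts), the attribute names and the toy data are assumptions made for the example, not taken from the talk.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, attr, class_attr):
    """H(class | attr) in bits, for rows given as a list of dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[class_attr])
    total = len(rows)
    h = 0.0
    for labels in groups.values():
        size = len(labels)
        # Entropy of the class inside this group, weighted by the group's share of the data.
        h_group = -sum((c / size) * math.log2(c / size) for c in Counter(labels).values())
        h += (size / total) * h_group
    return h

def best_attribute(rows, attrs, class_attr):
    """ID3 rule: pick the attribute with the smallest conditional entropy."""
    return min(attrs, key=lambda a: conditional_entropy(rows, a, class_attr))

# Hypothetical toy data: age predicts fraud perfectly, claim size not at all.
rows = [
    {"age_over_30": True, "claim_over_500": True, "fraud": "yes"},
    {"age_over_30": True, "claim_over_500": False, "fraud": "yes"},
    {"age_over_30": False, "claim_over_500": True, "fraud": "no"},
    {"age_over_30": False, "claim_over_500": False, "fraud": "no"},
]
chosen = best_attribute(rows, ["age_over_30", "claim_over_500"], "fraud")
```

On this toy data the rule selects `age_over_30`, since splitting on it leaves no uncertainty about the class.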

  9. Privacy Preserving ID3
     • Core of the problem: comparing entropies while preserving privacy (the entropy is built from terms of the form x log x).
     • Privacy: for each party, all intermediate values are random.
     • Efficiency: most of the computation is done independently by the parties.
     • Basic task: compute x log x, where x is e.g. the number of patients with (age > 30) and (fraud = yes).

  10. Privacy Preserving ID3
      • Computing x log x:
        • x = x1 + x2, where x1 and x2 are known to P1 and P2 respectively (independently computed from their databases).
        • Might as well compute x ln x, which differs from x log x only by a constant factor.
        • First run a protocol to compute random shares y1 + y2 = ln x.
        • ln x is real, but cryptography works over finite fields, so some numerical analysis is needed.
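The phrase "random shares" can be illustrated with additive sharing in a prime field. This is a minimal sketch under assumptions not in the slides: the field size `P`, the fixed-point factor `SCALE` and the helper names are hypothetical, and ln x is computed in the clear here only to show the shape of the protocol's output.

```python
import math
import secrets

P = 2**127 - 1   # prime field size; a hypothetical choice for this sketch
SCALE = 10**6    # hypothetical fixed-point factor for encoding the real value ln x

def share(value, p=P):
    """Split an integer into two additive shares, each uniformly random mod p on its own."""
    s1 = secrets.randbelow(p)
    s2 = (value - s1) % p
    return s1, s2

def reconstruct(s1, s2, p=P):
    """Add the shares and map the result back to a signed integer."""
    v = (s1 + s2) % p
    return v - p if v > p // 2 else v

x1, x2 = 600, 400   # local counts held by P1 and P2
# In the real protocol no one ever sees ln x; it is computed here only for illustration.
y1, y2 = share(round(SCALE * math.log(x1 + x2)))
```

Each of `y1` and `y2` alone is a uniformly random field element; only their sum carries the (scaled) value of ln x.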

  11. Cryptographic Tools
      • Secure Function Evaluation (SFE) [Yao]
      • Oblivious Polynomial Evaluation [NP]
        • Input: one party holds a polynomial Q(·), the other holds a value x.
        • Output: the party holding x learns Q(x) and nothing else; the party holding Q learns nothing.
        • Implementation: two passes, O(degree) (or O(log|F|)) exponentiations.

  12. Computing random shares of ln x = ln(x1 + x2)
      Use a Taylor approximation for ln x:
      • $x = x_1 + x_2 = 2^n (1 + \varepsilon)$, with $-\tfrac{1}{2} < \varepsilon < \tfrac{1}{2}$
      • $\ln x = \ln\bigl(2^n (1 + \varepsilon)\bigr) = \ln 2^n + \ln(1 + \varepsilon) \approx \ln 2^n + \sum_{i=1}^{k} (-1)^{i-1} \varepsilon^i / i = \ln 2^n + T(\varepsilon)$
      • $T(\varepsilon)$ is a polynomial of degree k. The error is exponentially small in k.
      • We only know how to work over finite fields:
        • Work in F, where |F| is sufficiently large.
        • Compute c·ln x, where c compensates for fractions.
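The approximation on this slide can be checked numerically with ordinary floating-point arithmetic. The sketch below is not part of the protocol; the function names are made up for the example, and ties between two equally close powers of two are broken upward so that ε stays strictly inside (-1/2, 1/2).

```python
import math

def decompose(x):
    """Return (n, eps) with x = 2**n * (1 + eps), where 2**n is the power of two closest to x."""
    n = x.bit_length() - 1               # 2**n <= x < 2**(n + 1)
    if x - 2**n >= 2**(n + 1) - x:       # the upper power is at least as close
        n += 1
    return n, x / 2**n - 1

def taylor_ln1p(eps, k):
    """T(eps): the degree-k Taylor polynomial of ln(1 + eps)."""
    return sum((-1) ** (i - 1) * eps**i / i for i in range(1, k + 1))

def approx_ln(x, k):
    n, eps = decompose(x)
    return n * math.log(2) + taylor_ln1p(eps, k)

# The error shrinks quickly as the degree k grows.
err_k5 = abs(approx_ln(11, 5) - math.log(11))
err_k20 = abs(approx_ln(11, 20) - math.log(11))
```

For x = 11 the decomposition gives n = 3 and ε = 0.375, and raising k from 5 to 20 reduces the error by several orders of magnitude.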

  13. ln(x1 + x2) Protocol (Cont.)
      Step 1 of the protocol: find n and ε.
      • Apply Yao’s protocol to the following small circuit:
        • Input: x1 and x2
        • Output (random shares):
          • random a1 and a2 s.t. $a_1 + a_2 = x - 2^n = \varepsilon \cdot 2^n$
          • random b1 and b2 s.t. $b_1 + b_2 = \ln 2^n$
        • Operation: the circuit finds the $2^n$ closest to $x_1 + x_2$ and computes $\varepsilon \cdot 2^n = x_1 + x_2 - 2^n$, so that $x = x_1 + x_2 = 2^n + \varepsilon \cdot 2^n$.
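What the small circuit computes can be sketched as an ideal functionality evaluated in the clear. This is a stand-in for Yao's protocol, not an implementation of it. The field size `P`, the fixed-point factor `SCALE` for the real value ln 2^n, and all function names are assumptions; `n` is returned only so the sketch can be inspected.

```python
import math
import secrets

P = 2**127 - 1   # hypothetical prime field size
SCALE = 10**9    # hypothetical fixed-point factor for the real value ln 2**n

def share(value, p=P):
    """Additive shares of an integer; each share alone is uniformly random mod p."""
    s1 = secrets.randbelow(p)
    return s1, (value - s1) % p

def reconstruct(s1, s2, p=P):
    v = (s1 + s2) % p
    return v - p if v > p // 2 else v

def step1_ideal(x1, x2):
    """Stand-in for the secure circuit: the parties would see only their own shares."""
    x = x1 + x2
    n = x.bit_length() - 1
    if x - 2**n >= 2**(n + 1) - x:       # pick the power of two closest to x
        n += 1
    a1, a2 = share(x - 2**n)                          # shares of eps * 2**n
    b1, b2 = share(round(SCALE * n * math.log(2)))    # shares of (scaled) ln 2**n
    return n, (a1, b1), (a2, b2)

n, (a1, b1), (a2, b2) = step1_ideal(600, 400)
```

With x = 1000 the closest power of two is 1024, so the `a` shares add up to -24 in the field.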

  14. ln(x1 + x2) Protocol (Cont.)
      Step 2 of the protocol: compute random shares of $T(\varepsilon)$ (the Taylor approximation).
      • P1 chooses a random $w_1 \in F$ and defines a polynomial Q(x) s.t. $w_1 + Q(a_2) = T(\varepsilon)$ (recall $a_1 + a_2 = \varepsilon \cdot 2^n$).
      • Namely, $Q(x) = T\bigl((a_1 + x)/2^n\bigr) - w_1$.
      • Run an oblivious polynomial evaluation in which P2 computes $w_2 = Q(a_2) = T(\varepsilon) - w_1$.
      • Now the parties have random w1 and w2 s.t.
        • $w_1 + w_2 = T(\varepsilon) \approx \ln(1 + \varepsilon)$
        • $(b_1 + w_1) + (b_2 + w_2) \approx \ln 2^n + \ln(1 + \varepsilon) = \ln x$
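Step 2 can be sketched over a prime field once the fractions in T are cleared. The scaling constant used below, c = lcm(2..k)·2^(n·k), is an assumption chosen so that every term of c·T(ε) is an integer; the slides only say that some c "compensates for fractions". The oblivious polynomial evaluation is replaced by a direct call to Q, and the field size and all names are hypothetical.

```python
import math
import secrets

P = 2**127 - 1   # hypothetical prime field size

def lcm_upto(k):
    out = 1
    for i in range(2, k + 1):
        out = out * i // math.gcd(out, i)
    return out

def scaled_taylor(u, n, k, p=P):
    """c * T(u / 2**n) mod p, with c = lcm(2..k) * 2**(n*k); every term is an integer."""
    big_l = lcm_upto(k)
    total = 0
    for i in range(1, k + 1):
        term = (big_l // i) * 2 ** (n * (k - i)) * pow(u, i, p)
        total += term if i % 2 == 1 else -term
    return total % p

def step2(a1, a2, n, k, p=P):
    w1 = secrets.randbelow(p)            # P1's random share
    def q(z):                            # Q(z) = c*T((a1 + z) / 2**n) - w1 over the field
        return (scaled_taylor((a1 + z) % p, n, k, p) - w1) % p
    w2 = q(a2)                           # stand-in for the OPE: P2 learns only Q(a2)
    return w1, w2

def centered(v, p=P):
    v %= p
    return v - p if v > p // 2 else v

# Example: x = 1000, so 2**n = 1024 and a1 + a2 = x - 2**n = -24.
n, k = 10, 6
a1 = secrets.randbelow(P)
a2 = (-24 - a1) % P
w1, w2 = step2(a1, a2, n, k)
c = lcm_upto(k) * 2 ** (n * k)
approx = centered(w1 + w2) / c           # close to ln(1 + eps) = ln(1000 / 1024)
```

The field must be large enough that c·T(ε) does not wrap around; with n = 10 and k = 6 the value is far below P/2.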

  15. Computing x ln x
      • Tool: Multiply(c1, c2)
        • Input: c1, c2
        • Output: d1, d2 s.t. d1 + d2 = c1·c2
        • How? OPE of Q(z) = c1·z − d1, so that d2 = Q(c2) = c1·c2 − d1.
      • Actual task: x ln x
        • Input: x1 + x2 = x, c1 + c2 = ln x
        • Output: shares of x ln x = (x1 + x2)·(c1 + c2)
        • Run Multiply(x1, c2) and Multiply(c1, x2); the terms x1·c1 and x2·c2 are computed locally.
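The Multiply tool and its use for x ln x can be sketched as follows, again with the oblivious polynomial evaluation replaced by a direct evaluation of Q. The field size, the function names and the integer 6908 (a made-up fixed-point encoding of ln 1000) are assumptions for illustration.

```python
import secrets

P = 2**127 - 1   # hypothetical prime field size

def multiply(c1, c2, p=P):
    """Return shares d1, d2 with d1 + d2 = c1 * c2 (mod p)."""
    d1 = secrets.randbelow(p)            # chosen by the party holding c1
    def q(z):                            # Q(z) = c1*z - d1
        return (c1 * z - d1) % p
    d2 = q(c2)                           # stand-in for the OPE: the other party learns only Q(c2)
    return d1, d2

def x_ln_x_shares(x1, x2, c1, c2, p=P):
    """Shares of (x1 + x2) * (c1 + c2): two cross terms via multiply, two local terms."""
    d1, d2 = multiply(x1, c2, p)         # shares of x1 * c2
    e1, e2 = multiply(c1, x2, p)         # shares of c1 * x2
    s1 = (x1 * c1 + d1 + e1) % p         # P1's share
    s2 = (x2 * c2 + d2 + e2) % p         # P2's share
    return s1, s2

x1, x2 = 600, 400
ln_x_encoded = 6908                      # hypothetical scaled value of ln(1000)
c1 = secrets.randbelow(P)
c2 = (ln_x_encoded - c1) % P
s1, s2 = x_ln_x_shares(x1, x2, c1, c2)
```

Adding `s1` and `s2` in the field gives 1000·6908, the encoded x ln x, while each share on its own is masked by fresh randomness.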

  16. The rest of the work..
      • Each party computes a share of the entropy by summing its shares of x ln x.
      • A small circuit finds the attribute giving the minimal conditional entropy.
      • The attribute is assigned to the node.
      • The databases are divided according to the value of this attribute.
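The first two bullets can be sketched with a local summation per party and an ideal stand-in for the small comparison circuit. This is an illustration under assumptions: the share values are made up, each attribute's score is treated as an already correctly signed sum of terms, and the function names are hypothetical.

```python
P = 2**127 - 1   # hypothetical prime field size

def centered(v, p=P):
    v %= p
    return v - p if v > p // 2 else v

def local_share(term_shares, p=P):
    """Each party sums its own shares of the per-term values; no interaction is needed."""
    return sum(term_shares) % p

def argmin_ideal(shares1, shares2, p=P):
    """Stand-in for the small secure circuit: only the winning attribute is revealed."""
    totals = {a: centered(shares1[a] + shares2[a], p) for a in shares1}
    return min(totals, key=totals.get)

# Made-up shares: the "age" terms add up to 12, the "claim" term to 20.
p1_terms = {"age": [3, 10], "claim": [25]}
p2_terms = {"age": [2, P - 3], "claim": [P - 5]}
shares1 = {a: local_share(t) for a, t in p1_terms.items()}
shares2 = {a: local_share(t) for a, t in p2_terms.items()}
chosen = argmin_ideal(shares1, shares2)
```

In the actual protocol the totals would never be reconstructed in the clear; the comparison happens inside the secure circuit.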

  17. Efficiency
      • ln x protocol:
        • secure computation of a small circuit
        • one oblivious polynomial evaluation
      • ID3 for a database with:
        • 1,000,000 transactions
        • 15 attributes
        • 10 values per attribute
        • 4 class values
      • Communication per node takes seconds (T1)
      • Computation per node takes minutes (P3)

  18. Issues
      • Only two participants
      • “Curious but honest” participants
      • Approximating ln x gives an approximation of ID3
      • The participants learn the decision tree, which reveals some information

  19. Contributions
      • A cryptographic protocol where the bulk of the operations is done independently.
      • Data mining:
        • Rigorous model for secure data mining.
        • Efficient, secure protocol for ID3.
      • Cryptography:
        • Sub-linear complexity: secure computation for large data sets.
        • An efficient protocol for a complex known algorithm.
        • Secure computation of logarithms (a real function, requiring numerical analysis).
