1 / 18

Incentive compatible Assured Data Sharing & Mining

Murat Kantarcioglu. Incentive compatible Assured Data Sharing & Mining.



Presentation Transcript


  1. Murat Kantarcioglu Incentive compatible Assured Data Sharing & Mining

  2. Incentives and Trust in Assured Information Sharing
  • Combining intelligence through a loose alliance
  • Bridges gaps due to sovereign boundaries
  • Maximizes yield of resources
  • Discovery of new information through correlation and analysis of the 'big picture'
  • Information exchanged privately between two participants
  • Drawbacks to sharing: misinformation, freeloading
  • Goal: create a means of encouraging desirable behavior within an environment that lacks, or cannot support, a central governing agent

  3. Possible Scenarios
  • You may verify the shared data and issue fines if the data is wrong (the easy case)
  • You may verify the shared data but cannot issue fines (a little harder)
  • You may only verify some aggregate result (the hardest case)

  4. Game Matrix: parameters
  • Value of information
  • Trust value
  • Agent type
  • Minimal verification probability
  • Cost of verification

  5. Behaviors Analyzed in Data Sharing Simulations

  6. Simulation Results
  • We set δmin = 3, δmax = 7, CV = 2; the lie threshold is set to 6.9
  • Honest behavior wins 97% of the time when all behaviors are present
  • Experiments show that without the LivingAgent behavior, honest behavior cannot flourish
  • For more details, see: "Incentive and Trust Issues in Assured Information Sharing", Ryan Layfield, Murat Kantarcioglu, and Bhavani Thuraisingham, International Conference on Collaborative Computing, 2008

  7. Verifying the Final Result: Our Model
  • Players P1, ..., Pn: each has some data (x1, ..., xn)
  • Goal: compute a data mining function D(x1, ..., xn) that maximizes the sum of the participants' valuation functions
  • Player Pt: a mediator between the parties; computes the function securely and holds test data xt
  • Players value privacy, correctness, and exclusivity
  • Problem: how do we ensure that players share their data truthfully?

  8. Assumption
  The best model, i.e., the one that maximizes the sum of the valuation functions, is the model built from the submitted input data. Formally, given the submitted valuation functions and the submitted data:
  D(x) = argmax_{m ∈ M} Σ_k v_k(m), for any set of players
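As a toy illustration, the model-selection rule above can be sketched in a few lines of Python. The candidate models, accuracy table, and valuation functions below are hypothetical placeholders, not taken from the paper:

```python
# Hypothetical sketch of D(x) = argmax_{m in M} sum_k v_k(m):
# pick the candidate model that maximizes the sum of submitted valuations.

def select_model(candidate_models, valuations):
    """Return the model m maximizing sum_k v_k(m)."""
    return max(candidate_models,
               key=lambda m: sum(v(m) for v in valuations))

# Toy example: two candidate models; three players whose valuations
# happen to equal a per-model accuracy (illustrative numbers).
accuracy = {"m1": 0.80, "m2": 0.85}
valuations = [lambda m: accuracy[m] for _ in range(3)]
print(select_model(["m1", "m2"], valuations))  # m2, since 3 * 0.85 > 3 * 0.80
```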

  9. Mechanism
  • Reservation utility normalized to 0
  • u_i(m) = v_i(m) − p_i(v_i, v_−i)   [u = utility, v = valuation, p = payment]
  • p_i(v_i, v_−i) = max_{m′ ∈ M} Σ_{k ≠ i} v_k(m′) − Σ_{k ≠ i} v_k(m)
  • v_i(m) = max{0, acc(m) − acc(D(x_i))} − c(D), where c is the cost of computation and acc is accuracy
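The payment and utility formulas can be computed directly once the valuations are known. In this minimal sketch the model set and the valuation numbers are illustrative, not the paper's:

```python
# Sketch of the VCG-style payment and utility from the slide:
#   p_i = max_{m' in M} sum_{k != i} v_k(m')  -  sum_{k != i} v_k(m)
#   u_i(m) = v_i(m) - p_i

def vcg_payment(i, models, valuations, chosen):
    """Payment charged to player i when model `chosen` is selected."""
    others = [v for k, v in enumerate(valuations) if k != i]
    best_without_i = max(sum(v(m) for v in others) for m in models)
    return best_without_i - sum(v(chosen) for v in others)

def utility(i, models, valuations, chosen):
    """u_i(chosen) = v_i(chosen) - p_i."""
    return valuations[i](chosen) - vcg_payment(i, models, valuations, chosen)

# Illustrative numbers: two models, two players.
vals = [{"m1": 1.0, "m2": 3.0}, {"m1": 2.0, "m2": 1.0}]
valuations = [lambda m, d=d: d[m] for d in vals]
# Chosen model maximizes the sum of valuations (here: "m2", 4.0 > 3.0).
chosen = max(["m1", "m2"], key=lambda m: sum(v(m) for v in valuations))
print(vcg_payment(0, ["m1", "m2"], valuations, chosen))  # 1.0
print(utility(0, ["m1", "m2"], valuations, chosen))      # 2.0
```

Player 0's payment equals the welfare the others could have achieved without it (2.0, at "m1") minus what they actually get under "m2" (1.0), which is the standard VCG externality charge.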

  10. Mechanism
  • We compute p_i using the independent test set held by Pt
  • Intuitively, the mechanism rewards players based on their contribution to the overall model
  • This is a VCG mechanism, which is provably incentive compatible under our assumption

  11. Experiments
  Does this assumption hold for real data? Methodology:
  • 4 data sets from the UCI Repository
  • 3-party vertical partitioning, naïve Bayes classifiers
  • Determine accuracy and payouts; payouts estimated as acc(classifier) − acc(classifier without player i's data) − constant cost
  • Once with all players truthful, then once for each player at each level of perturbation (1%, 2%, 4%, 8%, 16%, 32%, 64%, 100%)
  • 50 runs for each setting
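The payout estimate in the methodology can be made concrete with a small sketch. A toy nearest-centroid classifier in NumPy stands in for the naïve Bayes classifiers, and the data, column split, and cost constant are made up, just to show the acc(full) − acc(without i) − cost arithmetic:

```python
import numpy as np

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Toy nearest-class-centroid classifier (stand-in for naive Bayes)."""
    classes = np.unique(y_tr)
    cents = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return float((pred == y_te).mean())

def payout(X_tr, y_tr, X_te, y_te, player_cols, cost):
    """Estimated payout: acc(all features) - acc(without player's columns) - cost."""
    full = centroid_accuracy(X_tr, y_tr, X_te, y_te)
    keep = [j for j in range(X_tr.shape[1]) if j not in player_cols]
    reduced = centroid_accuracy(X_tr[:, keep], y_tr, X_te[:, keep], y_te)
    return full - reduced - cost

# Made-up data: column 0 (owned by player i) is informative, column 1 is useless.
X_tr = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
y_tr = np.array([0, 0, 1, 1])
X_te = np.array([[0.0, 0.0], [10.0, 0.0]])
y_te = np.array([0, 1])
# acc(full) = 1.0, acc(without column 0) = 0.5, so payout = 1.0 - 0.5 - 0.1
print(payout(X_tr, y_tr, X_te, y_te, player_cols=[0], cost=0.1))
```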

  12–16. Census-Income (Adult): result figures (images not included in the transcript)

  17. Breast-Cancer-Wisconsin: result figure (image not included in the transcript)

  18. Conclusions
  • Does the assumption hold? Not always, but it comes very close and works as a practical assumption
  • If a better model is found through lying, does this hurt or help?
  • Consideration: change the goal from preventing lying to building the most accurate classifier
  • Finding the "right" lie may take too much computation to be profitable
