Research Overview

Research Overview. Xintao Wu, Aug 25, 2014. Outline: Introduction; Privacy Preserving Social Network Analysis (input perturbation, output perturbation); Fraud Detection in Social Networks (spectral analysis of graph topology, detecting random link attacks, detecting weak anomalies).


Presentation Transcript


  1. Research Overview • Xintao Wu • Aug 25, 2014

  2. Outline • Introduction • Privacy Preserving Social Network Analysis • Input perturbation • Output perturbation • Fraud Detection in Social Networks • Spectral analysis of graph topology • Detecting Random Link Attacks • Detecting weak anomalies • Sample Projects • Conclusions and Future work

  3. Trustworthy Computing • Trustworthy = reliability, security, privacy, usability • Sample research challenges • Understand and capture emergent behaviors/interactions among regular users, fraudsters, and victims • Design secure, survivable, persistent systems when under attack • Enable privacy protection in collecting/analyzing/sharing personal data

  4. Privacy Breach Cases • Nydia Velázquez (1994) • A medical record of her suicide attempt was disclosed • AOL Search Log (2006) • The anonymized release of 650K users’ search histories lasted less than 24 hours • Netflix Contest (2009) • A $1M contest was cancelled because of a privacy lawsuit • 23andMe (2013) • The FDA ordered its genetic testing service discontinued over genetic privacy concerns

  5. Acxiom • Privacy • In 2003, EPIC (Electronic Privacy Information Center) alleged that Acxiom provided consumer information to the US Army "to determine how information from public and private records might be analyzed to help defend military bases from attack." • In 2013, Acxiom was among nine companies the FTC investigated to see how they collect and use consumer data. • Security • In 2003, more than 1.6 billion customer records were stolen while information was being transmitted to and from Acxiom's clients.

  6. Privacy Regulation (Forrester) • [World map; legend from most to least restrictive: most restricted, restricted, some restrictions, minimal restrictions, effectively no restrictions, no legislation or no information]

  7. Privacy Protection Laws • USA • HIPAA for health care • Gramm-Leach-Bliley Act of 1999 for financial institutions • COPPA for children's online privacy • State regulations, e.g., California Senate Bill 1386 • Canada • PIPEDA 2000 (Personal Information Protection and Electronic Documents Act) • European Union • Directive 95/46/EC: provides guidelines for member state legislation and forbids sharing data with states that do not protect privacy • Contractual obligations • Individuals should be notified of how their data are used and be given opt-out choices

  8. Privacy Preserving Data Mining • 69% of people are unique on zip code and birth date • 87% are unique on zip code, birth date, and gender • Generalization (k-anonymity, l-diversity, t-closeness) • Randomization
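The uniqueness statistics and k-anonymity above can be illustrated with a minimal Python sketch. The toy records, the quasi-identifier names (`zip`, `birth`, `gender`) and the `generalize` rule are all hypothetical. That rule keeps a 3-digit ZIP prefix and only the birth year. Real generalization schemes search over attribute hierarchies rather than applying one fixed rule.

```python
from collections import Counter

QUASI_IDS = ("zip", "birth", "gender")

def qi_key(record, quasi_ids=QUASI_IDS):
    """Project a record onto its quasi-identifier attributes."""
    return tuple(record[q] for q in quasi_ids)

def uniqueness_fraction(records, quasi_ids=QUASI_IDS):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(qi_key(r, quasi_ids) for r in records)
    return sum(1 for r in records if counts[qi_key(r, quasi_ids)] == 1) / len(records)

def is_k_anonymous(records, k, quasi_ids=QUASI_IDS):
    """k-anonymity: every quasi-identifier combination is shared by at least k records."""
    counts = Counter(qi_key(r, quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def generalize(record):
    """Hypothetical generalization: 3-digit ZIP prefix and birth year only."""
    g = dict(record)
    g["zip"] = record["zip"][:3] + "**"
    g["birth"] = record["birth"][:4]
    return g

# Toy data, made up for illustration.
records = [
    {"zip": "28223", "birth": "1980-03-14", "gender": "F"},
    {"zip": "28223", "birth": "1980-07-02", "gender": "M"},
    {"zip": "28262", "birth": "1980-11-30", "gender": "F"},
    {"zip": "28262", "birth": "1980-01-09", "gender": "M"},
]
generalized = [generalize(r) for r in records]
print(uniqueness_fraction(records), is_k_anonymous(records, 2))          # 1.0 False
print(uniqueness_fraction(generalized), is_k_anonymous(generalized, 2))  # 0.0 True
```

Generalization trades precision for anonymity. After truncation, every record shares its quasi-identifiers with at least one other record.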

  9. Social Network Data • The data owner releases the social network data to a data miner

  10. Threat of Re-identification • An attacker can attack the released network • Privacy breaches: identity disclosure, link disclosure, attribute disclosure
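Identity disclosure is easy to demonstrate when a graph is released with names merely replaced by pseudonyms. The Python sketch below assumes a hypothetical attacker who knows only a target's degree (number of friends); the names and edges are toy data. If that degree is unique in the released graph, the target is re-identified. The target's released edges then also disclose links.

```python
from collections import Counter

def naive_anonymize(edges, names):
    """Replace names by integer pseudonyms; the topology is released unchanged."""
    pseudonym = {name: i for i, name in enumerate(names)}
    return [(pseudonym[u], pseudonym[v]) for u, v in edges], pseudonym

def degree_candidates(released_edges, known_degree):
    """Nodes in the released graph whose degree matches the attacker's knowledge."""
    degree = Counter()
    for u, v in released_edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d == known_degree)

names = ["alice", "bob", "carol", "dave", "erin"]
edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
         ("alice", "erin"), ("bob", "carol")]
released, pseudonym = naive_anonymize(edges, names)

print(degree_candidates(released, 4))  # [0]: alice is uniquely re-identified
print(degree_candidates(released, 2))  # [1, 2]: bob and carol stay ambiguous
```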

  11. Privacy Preservation in Social Network Analysis • Input Perturbation • K-anonymity • Generalization • Randomization • Output Perturbation • Background on differential privacy • Differential privacy preserving social network mining

  12. Our Work • Feature preservation randomization • Spectrum preserving randomization (SDM08) • Markov chain based feature preserving randomization (SDM09) • Reconstruction from randomized graph (SDM10) • Link privacy (from the attacker perspective) • Exploiting node similarity feature (PAKDD09 Best Student Paper Runner-up Award) • Exploiting graph space via Markov chain (SDM09)
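Randomization-based input perturbation edits edges before release. The Python below is a minimal sketch of one common primitive, random edge switching, which rewires pairs of edges while keeping every node's degree unchanged. It is not the spectrum-preserving or Markov-chain-based algorithm of the cited papers. The function names and the retry cap are assumptions.

```python
import random
from collections import Counter

def degree_map(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def random_switch(edges, num_switches, seed=None):
    """Degree-preserving randomization: pick edges (t, w) and (u, v) at random and
    replace them with (t, v) and (u, w), skipping any switch that would create a
    self-loop or a duplicate edge."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = attempts = 0
    while done < num_switches and attempts < 100 * num_switches:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (t, w), (u, v) = edge_list[i], edge_list[j]
        new1, new2 = tuple(sorted((t, v))), tuple(sorted((u, w)))
        if t == v or u == w or new1 in edge_set or new2 in edge_set:
            continue  # would create a self-loop or a multi-edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list

# A 10-node ring plus two chords; after switching, the degree sequence is unchanged.
ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
perturbed = random_switch(ring, num_switches=20, seed=1)
print(degree_map(perturbed) == degree_map(ring))  # True
```

Each switch removes one neighbour from each of t, w, u, v and adds exactly one back. Degrees are therefore preserved, while other features such as the spectrum may drift. Controlling that drift is what feature-preserving randomization targets.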

  13. PSNet (NSF-0831204)

  14. Output Perturbation • The data miner sends a query f to the data owner • The data owner returns the query result plus noise • The noisy result cannot be used to infer whether any individual is included in the database

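  15. Differential Guarantee [Dwork, TCC06] • Database x: for the query f = count(#cancer), the mechanism K returns f(x) + noise = 3 + noise • Neighboring database x' (one individual opted out): K returns f(x') + noise = 2 + noise • The two noisy answers are nearly indistinguishable, achieving the effect of opting out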
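Formally, a randomized mechanism K is ε-differentially private if Pr[K(x) ∈ S] ≤ e^ε · Pr[K(x') ∈ S]. This must hold for any two databases x and x' that differ in one individual's record, and for any set S of outputs. The standard Laplace mechanism meets this bound for a count query. A count changes by at most 1 (sensitivity 1), so the mechanism adds noise drawn from Lap(1/ε).

Below is a minimal sketch using only the Python standard library. The toy records and the `cancer` field are made up. The Laplace noise is sampled as the difference of two exponentials.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(records, predicate, epsilon, rng):
    """epsilon-DP count query: counts have sensitivity 1, so add Lap(1/epsilon) noise."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)

def laplace_density(y, center, scale):
    """Density of Lap(center, scale) at y."""
    return math.exp(-abs(y - center) / scale) / (2 * scale)

def has_cancer(record):
    return record["cancer"]

# x has 3 cancer patients; in the neighbouring database x' one of them has opted out.
x = [{"cancer": True}] * 3 + [{"cancer": False}] * 5
x_prime = x[1:]
epsilon = 0.5
rng = random.Random(0)
print(private_count(x, has_cancer, epsilon, rng))        # 3 + noise
print(private_count(x_prime, has_cancer, epsilon, rng))  # 2 + noise

# For every output y, the densities of "3 + noise" and "2 + noise" differ by at most e^epsilon.
ratios = [laplace_density(y / 10, 3, 1 / epsilon) / laplace_density(y / 10, 2, 1 / epsilon)
          for y in range(-100, 150)]
print(max(ratios) <= math.exp(epsilon) + 1e-9)  # True
```

A smaller ε means wider noise and a tighter bound on how much any one person's opt-out can shift the output distribution.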

  16. Our Work • DP-preserving cluster coefficient (ASONAM12) • Divide and conquer • Smooth sensitivity • DP-preserving spectral graph analysis (PAKDD13) • LNPP: based on the Laplace Noise Perturbation • SBMF: based on the Exponential Mechanism and MBF density • Linear-refinement of DP-preserving query answering (PAKDD13 Best Application Paper) • DP-preserving graph generation based on degree correlation (TDP13)
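The exponential mechanism, on which SBMF is built, selects an output r with probability proportional to exp(ε·u(x, r) / (2Δu)). Here u is a utility score and Δu is its sensitivity. SBMF applies the mechanism to a continuous space of eigenvectors. The hypothetical Python sketch below shows only the discrete case: it privately picks a "most frequent item" from toy counts.

```python
import math
import random

def selection_probabilities(candidates, utility, epsilon, sensitivity):
    """Exponential mechanism: P(c) is proportional to exp(epsilon * u(c) / (2 * sensitivity))."""
    scores = [epsilon * utility(c) / (2 * sensitivity) for c in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max to avoid overflow
    total = sum(weights)
    return [w / total for w in weights]

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Draw one candidate according to the exponential-mechanism probabilities."""
    probs = selection_probabilities(candidates, utility, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Toy counts; adding or removing one person changes any count by at most 1.
counts = {"a": 10, "b": 8, "c": 1}
items = list(counts)
probs = selection_probabilities(items, counts.get, epsilon=1.0, sensitivity=1.0)
print([round(p, 3) for p in probs])
print(exponential_mechanism(items, counts.get, 1.0, 1.0, random.Random(0)))
```

Higher-utility candidates are exponentially more likely to be chosen, but no candidate is ever certain. That residual randomness is what yields the privacy guarantee.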

  17. SMASH (NIH R01GM103309)

  18. Outline • Introduction • Privacy Preserving Social Network Analysis • Input perturbation • Output perturbation • Fraud Detection • Spectral analysis of graph topology • Detecting Random Link Attacks • Detecting weak anomalies • Sample Projects • Conclusions and Future work

  19. Cyber Fraud • Cybercrime • Costs the US economy $400 billion annually • OSN (online social network) fraud and attacks • Sybil attacks, spam, viral marketing, fraudulent auctions, brandjacking, denial of service, etc. • Fake Twitter followers (used in viral marketing) are worth $360 million annually on the black market

  20. Topology-based Detection • Fraud characterization • Individual vs. collusive • Robot vs. money-motivated regular user • Random vs. selective targets • Static vs. dynamic • Traditional topology-based detection methods • Incur high computational cost • Have difficulty detecting collaborative attacks or subtle anomalies

  21. Random Link Attack [Shrivastava, ICDE08] • An abstraction of collaborative attacks such as spam and viral marketing • The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes • The fake nodes also mimic the real graph structure among themselves to evade detection
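To make the attack model concrete, the hypothetical Python sketch below injects a random link attack into an undirected graph stored as an adjacency dict. The parameters `num_fake`, `links_per_fake` and `p_internal` are assumptions. Linking the fake nodes to one another at random is only a crude stand-in for "mimicking the real graph structure".

```python
import random

def inject_random_link_attack(adj, num_fake, links_per_fake, p_internal, seed=None):
    """Return a copy of `adj` (node -> set of neighbours, integer node ids) with
    `num_fake` fake nodes added. Fake nodes link to each other with probability
    `p_internal`, and each one links to `links_per_fake` randomly selected
    regular nodes (the victims)."""
    rng = random.Random(seed)
    graph = {u: set(nbrs) for u, nbrs in adj.items()}  # leave the input untouched
    regular = sorted(adj)
    first_id = max(regular) + 1
    fake = list(range(first_id, first_id + num_fake))
    for f in fake:
        graph[f] = set()

    def link(u, v):
        graph[u].add(v)
        graph[v].add(u)

    # Structure among the fake nodes themselves.
    for i, f in enumerate(fake):
        for h in fake[i + 1:]:
            if rng.random() < p_internal:
                link(f, h)
    # Attack edges to randomly selected regular nodes.
    for f in fake:
        for victim in rng.sample(regular, links_per_fake):
            link(f, victim)
    return graph, fake

# A 20-node ring as the "regular" network, attacked by 5 fake nodes.
ring = {u: {(u - 1) % 20, (u + 1) % 20} for u in range(20)}
attacked, fake_nodes = inject_random_link_attack(ring, 5, 4, 0.5, seed=7)
print(fake_nodes)                                   # [20, 21, 22, 23, 24]
print(sum(len(n) for n in attacked.values()) // 2)  # number of undirected edges
```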

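  22. Spectral Graph Analysis based Fraud Detection • Examine the spectral space of the graph topology • Consider a network with n nodes and m edges that is undirected and unweighted, ignoring link and node attribute information • Adjacency matrix A (symmetric) • Adjacency eigenspace: A = λ1 x1 x1^T + λ2 x2 x2^T + ... + λn xn xn^T, where λ1 ≥ λ2 ≥ ... ≥ λn are the eigenvalues and x1, ..., xn the corresponding orthonormal eigenvectors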

  23. Eigenspace Principal Minor

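  24. Projecting Nodes into the Spectral Space [SDM09] • Spectral coordinate of node u: αu = (x1u, x2u, ..., xku), the u-th entries of the top-k eigenvectors • k-orthogonal line pattern: when nodes u and v are from the same community, their spectral coordinates lie on the same line (nearly parallel); when they are from different communities, they lie on nearly orthogonal lines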
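The spectral coordinates and the orthogonal-line pattern can be checked on a toy graph. The Python sketch below uses a cyclic Jacobi eigen-solver only to stay dependency-free; in practice `numpy.linalg.eigh` would perform the eigen-decomposition.

The two-clique graph is hypothetical. Its two communities are disconnected, so nodes in the same clique get exactly parallel coordinates and nodes in different cliques get exactly orthogonal ones. With a few cross-community edges, the lines become only nearly orthogonal.

```python
import math

def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns eigenvalues in descending order and the matching unit eigenvectors."""
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q] (Numerical Recipes convention).
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]

def spectral_coordinates(adj_matrix, k):
    """alpha_u = (x_1u, ..., x_ku): node u's entries in the top-k eigenvectors."""
    _, vecs = jacobi_eigen(adj_matrix)
    return [[vecs[i][u] for i in range(k)] for u in range(len(adj_matrix))]

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

# Two communities: a 4-clique (nodes 0-3) and a 5-clique (nodes 4-8).
n = 9
A = [[0] * n for _ in range(n)]
for block in (range(0, 4), range(4, 9)):
    for u in block:
        for w in block:
            if u != w:
                A[u][w] = 1
alpha = spectral_coordinates(A, k=2)
print(round(cosine(alpha[0], alpha[1]), 6))       # same community: 1.0 (parallel)
print(round(abs(cosine(alpha[0], alpha[4])), 6))  # different communities: 0.0 (orthogonal)
```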

  25. Example: Spectral Coordinates of the Polbook Network

  26. Evaluation on Web Spam Challenge Data [ICDE11] • A snapshot of websites in the .UK domain (2007), with 114K nodes and 1.8M links, into which a mix of 8 RLAs with varied sizes and connection patterns was added • SPCTRA: based on the spectral space • GREEDY: based on outer triangles [Shrivastava, ICDE08] • SPCTRA is much faster: 36 s vs. 26 h

  27. Outline • Introduction • Privacy Preserving Social Network Analysis • Input perturbation • Output perturbation • Fraud Detection • Spectral analysis of graph topology • Detecting random link attacks • Detecting weak anomalies • Sample Projects • Conclusions and Future work

  28. Privacy Preserving Data Mining (NSF CAREER)

  29. Genetic Privacy (NSF SCH pending) BIBM13 Best Paper Award

  30. oSafari (NSF SaTC)

  31. Manipulation in E-Commerce (NSF III pending) • Targets: reviews, ratings, ranks • Manipulation: bot-committed, money-motivated • Approaches: spectral bipartite graph analysis, structured topic analysis, D-S based evidence fusion
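Assuming "D-S" refers to Dempster-Shafer theory, evidence fusion combines belief masses from independent detectors with Dempster's rule of combination. The Python sketch below is a minimal, hypothetical illustration. The two evidence sources, the frame {fraud, normal} and all mass values are made up; they are not taken from the project.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: m(A) = sum of m1(B) * m2(C) over all B, C with B & C == A,
    divided by 1 - K, where K is the mass on empty intersections (the conflict)."""
    combined, conflict = {}, 0.0
    for (b, wb), (c, wc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wb * wc
        else:
            conflict += wb * wc
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

FRAUD, NORMAL = frozenset({"fraud"}), frozenset({"normal"})
EITHER = FRAUD | NORMAL  # the whole frame, i.e. "don't know"

# Hypothetical evidence about one reviewer from two detectors.
from_graph = {FRAUD: 0.6, EITHER: 0.4}
from_topics = {FRAUD: 0.5, NORMAL: 0.3, EITHER: 0.2}
fused = dempster_combine(from_graph, from_topics)
print({tuple(sorted(s)): round(w, 3) for s, w in fused.items()})
# fraud ≈ 0.756, normal ≈ 0.146, either ≈ 0.098
```

Mass left on the whole frame expresses uncertainty explicitly. This is one reason Dempster-Shafer fusion suits detectors that may abstain.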

  32. Privacy Preserving Database Application Testing (NSF 0310974) • [Architecture diagram; components: ER, DDL, production DB (catalog, data), schema & domain, filter, user, R, NR, S, conflict resolution, disclosure assessment, rule analyzer, R’, NR’, S’, Schema’, Domain’, data generator, mock DB]

  33. Data Generation for Testing DB Applications (NSF 0915059) • How can test data be generated to cover the application's paths?
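The slide poses this question without describing a technique. As the simplest possible illustration, and not the project's approach, the hypothetical Python sketch below treats each return label of a small function as a path. It brute-forces candidate input values until every path is covered. This only works for tiny domains; practical tools derive inputs from the branch conditions instead.

```python
from itertools import product

def shipping_rule(age, order_total):
    """Hypothetical application logic whose branches depend on database values."""
    if age < 18:
        return "path-minor"
    if order_total > 100:
        return "path-bulk"
    return "path-regular"

def cover_paths(program, domains):
    """Naive brute force: try every combination of candidate values and keep the
    first input tuple that reaches each distinct path (identified by its label)."""
    covering = {}
    for values in product(*domains):
        covering.setdefault(program(*values), values)
    return covering

# Candidate values picked on either side of each branch condition.
tests = cover_paths(shipping_rule, [[10, 30], [50, 150]])
print(tests)  # {'path-minor': (10, 50), 'path-regular': (30, 50), 'path-bulk': (30, 150)}
```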

  34. Outline • Introduction • Privacy Preserving Social Network Analysis • Input perturbation • Output perturbation • Fraud Detection • Spectral analysis of graph topology • Detecting Random Link Attacks • Detecting weak anomalies • Sample Projects • Conclusions and Future work

  35. Big Data Computing • Drowning in data • Volume, velocity, variety, and veracity • 2.5 exabytes of data generated every day • Web data, healthcare, e-commerce, social networks • Advancing technology • Cheap storage and processing power • Growth in huge data centers • Data is in the “cloud”: Amazon AWS, Hadoop, Azure • Computing is in the “cloud”

  36. Social Media Customer Analytics • Data: unstructured text (e.g., blogs, tweets), products and reviews, transaction database, structured profiles, network topology (friendship, followship, interaction), retweet sequences • Issues: entity resolution, patterns, temporal/spatial analysis, scalability, visualization, sentiment, privacy • Velocity and variety: 10 GB of tweets per day • Belk and Lowe’s; Chancellor’s special fund

  37. Samsung AVC Denial Log Analysis • Volume and velocity: 1 million log files per day, each with thousands of entries • Tools: S3, Hive, and EMR

  38. Drivers of Data Computing • Reliability, security, privacy, usability • 6 A’s: Anytime, Anywhere Access to Anything by Anyone Authorized • 4 V’s: volume, velocity, variety, veracity

  39. Thank You! Questions? • Collaborators: Aidong Lu, Xinghua Shi, Jun Li (Oregon), Dejing Dou (Oregon), Tao Xie (UIUC) • Doctoral graduates: Songtao Guo, Ling Guo, Kai Pan, Leting Wu, Xiaowei Ying • Doctoral students: Yue Wang, Yuemeng Li, Zhilin Luo (visiting)
