De-anonymizing Social Networks. Arvind Narayanan and Vitaly Shmatikov, The University of Texas at Austin - by Nafia Malik
Motivation • OSNs share sensitive information. • A user's willingness to share information is not connected to its disclosure to unintended parties. • Anonymization can hide personally identifying information, but the person can still be identified. Hence anonymity != privacy.
Topics covered • Contributions • How data is released? • Related works and a sybil attack for de-anonymization. • Models and definitions. • The De-anonymization algorithm • Experimental Results. • Conclusion.
Contributions • ‘Feasibility of large-scale, passive de-anonymization of real-world social networks’ • Survey of the current state of data sharing. • Formal definition of privacy in social networks. • A generic re-identification algorithm for anonymized social networks. • Demonstration of the algorithm on real-world social networks.
How data is released? • Academic and government data mining • Sometimes anonymized. • Advertising • Relies on anonymity to breach privacy. • Third-party applications • Not anonymized. • Aggregation • Multiple OSNs. • Other data-release scenarios • P2P file sharing on OSNs • Large-scale facial recognition on photos.
Related work • Privacy properties • Presence of an edge • Significance of an edge attribute • Attributes attached to nodes • Mere de-anonymization reveals that the user is observable in the network. • De-anonymization attacks • Passive attacks • A coalition or pack of users compromises their OSN friends (small scale). • Defenses • Restricted adversary and small-scale networks • Scrambling/unscrambling user-provided data • Anonymity as a tool for privacy (problems with k-anonymity).
A Sybil attack for de-anonymization. • Active attacks • Difficult at large scale • Staging is expensive and impossible • No control over incoming links. • Identifying is easy • Difficult to create dummy nodes (e.g. Facebook checks for uniqueness of email IDs) • Legitimate users do not link back to Sybil nodes. • Subgraphs with one-directional edges are easy to mark out. • Methods for spammer detection by OSNs allow unidirectional edges. • Sybil nodes do not help in the attack.
Models and definitions. • Social network • A directed graph • Attributes for each node • Attributes for each edge
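The model on this slide can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `SocialNetwork`, its methods, and the sample names and attributes (`alice`, `bob`, `city`, `kind`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SocialNetwork:
    """Directed graph with an attribute dictionary on every node and edge."""

    node_attrs: dict = field(default_factory=dict)  # node -> {attribute: value}
    edge_attrs: dict = field(default_factory=dict)  # (src, dst) -> {attribute: value}

    def add_node(self, node, **attrs):
        # Create the node if needed, then merge in any attributes.
        self.node_attrs.setdefault(node, {}).update(attrs)

    def add_edge(self, src, dst, **attrs):
        # Edges are directed: (src, dst) and (dst, src) are different edges.
        self.add_node(src)
        self.add_node(dst)
        self.edge_attrs.setdefault((src, dst), {}).update(attrs)

    def out_neighbors(self, node):
        return {d for (s, d) in self.edge_attrs if s == node}

    def in_neighbors(self, node):
        return {s for (s, d) in self.edge_attrs if d == node}


# Hypothetical sample data.
net = SocialNetwork()
net.add_node("alice", city="Austin")
net.add_edge("alice", "bob", kind="follows")
```

Keeping attributes separate from the edge structure makes it easy to release the structure alone, which is the setting the rest of the talk is concerned with.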
Models and definitions. • Data Release (in bulk / by network crawling) • Advertisers • Application developers • Researchers.
Models and definitions. • Threat model • Attack scenario • Modeling of the attacker • Auxiliary information • Breaching privacy • How good is the attack?
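One simple way to answer "How good is the attack?" is to compare the attacker's mapping against a ground-truth mapping. The sketch below is an assumption made for illustration: it counts every node equally, which may differ from the measure used in the paper, and the function name `score_mapping` and the sample data are hypothetical.

```python
def score_mapping(mapping, truth):
    """Return (correct, wrong, unmapped) fractions over nodes with known truth.

    mapping -- attacker's output, {anonymous_node: claimed_identity}
    truth   -- ground truth, {anonymous_node: real_identity}
    """
    total = len(truth)
    if total == 0:
        raise ValueError("ground truth is empty")
    correct = sum(1 for n, t in truth.items() if n in mapping and mapping[n] == t)
    wrong = sum(1 for n, t in truth.items() if n in mapping and mapping[n] != t)
    unmapped = total - correct - wrong
    return correct / total, wrong / total, unmapped / total


# Hypothetical example: four nodes with known identities.
truth = {1: "a", 2: "b", 3: "c", 4: "d"}
mapping = {1: "a", 2: "b", 3: "d"}  # node 3 is wrong, node 4 is left unmapped
rates = score_mapping(mapping, truth)
```

Reporting the wrong fraction separately from the unmapped fraction matters: an attack that declines to guess is less harmful to its own credibility than one that guesses incorrectly.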
The De-anonymization algorithm • Seed identification • Inputs: • Target graph • k seed nodes in the auxiliary graph • k node degree values • C(k, 2) pairs of common-neighbor counts • Output: a mapping of the seed nodes to a matching k-clique in the target graph, or failure • Propagation • Inputs: • Graph G1 = (V1, E1) • Graph G2 = (V2, E2) • Partial seed mapping between the two graphs • Output: mapping μ: V1 → V2
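The two stages can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm as published: graphs are treated as undirected, seed identification is a brute-force search, and propagation uses a plain count of already-mapped neighbors with ties skipped. All function names, the `eps` tolerance, and the toy graphs are hypothetical.

```python
from itertools import combinations, permutations


def neighbors(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def close(a, b, eps):
    """True when a is within a relative tolerance eps of b."""
    return abs(a - b) <= eps * max(b, 1)


def identify_seeds(target, seeds, degrees, common, eps=0.0):
    """Search `target` for the unique k-clique matching the seed information.

    seeds   -- k seed node names from the auxiliary graph, in order
    degrees -- the k degree values, aligned with `seeds`
    common  -- {(i, j): count} for the C(k, 2) index pairs with i < j
    Returns {target_node: seed}, or None on failure (no match or ambiguous).
    """
    k = len(seeds)
    matches = []
    for cand in permutations(target, k):
        ok = all(close(len(target[c]), degrees[i], eps) for i, c in enumerate(cand))
        ok = ok and all(
            cand[j] in target[cand[i]]
            and close(len(target[cand[i]] & target[cand[j]]), common[(i, j)], eps)
            for i, j in combinations(range(k), 2)
        )
        if ok:
            matches.append(cand)
    if len(matches) != 1:
        return None
    return dict(zip(matches[0], seeds))


def propagate(g1, g2, seed_map):
    """Greedily extend seed_map (V1 -> V2) using already-mapped neighbors."""
    mapping = dict(seed_map)
    used = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for u in g1:
            if u in mapping:
                continue
            # Score each unused candidate v by how many mapped neighbors of u
            # land on neighbors of v.
            scores = {}
            for n in g1[u]:
                if n in mapping:
                    for v in g2[mapping[n]]:
                        if v not in used:
                            scores[v] = scores.get(v, 0) + 1
            ranked = sorted(scores.values(), reverse=True)
            if not ranked or (len(ranked) > 1 and ranked[0] == ranked[1]):
                continue  # no candidate, or an ambiguous tie: leave for a later pass
            best = max(scores, key=scores.get)
            mapping[u] = best
            used.add(best)
            changed = True
    return mapping


# Toy auxiliary graph (named) and an identically shaped target graph (numbered).
aux = neighbors([("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"),
                 ("a", "e"), ("b", "f"), ("d", "f")])
target = neighbors([(1, 2), (1, 3), (2, 3), (1, 4), (1, 5), (2, 6), (4, 6)])

seeds = ["a", "b", "c"]
degrees = [len(aux[s]) for s in seeds]
common = {(i, j): len(aux[seeds[i]] & aux[seeds[j]])
          for i, j in combinations(range(len(seeds)), 2)}

seed_map = identify_seeds(target, seeds, degrees, common)
full_map = propagate(target, aux, seed_map)
```

In this toy run the three seeds pin down a unique triangle, and propagation then resolves the remaining nodes over two passes: nodes that tie on the first pass become distinguishable once one of their neighbors has been mapped. Real graphs are noisy, so a practical version would need a nonzero tolerance and a stricter rule for rejecting weak matches.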
Experiment • Twitter, crawled in 2007 • Flickr, crawled in late 2007/early 2008 • LiveJournal, from Mislove, Marcon, Gummadi, Druschel and Bhattacharjee, ‘Measurement and analysis of online social networks’
Conclusion • About a third of the users with accounts on both Twitter and Flickr were re-identified in the anonymized Twitter graph. • The error rate was only 12%.