Lecture 20: Privacy in Online Social Networks

Lecture 20: Privacy in Online Social Networks. Xiaowei Yang. References: On the Leakage of Personally Identifiable Information Via Online Social Networks by Balachander Krishnamurthy and Craig E. Wills

Presentation Transcript


  1. Lecture 20: Privacy in Online Social Networks Xiaowei Yang

  2. References: • On the Leakage of Personally Identifiable Information Via Online Social Networks by Balachander Krishnamurthy and Craig E. Wills • Characterizing Privacy in Online Social Networks by Balachander Krishnamurthy and Craig E. Wills

  3. Problem • Online social networks (OSNs) are places where users share private information • Personally identifiable information (PII) • Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information linkable to that individual • Examples of PII • Photos • Status updates • However, this information can be leaked to unintended parties

  4. Today • Measurement studies of the importance of the problem • PII can be leaked to third-party websites, which makes users’ browsing history linkable • OSN default privacy settings leak PII

  5. Types of private bits in OSNs

  6. Who can see your private bits

  7. USER PRIVACY CONTROLS • Defaults are dangerous • By default, the information in a user’s Facebook profile, content, and comments (as on a user’s “Wall”) is viewable by any other user in the user’s networks • Has it changed? • MySpace uses similarly permissive defaults for access to a user’s information: all users have access to all other users’ information.

  8. Do users change their defaults? • A 2005 study found that • only 1.2% of college Facebook users at CMU changed the searchability of their thumbnail profile • 0.06% changed their profile visibility (second row) • 75% of 200 users in the Facebook London regional network have their full profile viewable by other users in the network

  9. Measurement Methodology • MySpace • Generated 5000 random numeric userids in an observed range of valid userids • Retrieved their corresponding user profiles • Bebo • Examined the profiles of users who were members of interest groups within Bebo
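
The MySpace sampling step can be sketched as follows. This is only an illustration: the slides do not give the observed range of valid userids, so the bounds below are hypothetical, and drawing distinct ids is an assumption.

```python
import random

def sample_userids(low, high, count, seed=None):
    """Draw `count` distinct numeric ids uniformly from [low, high]."""
    rng = random.Random(seed)
    return rng.sample(range(low, high + 1), count)

# Hypothetical bounds: the observed range is not stated in the slides.
ids = sample_userids(1_000_000, 9_000_000, 5000, seed=20)
print(len(ids), min(ids) >= 1_000_000, max(ids) <= 9_000_000)
```

Not every sampled id need correspond to a real account, which is consistent with the results slide reporting fewer valid userids than were generated.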

  10. Facebook • Join regional networks • Large and small • Geographic diversity • Linguistic/cultural diversity • Used the random network browsing feature of Facebook to crawl users’ profiles • 10 users are displayed per retrieval • 200 retrievals for each regional network • 1600-1700 users

  11. Results • MySpace • Obtained profile information for 3851 valid userids • 79% (3046) of users retained their default settings • Profile, friends, comments and user content world viewable. • Bebo • 80% of the Bebo users allowed their profile, friends, comments and user content to be viewable.

  12. Facebook

  13. Observations • Users in smaller networks are less concerned about making private information available • Users place a higher privacy value on profile information than on their list of friends • Wall is the most valuable • 79% of those with a viewable profile allowed their Wall to be viewable to anyone in the network for NY • 83% for Seattle • 95% for the Worcester region.

  14. USE OF THIRD-PARTY DOMAINS

  15. Information leakage to 3rd party domains • PII is sent to 3rd party domains via HTTP requests • Same PII may be sent to the same 3rd party domains when users browse other websites • ⇒ Online history becomes traceable

  16. HTTP Background • A cookie is a piece of text stored by a web browser • A cookie is sent as an HTTP header by a web server to a web browser • The web browser sends it back unchanged to the server each time it accesses the server • A cookie makes web browsing stateful • HTTP itself is a stateless request/response protocol

  17. HTTP background (cont.) • An HTTP request contains • The method to be applied to the resource • Request-URI (the uniform resource identifier of the resource) • The protocol version in use • Example request (the Request-URI is /pub/WWW/TheProject.html): GET /pub/WWW/TheProject.html HTTP/1.1 Host: www.w3.org

  18. HTTP background (cont.) • Referer is a request header field • Specifies to the server the address (URI) of the resource from which the Request-URI was obtained • I.e., which page led the browser to request this URI • Referer allows a server to generate customized content
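
To make the HTTP background concrete, here is a minimal sketch that splits a raw request into its request line and headers. The hosts, the cookie value, and the profile URL are made up for illustration; the point is that the Referer header carries the full URL of the page the user was viewing, and the Cookie header identifies the browser to the receiving server.

```python
def parse_request(raw):
    """Split a raw HTTP/1.x request into (method, uri, version, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only, since header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers

# Hypothetical request sent by a browser to a third-party ad server
# while the user views a profile page on an OSN.
raw = (
    "GET /ad.js HTTP/1.1\r\n"
    "Host: ads.example.net\r\n"
    "Referer: http://osn.example.com/profile.php?id=12345\r\n"
    "Cookie: track=abc123\r\n"
    "\r\n"
)
method, uri, version, headers = parse_request(raw)
print(method, uri, version)
print(headers["referer"])
print(headers["cookie"])
```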

  19. PII in OSNs

  20. Sample of Leakage • Friendid is associated with the doubleclick cookie • Other sites the user browses can be linked to the friendid
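
The linking described on this slide can be sketched from the third party's point of view. The sketch assumes the OSN id reaches the third party in the Referer header as a `friendid` query parameter; the host names and the log format are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

OSN_HOST = "osn.example.com"  # hypothetical OSN host
ID_PARAM = "friendid"         # assumed name of the id parameter

def link_history(log):
    """Map each third-party cookie to (OSN ids seen, other hosts visited).

    `log` is an iterable of (cookie, referer) pairs as a third-party
    server might record them.
    """
    ids = defaultdict(set)
    sites = defaultdict(set)
    for cookie, referer in log:
        url = urlparse(referer)
        if url.hostname == OSN_HOST:
            ids[cookie].update(parse_qs(url.query).get(ID_PARAM, []))
        else:
            sites[cookie].add(url.hostname)
    return {c: (ids[c], sites[c]) for c in set(ids) | set(sites)}

log = [
    ("c1", "http://osn.example.com/profile?friendid=777"),
    ("c1", "http://news.example.org/story/42"),
    ("c2", "http://shop.example.org/cart"),
]
result = link_history(log)
print(result["c1"])
```

Once a single request ties cookie `c1` to an OSN id, every other site visited under the same cookie can be attributed to that profile.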

  21. Leakage of OSN IDs • z.digg.com is a 3rd party advertisement site

  22. Leakage via External Applications

  23. Leakage of pieces of PII

  24. Protection Against PII Leakage • User actions • Providing no PII to OSNs • Filtering HTTP headers • Referer, Cookie • Disallowing cookies • … • Aggregators • Filtering PII • Are they going to do it?

  25. OSNs • Strip PII from HTTP requests • A session specific value for UID • External applications • Similarly, strip PII from HTTP requests
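
One possible form of header filtering is sketched below: drop the Cookie header and reduce Referer to its origin so that profile ids in the path or query string are not forwarded. This is an illustrative policy with made-up header values, not the mechanism proposed in the referenced papers.

```python
from urllib.parse import urlsplit, urlunsplit

DROPPED = {"cookie"}  # headers removed outright (an assumption of this sketch)

def scrub_headers(headers):
    """Return a copy without cookies and with Referer reduced to its origin."""
    clean = {}
    for name, value in headers.items():
        key = name.lower()
        if key in DROPPED:
            continue
        if key == "referer":
            parts = urlsplit(value)
            # Keep scheme and host only; discard path, query, and fragment.
            value = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
        clean[name] = value
    return clean

scrubbed = scrub_headers({
    "Host": "ads.example.net",
    "Referer": "http://osn.example.com/profile.php?id=12345",
    "Cookie": "track=abc123",
})
print(scrubbed)
```

Stripping headers in the browser or a proxy can break sites that rely on them, which is one reason the slide asks who would actually deploy such filtering.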

  26. Problem Not Unique to OSNs • Any site you have an account with can do so • Examples • A news site leaks user email addresses to online aggregators • A travel site embeds a user’s first name and default airport in its cookies, and leaks them to any site hiding in its domain

  27. Conclusion • Eric Schmidt: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.” • As you click links and browse online, third parties learn a lot more about you than you might think

  28. Discussion • What can be done to improve online user privacy? • Browser isolation • Next lecture: privacy-preserving online advertisements • Law enforcement?

  29. Lecture 21: Privacy and Online Advertising

  30. References • Challenges in Measuring Online Advertising Systems by Saikat Guha, Bin Cheng, and Paul Francis • Serving Ads from localhost for Performance, Privacy, and Profit by Saikat Guha, Alexey Reznichenko, Kevin Tang, Hamed Haddadi, and Paul Francis

  31. Problem • Online advertising funds many web services • E.g., all the free stuff we get from Google • Ad networks gather much user information • How do they use the user information?

  32. Goals • Determining how well ad networks target users

  33. Methodology • Creating two clients representing two different user types • Measuring the different ads each client sees

  34. Challenges • How to compare ads • How to collect a representative snapshot of ads • Quantifying the differences • Avoiding measurement artifacts

  35. Comparing Ads is challenging • Ads don’t have unique IDs • A & B are semantically the same, but with different text • A & C are different, but with same display URLs

  36. How to decide that two ads are the same? • Easy but illegal approach: comparing destination URLs • False positive (FP): flagged as equal but actually different • False negative (FN): actually equal but not flagged • Display URL has the lowest FNs ⇒ Use display URL to define ad equality
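
A minimal sketch of display-URL equality follows. The normalization steps (lowercasing, dropping the scheme, a leading `www.`, and a trailing slash) are assumptions of this sketch; the slide only says that the display URL is used as the key.

```python
def display_key(display_url):
    """Normalize a display URL so trivially different spellings compare equal."""
    key = display_url.strip().lower()
    for prefix in ("http://", "https://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
    if key.startswith("www."):
        key = key[4:]
    return key.rstrip("/")

def same_ad(ad_a, ad_b):
    """Treat two ads as equal when their normalized display URLs match."""
    return display_key(ad_a["display_url"]) == display_key(ad_b["display_url"])

a = {"text": "Cheap flights today", "display_url": "www.Example.com/"}
b = {"text": "Low fares, book now", "display_url": "example.com"}
c = {"text": "Cheap flights today", "display_url": "other.example.org"}
print(same_ad(a, b), same_ad(a, c))
```

Here `a` and `b` differ in text but count as the same ad, while `a` and `c` share text but count as different, which mirrors the trade-off between false negatives and false positives on the slide.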

  37. Taking a Snapshot • More ads are available than can be displayed on any single page • How to determine all ads that may be fed to a user? • Reload the page multiple times • But too many reloads may lead to ad churn: old ads expire, new ads show up

  38. Determining the # of reloads • Reload every 5 seconds • Repeated for 200 queries • Curve becomes linear after more than 10 reloads • Growth beyond that point is due to ad churn • Use 10 reloads as the threshold
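
The curve referred to on this slide can be computed as the number of distinct ads seen after each reload. The sketch below shows only that bookkeeping on made-up data; the ad keys and the input format are assumptions.

```python
def cumulative_unique(reloads):
    """For each reload, count the distinct ads seen so far."""
    seen = set()
    counts = []
    for ads in reloads:
        seen.update(ads)
        counts.append(len(seen))
    return counts

# Each inner list holds the ad keys (e.g. display URLs) shown on one reload.
reloads = [["a", "b"], ["b", "c"], ["c"], ["d"]]
curve = cumulative_unique(reloads)
print(curve)
```

Plotting `curve` against the reload number shows fast early growth while the current ad pool is being discovered; steady linear growth afterwards suggests that new ads are entering through churn rather than being uncovered.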

  39. Quantifying Change • Metrics • Jaccard index: J(A, B) = |A ∩ B| / |A ∪ B| • Extended Jaccard index (cosine similarity)
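
Both metrics can be sketched in a few lines. The set version is the standard Jaccard index; for the weighted version the sketch follows the slide's gloss and computes cosine similarity over per-ad weights, which is an interpretation rather than a formula given in the slides. The ad keys and weights are made up.

```python
import math

def jaccard(a, b):
    """Jaccard index of two collections of ad keys, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(union)

def cosine(u, v):
    """Cosine similarity of two sparse weight vectors given as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared out of 4 total
print(cosine({"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 4.0}))
```

The set version treats every ad equally, while the weighted version lets frequently shown ads count for more than ads seen once.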

  40. Comparing Effectiveness • Views: # of page reloads containing the ad • Value: # of page reloads scaled by the position of the ad • Overlap: Jaccard index

  41. Comparing Effectiveness

  42. The winner is • Weight: log(views) or log(value)
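
The views metric and the log(views) weighting can be sketched as below. The position scaling behind "value" is not specified in the slides, so only views are shown; the input format is an assumption.

```python
import math
from collections import Counter

def view_counts(reloads):
    """Views: the number of page reloads on which each ad appeared."""
    views = Counter()
    for ads in reloads:
        views.update(set(ads))  # count an ad at most once per reload
    return views

def log_weights(views):
    """Weight each ad by log(views); an ad seen once gets weight 0."""
    return {ad: math.log(n) for ad, n in views.items()}

reloads = [["a", "b"], ["a"], ["a", "a"]]
views = view_counts(reloads)
weights = log_weights(views)
print(dict(views))
print(weights)
```

The logarithm dampens the influence of ads that appear on nearly every reload, so a handful of dominant ads does not swamp the comparison.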

  43. Avoiding artifacts • Different system parameters may lead to different ads being shown • Browsers use different DNS servers • Browsers receive different cookies • HTTP proxy

  44. Analysis • Configure two or more instances to differ by one parameter • Comparing results for • Search Ads • Website Ads • Online Social Network Ads

  45. Search Ads • A, B: control w/o cookies • C, D: w/ cookies enabled. Seeded w/ different personae • Google 730 random product-related queries for 5 days • No obvious behavioral targeting in search ads. Why? • Keyword based ads bidding • Location targeting not studied

  46. Website Ads • Measure 15 websites that show Google ads • A, B: control in NY • C: SF; D: Germany • Location affects web ads

  47. Website Ads • A, B: control • C: browse 3 out of 15 websites • D and E: browse random websites and Google search random websites • Google does not use browsing behavior to pick ads

  48. Online social network ads • Set up three or more Facebook profiles • A, B: control and identical • C: differs from A by one profile parameter
