
Exploring Linkability of User Reviews



Presentation Transcript


  1. Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik, University of California, Irvine

  2. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  3. Motivation Increasing Popularity of Reviewing Sites: Yelp had more than 39M visitors and 15M reviews in 2010

  4. Example [slide shows a sample review, annotated with its category and rating]

  5. Motivation Rising awareness of privacy

  6. Motivation How is it applied? Traceability/Linkability Linkability of Ad hoc Reviews Linkability of Several Accounts

  7. Goal Assess the linkability in user reviews

  8. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  9. Data Set • 1 million reviews • 2,000 users • more than 300 reviews per user

  10. Problem Settings

  11. Problem Settings

  12. Problem Formulation IR: Identified Record (reviews with a known author); AR: Anonymous Record (reviews whose author is unknown) [slide shows ARs matched against candidate IRs]

  13. Problem Settings: Matching Model Each Anonymous Record (AR) is matched against the Identified Records (IRs). Top-X linkability, X: 1 and 10; AR sizes: 1, 5, 10, 20, …, 60
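The Top-X criterion can be made concrete with a short sketch: an AR counts as linked if its true author's IR appears among the top X candidates of its ranked list. The function name and data layout below are illustrative, not from the talk.

```python
def top_x_ratio(rankings, truth, x):
    """Fraction of anonymous records whose true author appears among
    the top-x identified records in that AR's ranked candidate list.

    rankings: {ar_id: [ir_id, ...]} best-first candidate lists
    truth:    {ar_id: ir_id} the true author of each AR
    """
    hits = sum(1 for ar, ranked in rankings.items()
               if truth[ar] in ranked[:x])
    return hits / len(rankings)
```

For example, with two ARs where one true author is ranked first and the other third, Top-1 linkability is 50% while Top-3 is 100%.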

  14. Methodologies (1) Naïve Bayesian Model with Maximum-Likelihood Estimation: decreasing sorted list of IRs (2) Kullback-Leibler Divergence (KLD): increasing sorted list of IRs
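The two rankings can be sketched as follows: NB scores each IR by the log-likelihood of the AR's tokens under that IR's token distribution (sorted decreasing), while KLD scores by the divergence between the AR's and IR's distributions (sorted increasing). The add-one smoothing and all names here are assumptions for illustration, not the paper's exact estimator.

```python
import math
from collections import Counter

def nb_score(ar_tokens, ir_counts, vocab_size):
    """Log-likelihood of the AR's tokens under an IR's
    (add-one smoothed) token distribution. Higher is better."""
    total = sum(ir_counts.values())
    return sum(math.log((ir_counts.get(t, 0) + 1) / (total + vocab_size))
               for t in ar_tokens)

def kld_score(ar_counts, ir_counts, vocab_size):
    """KL divergence D(AR || IR) between smoothed token
    distributions, over tokens seen in either record. Lower is better."""
    ar_total = sum(ar_counts.values())
    ir_total = sum(ir_counts.values())
    div = 0.0
    for t in set(ar_counts) | set(ir_counts):
        p = (ar_counts.get(t, 0) + 1) / (ar_total + vocab_size)
        q = (ir_counts.get(t, 0) + 1) / (ir_total + vocab_size)
        div += p * math.log(p / q)
    return div

def rank_irs(ar_tokens, irs, vocab_size=26):
    """Return two best-first candidate lists of IR ids:
    NB sorted decreasing by score, KLD sorted increasing."""
    ar_counts = Counter(ar_tokens)
    nb = sorted(irs, key=lambda u: nb_score(ar_tokens, irs[u], vocab_size),
                reverse=True)
    kl = sorted(irs, key=lambda u: kld_score(ar_counts, irs[u], vocab_size))
    return nb, kl
```

With unigram tokens, vocab_size is the 26 letters; with digrams it would be 676.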

  15. Tokens • Unigram: • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y” • 26 values • Digram • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy” • 676 values • Rating • 5 values • Category • 28 values
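The unigram and digram tokens above can be extracted in a few lines; lowercasing and the letters-only filter are assumptions about preprocessing, but they reproduce the slide's "privacy" example.

```python
def unigrams(text):
    """Single letters a-z (26 possible values)."""
    return [c for c in text.lower() if c.isalpha()]

def digrams(text):
    """Adjacent letter pairs (26 * 26 = 676 possible values)."""
    letters = unigrams(text)
    return [a + b for a, b in zip(letters, letters[1:])]
```

For "privacy" this yields the unigrams p, r, i, v, a, c, y and the digrams pr, ri, iv, va, ac, cy. Rating (5 values) and category (28 values) tokens would simply be the review's metadata fields appended to this token stream.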

  16. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  17. Unigram Results (NB) [plot: linkability ratio vs. anonymous record size] At AR size 60: linkability ratio (LR) 83% for Top-1 and 96% for Top-10

  18. Digram Results (NB) [plot: linkability ratio vs. anonymous record size] At AR size 20: 97% Top-1 LR; at size 10: 88% Top-1 LR

  19. Improvement (1): Combining Lexical and Non-Lexical Tokens (NB model) [plot: linkability ratio vs. anonymous record size] Gain of up to 20%: at size 30, from 60% to 80%; at size 60, from 83% to 96%

  20. What about Restricting Identified Record (IR) Size? [plots: linkability ratio vs. anonymous record size, NB and KLD models] NB is affected by IR size; KLD performed better for smaller IRs, improving at IR sizes of 20 or less

  21. Improvement (2): Matching All IRs at Once [slide shows a matching matrix over candidates v1-v16: each AR is paired with a distinct IR, ruling out already-matched candidates]
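The transcript does not give the exact joint-matching procedure, but the idea on the slide (each IR can be claimed by only one AR) can be sketched with a greedy one-to-one stand-in: repeatedly take the highest-scoring still-unmatched (AR, IR) pair. An optimal assignment (e.g. the Hungarian algorithm) would be a natural alternative; everything below is an illustrative assumption.

```python
def match_all(score, ars, irs):
    """Greedy one-to-one matching of ARs to IRs.

    score(ar, ir) is any similarity, e.g. an NB log-likelihood.
    Returns {ar_id: ir_id}; each IR is used at most once, which is
    what distinguishes joint matching from per-AR ranking.
    """
    pairs = sorted(((score(a, i), a, i) for a in ars for i in irs),
                   reverse=True)
    matched, used_ar, used_ir = {}, set(), set()
    for s, a, i in pairs:
        if a not in used_ar and i not in used_ir:
            matched[a] = i
            used_ar.add(a)
            used_ir.add(i)
    return matched
```

Joint matching helps because once a strong AR claims its IR, that IR stops competing for the remaining, weaker ARs.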

  22. Matching-All Results [plots: linkability ratio vs. anonymous record size, restricted IR and full IR] Gains of up to 16% and up to 23%: at size 30, from 74% to 90%; at size 20, from 35% to 55%

  23. Improvement (3): For Small IR Sizes Changing the estimate to: 0.5 + Review Length [plot: linkability ratio vs. anonymous record size] Gain of up to 5%: at size 10, from 89% to 92%; at size 7, from 79% to 84%
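The transcript only says the estimate is changed to "0.5 + Review Length"; one plausible reading, and purely an assumption here, is add-0.5 (Jeffreys-style) smoothing of the token probabilities, which softens estimates when an IR contains few reviews:

```python
def smoothed_prob(count, review_length, vocab_size=26):
    """Hypothetical add-0.5 smoothed token probability: a guess at
    the slide's '0.5 + review length' adjustment for small IRs.
    Probabilities still sum to 1 over the vocabulary."""
    return (count + 0.5) / (review_length + 0.5 * vocab_size)
```

Whatever the exact formula, the reported effect is modest but real: up to 5% higher linkability for small IRs.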

  24. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  25. Discussion • Unigram and scalability: 26 vs. 676 token values (59 vs. 676 with rating and category tokens), a difference of less than 10% • Prolific users: in the long run, users become prolific • Anonymous record size: a set of 60 reviews is less than 20% of the minimum user contribution (300+ reviews) • Detecting spam reviews

  26. Roadmap Introduction Data Set & Problem Settings Linkability Results & Improvements Discussion Future Work & Conclusion

  27. Future Work • Improving more for Small AR’s • Other Probabilistic Models • Using Stylometry • Review Anonymization • Exploring Linkability in other Preference Databases

  28. Conclusion • Extensive study assessing the linkability of user reviews • For a large set of users • Using very simple features • Users are highly exposed even with simple features and a large number of authors Takeaway Point: reviews can be accurately de-anonymized using alphabetical letter distributions

  29. Questions?
