Exploring Linkability of User Reviews

Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik Computer Science Department University of California, Irvine malmisha,gts@ics.uci.edu

Increasing Popularity of Reviewing Sites • Yelp, more than 39M visitors and 15M reviews in 2010

category Rating

Rising Awareness of Privacy

How Does Privacy Apply to Reviews? • Traceability • Linkability of Ad Hoc Reviews • Linkability of Several Accounts

Contribution • Extensive study to measure privacy/linkability in user reviews • Propose models that adequately identify authors

Settings & Problem Formulation

IR: Identified Record • AR: Anonymous Record
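The IR/AR setting can be sketched as a split of one user's reviews into an anonymous part and an identified part. This is only a minimal illustration: the slides do not say how the anonymous reviews are selected, so the random sampling, the `split_reviews` name, and the `seed` parameter are assumptions.

```python
import random

def split_reviews(reviews, ar_size, seed=0):
    """Split one user's reviews into (identified_record, anonymous_record).

    Assumption: the AR is a random sample of ar_size reviews and the IR is
    everything that remains. The slides do not state the selection rule.
    """
    if not 0 < ar_size < len(reviews):
        raise ValueError("ar_size must be between 1 and len(reviews) - 1")
    rng = random.Random(seed)
    shuffled = list(reviews)
    rng.shuffle(shuffled)
    anonymous_record = shuffled[:ar_size]
    identified_record = shuffled[ar_size:]
    return identified_record, anonymous_record

reviews = [f"review {i}" for i in range(10)]
ir, ar = split_reviews(reviews, ar_size=3)
```

The attacker's task is then to link each AR back to the IR written by the same user.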

Top-X Linkability • Anonymous Record (AR) Size: 1, 5, 10, 20, …, 60 • X: 1 and 10 • Matching Model • Identified Record (IR) Size
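Top-X linkability can be read as the fraction of anonymous records whose true author appears among the first X candidates returned by the matching model. The sketch below computes that ratio; the `rankings` structure and the toy author names are hypothetical.

```python
def top_x_linkability(rankings, x):
    """Fraction of ARs whose true author is within the top-x candidates.

    rankings maps each AR's true author to the candidate list produced by
    the matching model, best match first.
    """
    if not rankings:
        raise ValueError("rankings must not be empty")
    hits = sum(1 for author, ranked in rankings.items() if author in ranked[:x])
    return hits / len(rankings)

# Toy rankings for three anonymous records (hypothetical data).
rankings = {
    "alice": ["alice", "bob", "carol"],
    "bob": ["carol", "bob", "alice"],
    "carol": ["alice", "bob", "carol"],
}
top1 = top_x_linkability(rankings, 1)
top2 = top_x_linkability(rankings, 2)
top3 = top_x_linkability(rankings, 3)
```

In this toy case only one of three authors is ranked first, two of three are within the top 2, and all are within the top 3.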

Dataset • 1 Million Reviews • 2000 Users • More than 300 reviews per user

Methodology • Naïve Bayesian Model • Kullback-Leibler Model • Symmetric Version

Naïve Bayesian (NB) • Anonymous Record (AR) • Identified Record (IR) • Decreasing Sorted List of IRs
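A minimal sketch of the NB ranking step, assuming letter (unigram) tokens, a uniform prior over users, and add-one smoothing to avoid zero probabilities. None of these choices is stated on the slide, and the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def letter_counts(text):
    """Count the letters a-z in text, ignoring case and other characters."""
    return Counter(ch for ch in text.lower() if ch in ALPHABET)

def nb_log_score(ar_counts, ir_counts, smoothing=1.0):
    """Log-likelihood of the AR tokens under the IR's token distribution.

    Assumption: add-one smoothing; the slides do not describe smoothing.
    """
    total = sum(ir_counts.values()) + smoothing * len(ALPHABET)
    return sum(
        n * math.log((ir_counts[tok] + smoothing) / total)
        for tok, n in ar_counts.items()
    )

def rank_by_nb(ar_text, ir_texts):
    """Return user ids sorted by decreasing NB score for the given AR."""
    ar_counts = letter_counts(ar_text)
    scores = {
        user: nb_log_score(ar_counts, letter_counts(text))
        for user, text in ir_texts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

ir_texts = {"u1": "zzzz zzzz pizza pizzazz", "u2": "the tea there three"}
ranking = rank_by_nb("pizza zz", ir_texts)
```

The IR with the highest score comes first, which corresponds to the decreasing sorted list of IRs.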

Kullback-Leibler Divergence (KLD) • Anonymous Record (AR) • Identified Record (IR) • Increasing Sorted List of IRs
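A minimal sketch of the KLD ranking step, including a symmetric version formed by summing both directions. The smoothing, the letter-only tokens, and the direction of the one-sided divergence are assumptions, since the slide gives no formula; the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def letter_distribution(text, smoothing=1.0):
    """Smoothed letter distribution of text (assumption: add-one smoothing)."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}

def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared token set."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

def symmetric_kld(p, q):
    """Symmetric variant: D(p || q) + D(q || p)."""
    return kld(p, q) + kld(q, p)

def rank_by_kld(ar_text, ir_texts, distance=symmetric_kld):
    """Return user ids sorted by increasing distance from the AR."""
    p = letter_distribution(ar_text)
    distances = {
        user: distance(p, letter_distribution(text))
        for user, text in ir_texts.items()
    }
    return sorted(distances, key=distances.get)

ir_texts = {"u1": "zzzz zzzz pizza pizzazz", "u2": "the tea there three"}
ranking = rank_by_kld("pizza zz", ir_texts)
```

Here a smaller distance means a better match, so the list is sorted in increasing order, the opposite of the NB score.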

Maximum Likelihood Estimation
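For a token distribution, the maximum likelihood estimate is simply the relative frequency of each token in the record. The sketch below shows this on a toy string; `mle_distribution` is a hypothetical helper name.

```python
from collections import Counter

def mle_distribution(tokens):
    """Maximum likelihood estimate: P(token) = count(token) / total count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one token")
    return {tok: n / total for tok, n in counts.items()}

probs = mle_distribution(list("banana"))
```

For "banana" this gives 1/6 for 'b', 3/6 for 'a', and 2/6 for 'n'. Tokens that never occur in a record get probability zero under the pure estimate, which is why a practical matcher would typically add some smoothing.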

Tokens • Unigram: ‘a’, …, ‘z’ • Digram: ‘aa’, ‘ab’, …, ‘zz’ • Rating: 1, 2, 3, 4, 5 • Category: Restaurant, Beauty and Spa, Education
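The lexical tokens can be extracted with a few lines of code. This sketch assumes that text is lowercased, that non-letters are ignored, and that digrams are taken only within words; the slides do not specify how word boundaries are handled.

```python
import re
from collections import Counter

def unigram_counts(text):
    """Count single letters a-z, ignoring case and non-letters."""
    return Counter(re.findall(r"[a-z]", text.lower()))

def digram_counts(text):
    """Count adjacent letter pairs inside each word.

    Assumption: pairs never span a space or punctuation mark.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts

text = "Good food"
unigrams = unigram_counts(text)
digrams = digram_counts(text)
```

For "Good food" the unigram counts are g:1, o:4, d:2, f:1 and the digram counts are go:1, oo:2, od:2, fo:1.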

Lexical Token Results

NB - Unigram • Size 60: LR 83% (Top-1), LR 96% (Top-10)

KLD - Unigram • Size 60: LR 83% (Top-1), LR 96% (Top-10)

NB - Digram • Size 20: LR 97% (Top-1) • Size 10: LR 88% (Top-1)

KLD - Digram • Size 60: LR 99% (Top-1) • Size 30: LR 75% (Top-1)

Improvement (1): Combining Lexical and Non-Lexical Tokens

Combining Tokens • Straightforward in the NB model: P(Rating|IR), P(Category|IR) • But for KLD? • Weighted Average

Weighted Average for KLD • First, combine Rating and Category: weight 0.5 • Second, combine non-lexical and lexical: weight 0.997/0.97 for Unigram/Digram
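The two-stage weighted average can be sketched as follows. The slide gives the weights but not which term each weight multiplies, so applying 0.997 (or 0.97) to the lexical distance is an assumption, as are the function and parameter names and the toy distance values.

```python
def weighted_average(first, second, weight):
    """Return weight * first + (1 - weight) * second."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * first + (1.0 - weight) * second

def combined_distance(d_lexical, d_rating, d_category,
                      lexical_weight, rating_weight=0.5):
    """Combine three KLD distances in two stages.

    Stage 1: rating and category, weight 0.5 each.
    Stage 2: lexical and non-lexical. Assumption: lexical_weight applies
    to the lexical distance; the slides do not say which side it is on.
    """
    d_non_lexical = weighted_average(d_rating, d_category, rating_weight)
    return weighted_average(d_lexical, d_non_lexical, lexical_weight)

# Hypothetical distances for one (AR, IR) pair with unigram tokens.
d = combined_distance(0.02, 0.4, 0.6, lexical_weight=0.997)
```

With these toy inputs the non-lexical distance is 0.5, so the combined value is 0.997 * 0.02 + 0.003 * 0.5.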

Token Combining Results

Rating, Category, and Unigram - NB • Gain up to 20% • Size 30: 60% to 80% • Size 60: 83% to 96%

Rating, Category, and Unigram - KLD • Gain up to 12% • Size 40: 68% to 80% • Size 60: 83% to 92%

Rating, Category, and Digram - NB

Rating, Category, and Digram - KLD

What about Restricting Identified Record (IR) Size?

Top-X Linkability • Anonymous Record (AR) Size • X: 1 and 10 • Matching Model • Identified Record (IR) Size

Restricted IR - NB • Affected by IR size

Restricted IR - KLD • Performed better for smaller IR • Size 20 or less: improved • The rest: comparable

What about Matching All ARs at Once?

Top-X Linkability • Anonymous Record (AR) Size • X: 1 and 10 • Matching Model • Identified Record (IR) Size

Anonymous Records (ARs) • Matching Model • Identified Records (IRs)

Improvement (2): Matching All ARs at Once

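One way to match all anonymous records at once is a greedy one-to-one assignment: repeatedly accept the closest remaining (AR, IR) pair and remove both from further consideration. The slides do not spell out the procedure, so this is only a plausible sketch; it assumes every AR belongs to a different author, and `match_all` and the toy distances are hypothetical.

```python
def match_all(distances):
    """Greedy one-to-one matching of ARs to IRs.

    distances maps ar_id -> {ir_id: distance}. Pairs are accepted in order
    of increasing distance, skipping any AR or IR that is already matched.
    """
    pairs = sorted(
        (d, ar, ir)
        for ar, row in distances.items()
        for ir, d in row.items()
    )
    assignment = {}
    used_irs = set()
    for _, ar, ir in pairs:
        if ar not in assignment and ir not in used_irs:
            assignment[ar] = ir
            used_irs.add(ir)
    return assignment

# Toy distances: matched independently, both ARs would pick u1.
distances = {
    "ar1": {"u1": 0.10, "u2": 0.30},
    "ar2": {"u1": 0.15, "u2": 0.20},
}
assignment = match_all(distances)
```

In the toy example, independent matching would link both ARs to u1. Matching jointly gives u1 to ar1, the closer pair, and ar2 falls back to u2.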

MatchAll - Restricted • Gain up to 16% • Size 30: from 74% to 90%

MatchAll - Full • Gain up to 23% • Size 20: from 35% to 55%

Improvement (3): For Small IR Size

Changing it to: + Review Length 0.5

Results – Improvement (3) • Gain up to 5% • Size 10: 89% to 92% • Size 7: 79% to 84%

Discussion • Implications • Cross-Referencing • Review Spam • Non-Prolific Users • Gradually become prolific • IR of 20: linkability around 70% • Anonymous Record Size • Linkability high even for small ARs (92% for AR of 10) • AR of 60 is only 20% of the minimum user contribution

Discussion (cont.) • Unigram Token • Very comparable for larger ARs • Entails fewer resources in the attack: 26 vs. 676 tokens

Future Directions • Further improving linkability for small ARs • Other Probabilistic Models • Using Stylometry • Exploring Linkability in Other Preference Databases • More than one AR for different users: exploring it further

Conclusion • Extensive Study to Assess Linkability of User Reviews • For large set of users • Using very simple features • Users are very exposed even with simple features and large number of authors

Thank you all!
