Attacking Strategies Analysis on Social Media. Chun-Ming Lai Computer Science, University of California, Davis. Social Media. Exerting significant impact on mass communication. Traditional communication. Authoritative. Social Media. Distributed.
Attacking Strategies Analysis on Social Media Chun-Ming Lai Computer Science, University of California, Davis
Social Media • Exerting significant impact on mass communication
Traditional communication Authoritative
Social Media Distributed
Facebook.com/63811549237/posts/10153038271604238 2014,12-19,03:06am
Major Dimensions Likely offender (Attacker Behavior) • Malicious URLs • Facebook Social Media Dataset • Targets / Environments / Impact of campaigns • Attackers' digital footprints The absence of capable guardians (potential audience) Suitable Targets (Target posts, pages)
Security Threat • Severe Threat • Phishing • Malware, drive-by download • Medium to light Threat • Advertisement • Spamming (Fund-raising, porn, canned messages, etc.) • New type Threat • Rumors, media manipulation, sign-up, vote stuffing, etc. • Fake News • Crowdturfing = Crowdsourcing + Astroturfing
Difficulty & Challenge • Heterogeneous and huge data • Text, media, transaction, etc. • Labeled Data is precious • Different Criteria • Data size and type • New Patterns of Online Service • Application Bursts, Facebook Live, Game, etc.
Suitable Targets (Target posts, pages) Expected Contribution (3W1H) • Where? • US, Middle East, Asia, etc. • Politics, sports, entertainment, etc. • How effective? • Audience, user experience, etc. • Search engine spam, phishing, social media manipulation, sign-up, etc. • Who? • Fake, net army, compromised, etc. • What are these malicious URLs for? The absence of capable guardians (potential audience) Likely offender (Attacker Behavior)
Distributed, trustworthy Distributed
Outline • Introduction • Related Work & Evaluation Tools • Suitable Targets • Potential Audience • Attackers Behavior • Future Work
Related work • Context Filter (V. Balakrishnan 2016, C. Grier 2010, G. Stringhini 2010) • Blacklists • Text structure & pattern • User profile (K. Lee, 2010) • Geography, personal info. • Created (updated) time • Profile pictures
Related Work (cont'd) • Behavior-driven signals (C. Cao 2015, G. Wang 2013) • Clicks • Likes • Shares • Network-based (B. Viswanath 2010) • Edge: friend, like similarity, etc. • Static or dynamic margin groups • Find one, then cluster • Combine 4 categories to do so
Evaluation Tools • VirusTotal • API, 60+ security engines supported • Avira, Kaspersky, Google Safe Browsing, etc. • URLBlacklists • File based, 100+ categories, 10,000,000+ domains • Ads, porn, drug, weapon, etc.
Labeled Data • Sorted blacklists: Black.com, Black1.net, Phish.com, … • Sorted url_parsed with prefix: d.Com, c.d.com, b.c.d.com, a.b.c.d, …
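The labeling step on this slide can be illustrated with a minimal sketch. It assumes a URL counts as blacklisted when its host, or any parent domain of its host, appears in the blacklist. It uses a hash set lookup rather than the sorted-list comparison the slide depicts, and the names `host_suffixes` and `label_url` are hypothetical, not taken from the talk.

```python
from urllib.parse import urlsplit


def host_suffixes(url):
    """Return the host and each parent domain, longest first.

    Assumption: the URL carries a scheme (e.g. "http://"); without one,
    urlsplit() reports no hostname and the result is an empty list.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".") if host else []
    # Stop before the bare top-level label (e.g. "com").
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 0))]


def label_url(url, blacklist):
    """Return the matching blacklist entry, or None if nothing matches."""
    for candidate in host_suffixes(url):
        if candidate in blacklist:
            return candidate
    return None


# Hypothetical blacklist entries, lower-cased for comparison.
blacklist = {"d.com", "phish.com"}
match = label_url("http://a.b.c.d.com/page", blacklist)
```

A set gives constant-time lookups per candidate; a sorted list with binary search would serve equally well when the blacklist is loaded from sorted files.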
Suitable Targets Problem • For any post thread p on a social media platform, predict whether p contains at least one malicious comment via a classifier c ∈ {target, nontarget}
Popularity • Attention is everything! • Avg. Time: FB / 50 mins, sports / 17 mins [FB / NYT] • Liking, commenting, sharing, reading, etc. • Interdisciplinary Works – Economy, advertisement, communication • Output: tweet counts, FB shares / comments, total clicks, etc. • Input: content, topic, number of comments after a short time, etc. • Theory: Information Cascade, bandwagon effect, attention economy, etc. • Reference: (A. Tatar, 2011), (C. Castillo, 2010), (K. Wang, 2015)
Definition • Time Series (TS) • TScreated(post): the time an original article is posted • TSj: a time period j following the time of the original • TSfinal: the end of our observation • Accumulated Number of participants (AccNcomment) • The number of post comments between TS(i-1) and TSi • Discussion Atmosphere Vector (DAV)
Example • TScreated(Climate) = 2014-12-19 03:06:42 • Suppose j = 5, final = 120 • DAV(Climate) = [ # of comments 03:06:42 ~ 03:11:42 (1st), # of comments 03:11:42 ~ 03:16:42 (2nd), …, # of comments 05:01:42 ~ 05:06:42 (24th) ]
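The DAV in this example can be computed with a short sketch. It assumes half-open windows (a comment landing exactly on a boundary is counted in the later window) and that comments outside the observation period are ignored; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def discussion_atmosphere_vector(created, comment_times, j_minutes=5, final_minutes=120):
    """Count comments in consecutive j-minute windows after the post time."""
    n_bins = final_minutes // j_minutes      # 120 / 5 = 24 windows
    width = timedelta(minutes=j_minutes)
    dav = [0] * n_bins
    for t in comment_times:
        offset = t - created
        if offset < timedelta(0):
            continue                         # before the post was created
        idx = offset // width                # timedelta // timedelta gives an int
        if idx < n_bins:
            dav[idx] += 1                    # beyond TSfinal is dropped
    return dav


# Hypothetical comment timestamps for the "Climate" post.
created = datetime(2014, 12, 19, 3, 6, 42)
comments = [
    created + timedelta(minutes=1),
    created + timedelta(minutes=4),
    created + timedelta(minutes=5),
    created + timedelta(minutes=119),
    created + timedelta(minutes=120),        # outside the observation window
]
dav = discussion_atmosphere_vector(created, comments)
```

With these sample timestamps the first window holds two comments, the second holds one, and the 24th holds one.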
Dataset • 42,703,463 in total • 2011~2014, ten main media pages on Facebook
Several static features • Spanning time (shelf life) • Time(last comment) – Time(post time) • # of comments • Total # of comments on the post • Users, likes, etc.
Results Near Real Time
Next question: which stage is preferred? • Early • Lead the discussion in the beginning • User Interface • Late • Notification function • Newly arriving audience • Middle or random • The advantages of both
Discussion (1/2) 9420 comments have been detected, provided by 5026 accounts
Discussion (2/2)
Remarks • Suitable Targets are predicted successfully with temporal features • Attackers: Follow or not? • Defenders: Deploy resources • Temporal Analysis with different variables • Stage • Exact time after post created • Time duration between two consecutive malicious comments on the same page
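The last temporal variable, the time between two consecutive malicious comments on the same page, could be computed along these lines. This is a minimal sketch under the assumption that each malicious comment is available as a (page id, timestamp) pair; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def consecutive_gaps(comments):
    """Map each page id to the gaps, in seconds, between successive comments."""
    by_page = defaultdict(list)
    for page_id, ts in comments:
        by_page[page_id].append(ts)
    gaps = {}
    for page_id, times in by_page.items():
        times.sort()
        # Pair each timestamp with its successor on the same page.
        gaps[page_id] = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return gaps


# Hypothetical malicious comments on two pages.
sample = [
    ("page_a", datetime(2014, 12, 19, 3, 10, 0)),
    ("page_a", datetime(2014, 12, 19, 3, 7, 0)),
    ("page_b", datetime(2014, 12, 19, 4, 0, 0)),
    ("page_a", datetime(2014, 12, 19, 3, 40, 0)),
]
gaps = consecutive_gaps(sample)
```

A page with a single malicious comment yields an empty list, since there is no consecutive pair to measure.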
Why study Effectiveness • Communication is trying to influence others. • Qualitative and quantitative analysis for each mURL. • Risk Assessment and control
Intuitive thinking • How many people have seen/clicked the message? (Directly) • Hard to obtain the entire data because of the recommendation system • Communication • User intention to rejoin • Shelf-life period
Estimate Audience • Actions within Page G • action: comment, like, angry, reaction, etc. T0 (attack) T0 - T0 +
Indirect influence – final comments • Predicting final comments/visits using a post's early-stage reactions • Distribution matrix Dij (j participants within i minutes) • Prediction matrix Mij
Example • 4 posts with final comments: • A (100), B (101), C (102), D (2) • D56 = {A, B, C} • Input: a post E that got 6 comments within the first 5 minutes • Probably > 100 (lower bound) • ~90% accuracy
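The lookup in this example can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: each cell of the distribution matrix is keyed by (i minutes, exactly j comments), the prediction is taken as the smallest final count seen in the matching cell, and the early count given to post D is made up for the demonstration.

```python
from collections import defaultdict


def build_distribution(posts):
    """posts: iterable of (name, minutes i, early comment count j, final count)."""
    distribution = defaultdict(dict)
    for name, i, j, final in posts:
        distribution[(i, j)][name] = final
    return distribution


def predict_lower_bound(distribution, i, j):
    """Smallest final count among historical posts in cell (i, j), or None."""
    cell = distribution.get((i, j))
    if not cell:
        return None
    return min(cell.values())


history = [
    ("A", 5, 6, 100),
    ("B", 5, 6, 101),
    ("C", 5, 6, 102),
    ("D", 5, 1, 2),   # early count of 1 is a hypothetical value
]
distribution = build_distribution(history)
# Post E: 6 comments within the first 5 minutes.
estimate = predict_lower_bound(distribution, 5, 6)
```

Under these assumptions the cell for 6 comments in 5 minutes contains A, B, and C, so the estimate for E is bounded below by the smallest of their final counts.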
Some future work • More accurate prediction • > 100 vs. 100~200 • Pick “popular” from Non-Target • Some pages have lots of low-popularity posts Target posts Non-Target posts
Remarks • Direct Estimation • Twindow, , hundreds of audiences will be influenced • Indirect Estimation • Impact to life cycle (even popular)
Work Review Social Media Manipulation Sign up Search Engine Spamming Vote Stuffing • Network-based • Static: Margin • Dynamic: Deviation • Behavior, profile based • No or Google images • Anomaly Detection • Not just classification • Fake, compromised
Accounts' other activities • From the previous experiment, 5026 malicious accounts were identified • 40,000+ pages on Facebook (2011-2016) • >70% of accounts don’t have “like” • Like is easier 9420 comments have been detected, provided by 5026 accounts
Accounts' footprints • Response time to post thread • Ten comments to ten different articles • Remain online to “lead” discussion Commenting time Vector = Vote Stuffing
Normal vs. Malicious accounts • Malicious accounts tend to comment in the late stage • Legitimate accounts comment after a fixed time from the original article
Same content, multiple accounts • One message, multiple accounts (red) • One account, same message in different post threads (green)
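Both patterns on this slide can be surfaced by grouping comments on their text. The sketch below assumes each comment is available as an (account id, thread id, text) triple and treats two messages as the same when they match after lower-casing and whitespace collapsing; the normalization rule, function names, and sample data are all assumptions for illustration.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse whitespace so trivially edited copies match."""
    return " ".join(text.lower().split())


def group_duplicates(comments):
    """Return (messages shared by several accounts, messages one account repeats)."""
    accounts_by_text = defaultdict(set)
    threads_by_account_text = defaultdict(set)
    for account, thread, text in comments:
        key = normalize(text)
        accounts_by_text[key].add(account)
        threads_by_account_text[(account, key)].add(thread)
    # One message, multiple accounts.
    shared = {k: v for k, v in accounts_by_text.items() if len(v) > 1}
    # One account, same message in different post threads.
    repeated = {k: v for k, v in threads_by_account_text.items() if len(v) > 1}
    return shared, repeated


# Hypothetical comments.
sample = [
    ("u1", "t1", "Buy now  at X"),
    ("u2", "t2", "buy now at x"),
    ("u3", "t3", "hello"),
    ("u3", "t4", "Hello"),
    ("u4", "t5", "unique"),
]
shared, repeated = group_duplicates(sample)
```

Exact matching after normalization misses paraphrased copies; a similarity measure over shingles or tokens would be the natural extension if near-duplicates matter.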