
@spam: The Underground on 140 Characters or Less

@spam: The Underground on 140 Characters or Less. Chris Grier, Vern Paxson, Michael Zhang University of California, Berkeley Kurt Thomas University of Illinois, Urbana-Champaign ACM CCS 2010. Agenda. Introduction Background Data Collection Spam On Twitter Spam Campaign


Presentation Transcript


  1. @spam: The Underground on 140 Characters or Less Chris Grier, Vern Paxson, Michael Zhang University of California, Berkeley Kurt Thomas University of Illinois, Urbana-Champaign ACM CCS 2010

  2. Agenda • Introduction • Background • Data Collection • Spam On Twitter • Spam Campaign • Blacklist Performance • Conclusion

  3. Introduction • Twitter has developed a following of 106 million users who post to the site over one billion times per month • Threats: • Brute-force guessing of weak passwords • Phishing • … • Twitter currently lacks a filtering mechanism to prevent spam, with the exception of malware, which is blocked using Google’s Safebrowsing API • Twitter has developed a loose set of heuristics to quantify spamming activity, such as excessive account creation or requests to befriend other users

  4. Introduction (cont.) • Present the first in-depth look at spam on Twitter • Find that 0.13% of users exposed to spam URLs click through to the spam website • Identify a diversity of spam campaigns exploiting a range of Twitter features to attract audiences • Blacklists are currently too slow to stop harmful links • Two types of spamming accounts on Twitter

  5. Background • Common techniques to filter email spam • IP blacklisting • Domain and URL blacklisting • Filtering on email contents • Social network spam requires a large social circle • Challenges for a successful spam campaign on Twitter: • Obtaining enough accounts • Having enough fresh URLs • URL shortening services on Twitter

  6. Background (cont.) • Tweets: Twitter restricts these updates to 140 characters or less • URL shortening • Followers: how to obtain a lot of followers • Friends: relationships in Twitter are not bidirectional • Mentions, Retweets, Hashtags
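The Twitter features named on this slide can be recognized with simple pattern matching. The following is a minimal sketch, not the authors' tooling; the regular expressions and the function name `tweet_features` are assumptions chosen for illustration, and real tweets would need more careful parsing.

```python
import re

# Illustrative patterns (assumptions, not taken from the paper).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
RETWEET_RE = re.compile(r"(?:^|\s)RT @\w+")


def tweet_features(text):
    """Return the basic Twitter features present in a tweet's text."""
    # Remove URLs first so URL fragments are not mistaken for hashtags.
    stripped = URL_RE.sub(" ", text)
    return {
        "within_limit": len(text) <= 140,
        "urls": URL_RE.findall(text),
        "mentions": MENTION_RE.findall(stripped),
        "hashtags": HASHTAG_RE.findall(stripped),
        "is_retweet": RETWEET_RE.search(stripped) is not None,
    }


if __name__ == "__main__":
    print(tweet_features("Buy more followers! http://spam.com #fwlr"))
```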

  7. Data collection • Collect data from two separate taps • One targets a random sample of Twitter activity • The other specifically targets any tweets containing URLs • Use a custom web crawler to follow each URL through HTTP status codes and META tag redirects until reaching the final landing page • Redirect resolution removes any URL obfuscation that masks the domain of the final landing page
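The redirect resolution described on this slide can be sketched as a loop that follows HTTP redirects and META refresh tags until neither is present. This is an illustration under stated assumptions, not the authors' crawler: `fetch` is a hypothetical callable supplied by the caller that returns `(status, headers, body)`, the `max_hops` limit is an arbitrary choice, and the META pattern only handles the common attribute order.

```python
import re
from urllib.parse import urljoin

# Matches <meta http-equiv="refresh" content="0; url=..."> (common form only).
META_REFRESH_RE = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def resolve_landing_page(url, fetch, max_hops=10):
    """Follow redirects from `url` and return the full chain of URLs visited.

    `fetch(url)` is a hypothetical function returning (status, headers, body),
    where `headers` is a dict with a "Location" key on HTTP redirects.
    The last element of the returned list is the landing page.
    """
    chain = [url]
    seen = set()
    for _ in range(max_hops):
        if url in seen:  # redirect loop, stop here
            break
        seen.add(url)
        status, headers, body = fetch(url)
        if status in REDIRECT_STATUSES and "Location" in headers:
            next_url = urljoin(url, headers["Location"])
        else:
            match = META_REFRESH_RE.search(body)
            if match is None:
                break  # no further redirect: this is the landing page
            next_url = urljoin(url, match.group(1))
        url = next_url
        chain.append(url)
    return chain
```

Injecting `fetch` keeps the sketch independent of any particular HTTP library and makes it easy to exercise against canned responses.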

  8. Data collection (cont.) • We regularly check every landing page’s URL in our data set against three blacklists: • Google Safebrowsing → phishing or malware • URIBL, Joewein → domain present in spam email • Once a landing page is marked as spam, we analyze the associated spam tweets and users involved in the spam operation • We have found that URIBL and Joewein include domains that are not exclusively hosting spam
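For the domain-based blacklists, a lookup amounts to testing the landing page's hostname and its parent domains against a set of listed domains. The sketch below assumes a local, in-memory set named `domain_blacklist`; it does not reproduce how URIBL, Joewein, or Google Safebrowsing are actually queried, and both function names are hypothetical.

```python
from urllib.parse import urlsplit


def candidate_domains(url):
    """Return the hostname and each parent domain down to two labels."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def is_blacklisted(url, domain_blacklist):
    """True if the URL's host or any parent domain is in the local set."""
    return any(domain in domain_blacklist for domain in candidate_domains(url))
```

Matching on parent domains also illustrates the caveat on this slide: a listed domain that hosts mixed content will flag every page under it, whether or not that page is spam.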

  9. Data collection (cont.) • During this time we gathered over 200 million tweets from the stream → over 3 million tweets were identified as spam • Crawled 25 million URLs → 8% of all unique links were identified as spam by blacklists • Of the blacklisted links, 5% were malware and phishing • The remaining 95% directed users towards scams

  10. Data collection (cont.) • When bit.ly or an affiliated service is used to shorten a spam URL, we use the bit.ly API to download clickthrough statistics and click stream data, which allows us to identify highly successful spam pages and the rate of traffic

  11. Spam On Twitter • Spammers must coerce Twitter members into following spam accounts • spamming bots • compromised accounts • unwitting participants in spam distribution.

  12. Spam On Twitter (cont.) • Roughly 50% of spam was uncategorized due to using random terms • This table is the other 50%

  13. Spam On Twitter (cont.)

  14. Spam On Twitter (cont.) • Call outs: mentions are used by spammers to personalize messages in an attempt to increase the likelihood a victim follows a spam link • Example: Win an iTouch AND a $150 Apple gift card @victim! http://spam.com • Retweets: four sources of spam retweets: • retweets purchased by spammers from respected Twitter members • spam accounts retweeting other spam • hijacked retweets • users unwittingly retweeting spam • Example: RT @scammer: check out the Ipads there having a giveaway http://spam.com

  15. Spam On Twitter (cont.) • Tweet hijacking: spammers can hijack tweets posted by other users and retweet them, prepending the tweet with spam URLs • Example: http://spam.com RT @barackobama A great battle is ahead of us • Trend setting: the anomaly of 70% of phishing and malware spam containing hashtags can be explained by spammers attempting to create a trending topic • Example: Buy more followers! http://spam.com #fwlr • Trend hijacking: rather than generating a unique topic, spammers can append currently trending topics to their own spam

  16. Spam On Twitter (cont.)

  17. Spam On Twitter (cont.) • Coefficient of correlation (ρ) between clicks and each feature • Number of accounts involved in spamming and number of followers that receive a link (ρ > 0.7) • Hashtags (ρ = 0.74) • Retweets with hashtags (ρ = 0.55) • Number of times spam is tweeted (ρ = 0.28), indicating that repeatedly posting a link does little to increase traffic
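The slide does not say which correlation coefficient was used. As an illustration only, the sketch below assumes Pearson's coefficient and computes it between a per-URL feature (for example, the number of followers reached) and the per-URL click count; the function name `pearson` and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    followers = [100, 500, 2000, 8000]  # made-up feature values
    clicks = [1, 4, 9, 30]              # made-up click counts
    print(round(pearson(followers, clicks), 3))
```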

  18. Spam On Twitter (cont.) • Goal: understand the effectiveness of tweeting at enticing a follower into visiting a spam URL • Reach = t × f • t: the total tweets sent • f: the followers exposed to each tweet • Average clicks / reach over each of the 245,000 URLs in our bit.ly data set • Roughly 0.13% of spam tweets generate a visit, orders of magnitude higher than the clickthrough rates of 0.003%–0.006% reported for spam email
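The estimate on this slide is a per-URL ratio averaged across URLs. A minimal sketch, assuming each record is a hypothetical `(clicks, t, f)` tuple for one URL and that URLs with zero reach are simply skipped (an assumption, not something the slide states):

```python
def reach(tweets_sent, followers_per_tweet):
    """Reach = t × f, as defined on the slide."""
    return tweets_sent * followers_per_tweet


def mean_clickthrough(records):
    """Average clicks / reach over URLs.

    `records` is an iterable of (clicks, tweets_sent, followers_per_tweet)
    tuples, one per URL. URLs with zero reach are skipped.
    """
    rates = []
    for clicks, t, f in records:
        r = reach(t, f)
        if r > 0:
            rates.append(clicks / r)
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    # Made-up numbers: 13 clicks against a reach of 10,000 is 0.13%.
    sample = [(13, 100, 100), (13, 10, 1000)]
    print(f"{mean_clickthrough(sample):.4%}")
```

Averaging the per-URL rates weights every URL equally, regardless of how widely it was tweeted; dividing total clicks by total reach would instead weight large campaigns more heavily.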

  19. Spam On Twitter (cont.) • A number of factors may degrade the quality of this estimate • bit.ly URLs may carry an inherent bias of trust, as bit.ly is the most popular URL shortening service • Click data from bit.ly includes the entire history of a link, while our observation of a link’s usage only accounts for one month of Twitter activity

  20. Spam On Twitter (cont.) • Twitter accounts • Career spamming accounts • Compromised accounts: created by a legitimate user and later compromised • Tests • χ² test on timestamps • Tweet text and link entropy
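The two tests named on this slide can be sketched as follows. This assumes the timestamp test compares the seconds-of-the-minute at which an account tweets against a uniform distribution, and that entropy means Shannon entropy over an account's tweet texts or links; how the resulting values are thresholded to separate account types is not stated here, so none is applied. Function names are hypothetical.

```python
import math
from collections import Counter


def chi_square_uniform(values, bins):
    """Chi-square statistic of integer `values` in range(bins) vs. uniform."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    expected = len(values) / bins
    return sum(
        (counts.get(b, 0) - expected) ** 2 / expected for b in range(bins)
    )


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # An account that always tweets at second 0 deviates strongly from uniform.
    print(chi_square_uniform([0] * 60, bins=60))
    # An account repeating one message has zero text entropy.
    print(shannon_entropy(["same spam text"] * 4))
```

A large statistic suggests scheduled, automated posting, and low entropy suggests repeated text or links.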

  21. Spam On Twitter (cont.) • Compromised spamming accounts • An account could have been compromised by means of phishing, malware, or simple password guessing, currently a major trend in Twitter • The Koobface botnet

  22. Spam Tools

  23. Spam Campaigns • Campaign: the set of accounts that spam at least one blacklisted landing page in common • To cluster accounts into campaigns • Represent each campaign as a binary vector c ∈ {0, 1}^n • Merge the campaigns of accounts i and j when c_i ∩ c_j ≠ ∅, indicating at least one link is shared by both accounts
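The clustering rule above (merge any two accounts that share at least one blacklisted landing page) is equivalent to finding connected components, which a union-find structure handles directly. This is a sketch under that reading, not the authors' implementation; it stores each account's landing pages as a set rather than a length-n binary vector, and the name `cluster_campaigns` is hypothetical.

```python
from collections import defaultdict


def cluster_campaigns(account_urls):
    """Group accounts into campaigns.

    `account_urls` maps an account name to the set of blacklisted landing
    pages it tweeted. Accounts sharing at least one landing page end up in
    the same campaign, transitively.
    """
    parent = {account: account for account in account_urls}

    def find(a):
        # Path halving keeps the trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_account_for_url = {}
    for account, urls in account_urls.items():
        for url in urls:
            if url in first_account_for_url:
                union(first_account_for_url[url], account)
            else:
                first_account_for_url[url] = account

    campaigns = defaultdict(set)
    for account in account_urls:
        campaigns[find(account)].add(account)
    return list(campaigns.values())
```

Because merging is transitive, an account that participates in two otherwise separate campaigns pulls them into a single cluster.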

  24. Spam Campaigns (cont.) • If an account participates in multiple campaigns, the algorithm will automatically group the campaigns into a single superset • This occurs when an account is: • shared by two spammers • used for multiple campaigns over time by a single spammer • compromised by different services

  25. Spam Campaigns (cont.)

  26. Spam Campaigns (cont.)

  27. Spam Campaigns (cont.) • URLs being tweeted • Single hop (shortened URL → landing page) • Two hops (shortened URL → affiliate link → landing page) • Landing page itself appears in tweets • Phishing for followers • Websites purporting to provide victims with followers if they reveal their account credentials • Phished accounts are used to further promote the phishing campaign • Defining features • Extensive use of hashtags in this campaign’s tweets (73%)

  28. Spam Campaigns (cont.) • Personalized mentions (http://twitprize.com) • Spam within the campaign targets victims by using mentions and crafting URLs to include the victim’s Twitter account name, allowing for personalized greetings • Defining features • 99% are a retweet or mention • This campaign passes the entropy tests, since each tweet contains a different username and the links point to distinct twitprize URLs

  29. Spam Campaigns (cont.) • Buying retweets • One such service: retweet.it • Defining features • unique feature present in all retweet.it • Distributing malware • Defining features • One difference from other campaigns is the use of redirects to mask the landing page (bit.ly → intermediate → malware landing site) • Nested URL shortening

  30. Blacklist Performance

  31. Blacklist Performance (cont.)

  32. Conclusion • This paper presents the first study of spam on Twitter, including spam behavior, clickthrough, and the effectiveness of blacklists at preventing spam propagation • By measuring the clickthrough of these campaigns, we find that Twitter spam is far more successful at coercing users into clicking on spam URLs than email, with an overall clickthrough rate of 0.13% • If blacklists were integrated into Twitter, they would protect only a minority of users • URLs posted to the site must be crawled to unravel potentially long chains of redirects, using the final landing page for blacklisting
