Internet Systems and Technology Presentation. Prepared By: Kunal Suri (733)
Spam? • Disruptive messages, especially commercial messages, posted on a computer network or sent as e-mail. • Use of electronic messaging systems to send unsolicited bulk messages indiscriminately. Theory about the origin of the word ‘Spam’: • Spam (its name is said to be a portmanteau of the words “Spiced” and “Ham”) is a canned precooked meat product made by the Hormel Foods Corporation, first introduced in 1937. • The name was popularized by a 1970s ‘Monty Python’ TV sketch in which Spam (the meat) is included in almost every dish.
Agenda • Click Trajectories: End-to-End Analysis of the Spam Value Chain • Spamscatter: Characterizing Internet Scam Hosting Infrastructure
Click Trajectories: End-to-End Analysis of the Spam Value Chain By: Kirill Levchenko, Andreas Pitsillidis, Neha Chachra, Brandon Enright, Márk Félegyházi, Chris Grier, Tristan Halvorson, Chris Kanich, Christian Kreibich, He Liu, Damon McCoy, Nicholas Weaver, Vern Paxson, Geoffrey M. Voelker, Stefan Savage
Introduction • Spam-based advertising is a ‘business’. • Spam has led to both widespread antipathy and a multi-billion dollar anti-spam industry, but it continues to exist because it fuels a profitable enterprise. • There is still no solid understanding of this enterprise’s full structure, so most anti-spam interventions focus on only one facet of the overall spam value chain, such as spam filtering, URL blacklisting, or site takedown. • In this paper the authors take a holistic approach that quantifies the full set of resources employed to monetize spam email, i.e., naming, hosting, payment and fulfillment. • They describe a large-scale empirical study that measures the spam value chain end to end.
Spam Value Chain • Each distinct path through this ‘spam chain’ (registrar, name server, hosting, affiliate program, payment processing, fulfillment, etc.) directly reflects an “entrepreneurial activity” by which the perpetrators muster capital investments and business relationships to create value.
Spam Value Chain • Many basic characteristics of this activity are still poorly understood: • How many organizations are complicit in the spam ecosystem? • Which points in their value chains do they share and which operate independently? • How “wide” is the bottleneck at each stage of the value chain: do miscreants find alternatives plentiful and cheap, or scarce, requiring careful husbanding? To answer these questions, the authors develop a methodology for characterizing the end-to-end resource dependencies (“trajectories”) behind individual spam campaigns and then analyze the relationships among them.
How Modern Spam Works The Spam value chain can be classified into three distinct stages: • Advertising • Click support • Realization
Advertising • Advertising constitutes all activities focused on reaching potential customers and enticing them into clicking on a particular URL. • Large-scale efforts to shut down open SMTP proxies and the introduction of well-distributed IP blacklisting of spam senders have pushed spammers toward more sophisticated delivery vehicles. • These include botnets, Webmail spam and IP prefix hijacking. • Moreover, the market for spam services has stratified over time; for example, today it is common for botnet operators to rent their services to spammers on a contract basis. • In all, the advertising side of the spam ecosystem has by far seen the most study, no doubt because it reflects the part of spam that users directly experience.
Click support • A spammer depends on some fraction of the recipients responding by clicking on an embedded URL, thus directing their browser to a Web site of interest. • This process seems simple, but in practice a spammer must orchestrate many moving parts and maintain them against pressure from spam defenders. These parts include: • Redirection sites • Domains • Name servers • Web servers • Stores and affiliate programs
Redirection Sites • Spammers can directly advertise a URL. • The recipient’s browser then resolves the domain and fetches the content from it. • However, a variety of defensive measures (including URL and domain blacklisting, as well as site takedowns by ISPs and domain takedowns by registrars) have spurred more elaborate steps. • Spammers can advertise URLs that, when visited, redirect to additional URLs. • Redirection strategies primarily fall into two categories: • A legitimate third party inadvertently controls the DNS name resource for the redirection site (e.g., free hosting, URL shorteners, or compromised Web sites). • The spammers themselves, or perhaps parties working on their behalf, manage the DNS name resources (e.g., a “throwaway” domain such as minesweet.ru redirecting to a more persistent domain such as greatjoywatches.com).
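The two redirection categories can be illustrated with a small classifier over a redirect chain. This is only a sketch: the provider lists, the ".html" heuristic for compromised sites, and the two-label approximation of a registered domain are assumptions made for illustration, not the authors' tooling.

```python
from urllib.parse import urlparse

# Hypothetical provider lists; the study's real lists are not given in the slides.
URL_SHORTENERS = {"bit.ly"}
FREE_HOSTING = {"angelfire.com"}


def registered_domain(url):
    """Crude approximation: the last two labels of the host name."""
    host = urlparse(url).hostname or ""
    return ".".join(host.lower().split(".")[-2:])


def classify_redirect(chain):
    """Classify a redirect chain (list of URLs; the first is the advertised URL)."""
    if len(chain) < 2:
        return "direct"
    first = registered_domain(chain[0])
    if first in URL_SHORTENERS:
        return "third-party: URL shortener"
    if first in FREE_HOSTING:
        return "third-party: free hosting"
    # Assumed heuristic: a redirect page ending in .html suggests a compromised site.
    if urlparse(chain[0]).path.endswith(".html"):
        return "third-party: likely compromised site"
    return "spammer-controlled: throwaway domain"


print(classify_redirect(["http://minesweet.ru/", "http://greatjoywatches.com/"]))
```

A production classifier would need a public-suffix list to find registered domains correctly; the two-label shortcut is only adequate for simple names like the ones above.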
Domains • The spam value chain uses domains that typically come via the services of a domain registrar, who arranges for the root-level registry of the associated top-level domain (TLD) to hold NS records for the registered domain. • A spammer may purchase domains directly from a registrar, but will frequently purchase instead from a domain reseller, from a “domaineer” who purchases domains in bulk via multiple sources and sells to the underground trade, or directly from a spam “affiliate program” that makes domains available to its affiliates as part of their “startup package.”
Name servers • A registered domain must in turn have supporting name server infrastructure. • Spammers must provision this infrastructure either by hosting DNS name servers themselves, or by contracting with a third party. • Since such resources are vulnerable to takedown requests, a thriving market has arisen in so-called “bulletproof” hosting services that resist such requests in exchange for a payment premium.
Web servers • The address records provided by the spammer’s name servers must in turn specify servers that host (or more commonly proxy) Web site content. • As with name servers, spam-advertised Web servers can make use of bulletproof hosting to resist takedown pressure. • Some recent interventions have focused on effectively shutting down such sites by pressuring their upstream Internet service providers to deny them transit connectivity.
Stores and Affiliate Programs • Today, spammers operate primarily as advertisers, rarely handling the back end of the value chain. • Such spammers often work as affiliates of an online store, earning a commission (typically 30–50%) on the sales they bring in. • The affiliate program typically provides the storefront templates, shopping cart management, analytics support, and even advertising materials. • Finally, affiliate programs take responsibility for contracting for payment and fulfillment services with outside parties.
Realization • Once the customer is brought to an advertised site and has been convinced to purchase some product, the seller realizes the latent value by acquiring the customer’s payment through conventional payment networks, and in turn fulfilling their product request. • Payment services: sellers use credit card and related payment services for maximum benefit. • Fulfillment: once the order is confirmed and payment is made, the product is shipped to the customer.
Spam value chain example (1) On October 27th, the Grum botnet delivered an email titled “VIAGRA Official Site”. The body of the message includes an image of male enhancement pharmaceutical tablets and their associated prices. (2) The image provides a URL tag and thus, when clicked, directs the user’s browser to resolve the associated domain name, medicshopnerx.ru. (3) This domain was registered by REGRU-REG-RIPN (a.k.a. reg.ru) on October 18th and was still active as of the paper’s writing. (4) The machine providing name service resides in China, while hosting resolves to a machine in Brazil.
(5) The user’s browser initiates an HTTP request to the machine and receives content that renders the storefront for “Pharmacy Express,” (6) a brand associated with the Mailien pharmaceutical affiliate program based in Russia. (7) After purchase selection, the storefront redirects the user to a payment portal served from payquickonline.com (IP address in Turkey), which accepts the user’s details, confirms the order, provides an EMS tracking number, and includes a contact email for customer questions. The bank that issued the user’s credit card transfers money to the acquiring bank, in this case the Azerigazbank Joint-Stock Investment Bank in Baku, Azerbaijan (BIN 404610). (8) Ten days later the product arrives, blister-packaged, in a cushioned white envelope with postal markings indicating a supplier named PPW based in Chennai, India as its originator.
DATA COLLECTION METHODOLOGY • Collecting Spam-Advertised URLs • Crawler data • Content Clustering and Tagging • Purchasing
Different Steps while collecting Data (1) Feed parsers extract embedded URLs from the raw feed data for further processing.
(2) A DNS crawler enumerates various resource record sets of the URL’s domain, while (3) a farm of Web crawlers visits the URLs and records HTTP-level interactions and landing pages. (4) A clustering tool clusters pages by content similarity.
(5) A content tagger labels the content clusters according to the category of goods sold and the associated affiliate programs. (6) The authors then make targeted purchases from each affiliate program, and store the feed data and distilled and derived metadata in a database.
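The first step of this workflow, pulling embedded URLs out of raw feed data, can be sketched as follows. The regular expression, the function names, and the sample message are illustrative assumptions; the slides do not describe how the authors' feed parsers are implemented.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_urls(raw_message):
    """Return the distinct URLs embedded in a raw message, in order of appearance."""
    urls = []
    for match in URL_RE.findall(raw_message):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if urlparse(url).hostname and url not in urls:
            urls.append(url)
    return urls


def domains_of(urls):
    """Host names that a DNS crawler would look up next."""
    return sorted({urlparse(u).hostname.lower() for u in urls})


sample = ('Buy now: <a href="http://medicshopnerx.ru/">VIAGRA Official Site</a> '
          'or visit http://medicshopnerx.ru/.')
print(extract_urls(sample))
print(domains_of(extract_urls(sample)))
```

The extracted URLs feed the Web crawlers, while the derived host names feed the DNS crawler.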
Collecting Spam-Advertised URLs “bot” feeds tend to be focused spam sources, while the other feeds are spam sinks comprised of a blend of spam from a variety of sources. Example,Rustock bot - 13M distinct domains is an artifacts of a “blacklist-poisoning” campaign undertaken by the bot operators that comprised millions of “garbage” domains
Crawler data • A DNS crawler identifies the name server infrastructure used to support spam-advertised domains, and the address records they specify for hosting those names. • A Web crawler replicates the user’s clicking experience; crawler replicas run across a cluster of machines, each replica running 100 instances of Firefox.
Content Clustering and Tagging • The crawlers provide low-level information about URLs and domains. • This stage processes the crawler output to associate this information with higher-level spam business activities. • To classify each Web site: • content clustering matches sites with lexically similar content structure, • category tagging labels clustered sites with the category of goods they sell, • program tagging labels clusters with their specific affiliate program and/or storefront brand.
Content Clustering and Tagging • Content clustering: uses a q-gram similarity approach to generate a fingerprint consisting of a set of multiple independent hash values over all 4-byte tokens of the HTML text, compares it with the fingerprints representing existing clusters, and places the page in the cluster with the greatest similarity. • Category tagging: uses generic keywords found in the page content to label each cluster with the category tag corresponding to the goods it sells. • Program tagging: labels clusters with specific program tags to associate them with a certain affiliate program.
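One way to picture the content clustering step is a min-hash-style fingerprint over 4-byte q-grams. The sketch below is an assumption-laden illustration, not the authors' implementation: the number of hash values (NUM_HASHES), the salted-MD5 hash family, and the similarity threshold for opening a new cluster (THRESHOLD) are all hypothetical choices.

```python
import hashlib

Q = 4            # token length in bytes, as stated in the slide
NUM_HASHES = 16  # assumed fingerprint size
THRESHOLD = 0.5  # assumed minimum similarity needed to join an existing cluster


def fingerprint(html):
    """One minimum hash per salted hash function over all 4-byte q-grams."""
    data = html.encode("utf-8")
    grams = {data[i:i + Q] for i in range(max(len(data) - Q + 1, 1))}
    values = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(2, "big")
        values.append(min(hashlib.md5(salt + g).digest() for g in grams))
    return tuple(values)


def similarity(fp_a, fp_b):
    """Fraction of fingerprint positions on which two pages agree."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def assign(html, clusters):
    """Place a page in the most similar cluster; clusters holds representative fingerprints."""
    fp = fingerprint(html)
    best, best_sim = None, 0.0
    for idx, representative in enumerate(clusters):
        sim = similarity(fp, representative)
        if sim > best_sim:
            best, best_sim = idx, sim
    if best is None or best_sim < THRESHOLD:
        clusters.append(fp)  # no close match: open a new cluster
        return len(clusters) - 1
    return best
```

Calling assign() twice with the same page returns the same cluster index, while a page that shares few q-grams with every representative opens a new cluster.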
Purchasing • For a subset of the sites with program tags, the authors purchased some of the goods offered for sale. • The authors attempted 120 purchases, of which 76 authorized and 56 settled. • For study purposes the authors leased mailboxes and placed their purchases via VPN connections to IP addresses located in the geographic vicinity of the mailing addresses used. • This constraint is necessary to avoid failing common fraud checks that evaluate consistency between IP-based geo-location, mailing address and the Address Verification Service (AVS) information provided through the payment card association.
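The purchase counts above imply the following success rates. The helper name funnel_rates is introduced here only for illustration.

```python
def funnel_rates(attempted, authorized, settled):
    """Share of attempts that authorized or settled, and of authorizations that settled."""
    return {
        "authorized/attempted": authorized / attempted,
        "settled/attempted": settled / attempted,
        "settled/authorized": settled / authorized,
    }


rates = funnel_rates(attempted=120, authorized=76, settled=56)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
# authorized/attempted: 63.3%
# settled/attempted: 46.7%
# settled/authorized: 73.7%
```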
ANALYSIS • The major goal of the paper is to identify any “bottlenecks” in the spam value chain: opportunities for disrupting monetization at a stage where the fewest alternatives are available to spammers. • The authors focus on analyzing the degree to which affiliate programs share infrastructure, considering click support and realization. • The authors also perform an intervention analysis and discuss policy options for handling spam.
Click Support • Click support in the spam value chain includes redirection: Web sites redirect the visitor from the initial domain found in a spam message to one or more additional sites, ultimately resolving the final Web page. • 32% of crawled URLs redirected at least once. Of these: • 6% did so through public URL shorteners (e.g., bit.ly), • 9% through well-known “free hosting” services (e.g., angelfire.com), • 40% were to a URL ending in .html (typically indicating a redirect page installed on a compromised Web server). • Others used low-quality “throwaway” domains.
Click Support: Network infrastructure sharing • 80 registrars each serve a single affiliate program, while just two registrars (NauNet and China Springboard) serve domains for over 20 programs. • Sharing of network infrastructure among affiliate programs: only a small number of registrars host domains for many affiliate programs, and similarly only a small number of ASes (Autonomous Systems) host name and Web servers for many programs.
Network infrastructure sharing • For 50% of the affiliate programs, their domains, name servers, and Web servers are distributed over just 8% or fewer of the registrars and ASes, respectively. • 80% of the affiliate programs have their infrastructure distributed over 20% or fewer of the registrars and ASes. • Only a handful of programs, such as EvaPharmacy, have infrastructure distributed over a large percentage (50% or more) of registrars and ASes. To summarize, a broad range of registrars and ISPs are used to support spam-advertised sites, but there is only a limited amount of organized sharing, and different programs appear to use different subsets of available resource providers.
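Both views of sharing (how many programs each provider serves, and what fraction of all providers each program uses) can be computed from a simple mapping. The program and registrar names below are made-up toy data, and the function names are hypothetical.

```python
from collections import defaultdict


def programs_per_provider(usage):
    """usage maps program -> set of providers (registrars or ASes)."""
    served = defaultdict(set)
    for program, providers in usage.items():
        for provider in providers:
            served[provider].add(program)
    return {provider: len(programs) for provider, programs in served.items()}


def provider_fraction(usage):
    """Fraction of all observed providers that each program relies on."""
    all_providers = set().union(*usage.values())
    return {program: len(providers) / len(all_providers)
            for program, providers in usage.items()}


# Toy data for illustration only; not measurements from the paper.
usage = {
    "ProgramA": {"reg1", "reg2"},
    "ProgramB": {"reg2"},
    "ProgramC": {"reg2", "reg3", "reg4"},
}
print(programs_per_provider(usage))  # reg2 serves all three programs
print(provider_fraction(usage))      # ProgramB uses 1 of 4 registrars
```

In this toy example reg2 plays the role of a heavily shared registrar, while reg1, reg3 and reg4 each serve a single program.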
Realization • Realization includes post-order communication, authorization and settlement of credit card transactions, and order fulfillment. • The authors confirmed the hypothesis that realization infrastructure is the province of affiliate programs and not of individual affiliates. • Consistency in payment processing and fulfillment between different instances of the same affiliate program or store brand confirmed that a range of otherwise distinct brands (such as Ultimate Replica, Diamond Replicas and Distinction Replica) all belong to the same underlying affiliate program.
How much realization infrastructure is being shared across programs? • Payment: of the 76 purchases for which transaction information was received, there were only 13 distinct banks acting as Visa acquirers. • E.g., pharmaceutical affiliate programs used two banks (in Azerbaijan and Latvia), and software was handled entirely by two banks (in Latvia and Russia).
Fulfillment • Fulfillment for physical goods was sourced from 13 different suppliers (as determined by declared shipper and packaging), of which eight were seen more than once. • To summarize, the authors could not identify any particularly clear bottleneck in fulfillment and found that suppliers are likely to be plentiful.
Intervention analysis Anti-spam interventions need to be evaluated in terms of two factors: • Overhead to implement • Business impact on the spam value chain. This business impact is the sum of: • the replacement cost (to acquire new resources equivalent to the ones disrupted) • the opportunity cost (revenue forgone while the resource is being replaced). For any given registered domain used in spam, the defender may choose to intervene by: • blocking its advertising (e.g., filtering spam), • disrupting its click support (e.g., takedowns for name servers of hosting sites), • interfering with the realization step (e.g., shutting down merchant accounts). But which of these interventions will have the most impact?
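The business-impact definition can be written as a small calculation. Modelling the opportunity cost as daily revenue multiplied by the time needed to replace the resource is an assumption made here, and all numbers are hypothetical.

```python
def business_impact(replacement_cost, revenue_per_day, days_to_replace):
    """Business impact = replacement cost + opportunity cost.

    Opportunity cost is modelled (as an assumption) as the revenue forgone
    while the disrupted resource is being replaced.
    """
    opportunity_cost = revenue_per_day * days_to_replace
    return replacement_cost + opportunity_cost


# Hypothetical figures, for illustration only.
cheap_resource = business_impact(replacement_cost=1.0, revenue_per_day=100.0, days_to_replace=0.5)
scarce_resource = business_impact(replacement_cost=500.0, revenue_per_day=100.0, days_to_replace=10)
print(cheap_resource, scarce_resource)  # 51.0 1500.0
```

Under this model, an intervention matters most when the targeted resource is both expensive and slow to replace.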
• Almost 40% of spam-advertised domains were registered by NauNet. • Evolva Telecom (Romanian) hosts almost 9% of name servers for spam-advertised domains and over 10% of the Web servers hosting their content. • 60% of payments were handled via a single acquirer, Azerigazbank.
Availability of alternatives and their switching cost? • Only a small number of individual IP addresses were used to support spam-advertised sites, but the supply of hosting resources is vast, with thousands of hosting providers and millions of compromised hosts. • The switching cost is also low: new hosts can be provisioned on demand and at low cost. • Intervention at the registrar level appears more promising, as the supply of registrars is smaller (roughly 900 gTLD registrars were accredited by ICANN as of the paper’s writing) and not all registrars are equally tolerant of spam-based advertising. • Ultimately, the low cost of a domain name (many can be had for under $1 in bulk) and the ease of switching registrars make such interventions difficult.
Conclusion • The authors conclude that the banking component of the spam value chain is both the least studied and the most critical. • There may be thousands of banks, but the number of banks willing to knowingly process such “high-risk” transactions is far smaller. • A truly effective intervention could come through policy action in Western countries, i.e., by either stopping banks from doing business with such merchants or maintaining a “financial blacklist” that could be updated very quickly.
Spamscatter: Characterizing Internet Scam Hosting Infrastructure By: David S. Anderson, Chris Fleizach, Stefan Savage, Geoffrey M. Voelker
Introduction • The intent of spam is to attract the recipient into entering a commercial transaction, typically via a linked Web site. • The ultimate driving force for pumping out billions of such solicitations is the “point of sale”: the various money-making “scams” that extract value from Internet users. • This paper focuses squarely on the Internet infrastructure used to host and support such scams. • It describes an opportunistic measurement technique called spamscatter that mines emails in real time, follows the embedded link structure, and automatically clusters the destination Web sites using image shingling to capture graphical similarity between rendered sites.
Some facts • In 2006, spam was estimated at 80% of all Internet email, with a total volume of up to 85 billion messages per day. • More than $1B is spent annually on anti-spam technology. • To better understand spam, the paper analyzes the spam-advertised Web servers that offer merchandise and services. • For example, a given spam campaign may use thousands of mail relay agents to deliver its millions of messages, but use only a single server to handle requests from recipients who respond. • Consequently, the availability of scam infrastructure is critical to spam profitability. • A single takedown of a scam server or a spammer redirect can curtail the earning potential of an entire spam campaign.
Spamscatter • Each scam is identified in the link structure of the associated spam. • The authors built a system that mines email, identifies URLs in real time, and follows such links to their eventual destination server (including any redirection mechanisms put in place). • They identify individual scams by clustering scam servers whose rendered Web pages are graphically similar, using a technique called image shingling. • They actively probe the scam servers on an ongoing basis to characterize dynamic behaviors like availability and lifetime.
Example of an advertised scam (Downloadable Software) • A spam campaign launches a vast number of unsolicited spam messages to email addresses around the world. • A large spam campaign can exceed 1 billion emails. • The content of these messages frequently advertises a scam, i.e., unsolicited merchandise and services available through the Web, by embedding URLs to scam Web servers in the spam. • In the authors’ data, roughly 30% of spam contains such URLs. • The embedded URL may directly specify the scam server or use redirection to evade anti-spam mechanisms such as URL blacklisting.
On the back end, scams may use multiple servers to host scams, in terms of both multiple virtual hosts and multiple physical hosts. • For the scams in the authors’ spam feed, the use of multiple virtual hosts was infrequent (16% of scams) and the use of multiple physical hosts was rare (6%). • Also, different Web servers (physical or virtual), and even different accesses to a scam using the same URL, can result in slightly different downloaded content for the same scam.
Screenshots, hostnames, and IP addresses of different hosts for the “Downloadable Software” scam. The highlighted regions show portions of the page that change on each access due to product rotation. Image shingling is resilient to such changes and identifies these screenshots as equivalent pages.
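The idea behind image shingling can be sketched as follows: cut each screenshot into fixed-size chunks, hash every chunk, and treat two screenshots as equivalent when they share a large enough fraction of chunk hashes. The slides give no parameters, so the chunk size (CHUNK), the threshold (THRESHOLD), the use of MD5, and the plain 2D-list image representation are all assumptions for illustration.

```python
import hashlib

CHUNK = 40       # assumed chunk edge length in pixels
THRESHOLD = 0.7  # assumed fraction of shared chunks for two pages to match


def shingles(pixels, chunk=CHUNK):
    """pixels: 2D list of integer pixel values. Returns the set of chunk hashes."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    hashes = set()
    for top in range(0, height, chunk):
        for left in range(0, width, chunk):
            block = bytes(
                value & 0xFF
                for row in pixels[top:top + chunk]
                for value in row[left:left + chunk]
            )
            hashes.add(hashlib.md5(block).hexdigest())
    return hashes


def equivalent(img_a, img_b, chunk=CHUNK, threshold=THRESHOLD):
    """True when enough chunk hashes are shared between the two images."""
    a, b = shingles(img_a, chunk), shingles(img_b, chunk)
    if not a or not b:
        return False
    return len(a & b) / min(len(a), len(b)) >= threshold


# Tiny synthetic "screenshots": changing one region leaves 3 of 4 chunks intact.
base = [[y * 8 + x for x in range(8)] for y in range(8)]
rotated = [row[:] for row in base]
rotated[0][0] = 255  # simulates a rotating product image in one region
print(equivalent(base, rotated, chunk=4))  # True
```

Because only the chunks covering the changed region get new hashes, a page with a rotating product image still matches its earlier screenshot.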
Certain observations related to spam • A machine hosting one scam may be shared with other scams, as when scammers run multiple scams at once or the hosts are third-party infrastructure used by multiple scammers. • Sharing is common, with 38% of scams being hosted on a machine with at least one other scam. • Example: one of the machines hosting the software scam also hosted a pharmaceutical scam called “Toronto Pharmacy”. • Scam hosts have high availability during their lifetime (most above 99%) and appear to have good network connectivity. • Scam hosts tend to be geographically concentrated in the United States; over 57% of scam hosts in the authors’ data mapped to the U.S. • By contrast, only 14% of the spam relays used to send the spam to the authors’ feed are located in the U.S. (80% of spam is relayed by bots).
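Lifetime and availability can be derived from periodic probe results along these lines. The definitions used here (lifetime as the span between the first and last successful probe, availability as the share of successful probes within that span) are assumptions for illustration; the slides do not state how the authors define them.

```python
def lifetime_and_availability(probes):
    """probes: list of (timestamp_seconds, responded) tuples, sorted by time."""
    up_times = [t for t, ok in probes if ok]
    if not up_times:
        return 0, 0.0
    first, last = up_times[0], up_times[-1]
    # Only probes between the first and last successful response count.
    in_lifetime = [ok for t, ok in probes if first <= t <= last]
    return last - first, sum(in_lifetime) / len(in_lifetime)


# Hypothetical probe log: one probe per minute.
probes = [(0, False), (60, True), (120, True), (180, False), (240, True), (300, False)]
print(lifetime_and_availability(probes))  # (180, 0.75)
```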
Methodology used for Spamscatter • Data collection framework • Image shingling
Data collection framework • A data collection tool, called the spamscatter prober, takes spam emails as input, extracts the sender and URLs from the spam messages, and probes those hosts to collect various kinds of information. • As with spam senders, it first performs a ping, traceroute, and DNSBL lookup on scam hosts. • It downloads and stores the full HTML source of the Web page specified by valid URLs extracted from the spam (the authors do not attempt to de-obfuscate URLs). • It also renders an image of the downloaded page in a canonical browser configuration using the KHTML layout engine, and stores a screenshot of the browser window. • The prober accommodates a variety of link forwarding practices (managing redirection).
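The prober's input stage, taking a spam email and extracting the sender and URLs, might look like the sketch below, which uses Python's standard email package. The function name parse_spam, the URL pattern, and the sample message are assumptions; like the prober described above, the sketch makes no attempt to de-obfuscate URLs, and the network probing steps (ping, traceroute, DNSBL lookup, page download) are left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Assumed pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def parse_spam(raw):
    """Return (sender_address, sorted list of distinct URLs in the text parts)."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1]
    bodies = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            bodies.append(payload.decode("utf-8", errors="replace"))
    urls = sorted(set(URL_RE.findall("\n".join(bodies))))
    return sender, urls


raw = (
    "From: Seller <seller@example.com>\n"
    "Subject: VIAGRA Official Site\n"
    "\n"
    "Visit http://medicshopnerx.ru/ today\n"
)
print(parse_spam(raw))
```

The sender address and URLs returned here are the hosts that the later probing steps would contact.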