Searching the Searchers with SearchAudit

John P., Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy
University of California, Santa Cruz
USENIX Security Symposium, August 2010
A Presentation at Advanced Defense Lab

Outline
• Introduction
• Related Work
• Architecture
• Implementation – Stage 1
• Implementation – Stage 2
• Attack 1: Identifying Vulnerable Web Sites
• Attack 2: Forum Spamming
• Attack 3: Windows Live Messenger Phishing
• Conclusion

Introduction
• SearchAudit is a framework that identifies malicious queries in massive search engine logs and uncovers their relationship with potential attacks.
• It uses a small set of known malicious queries as seeds and generates regular expressions for detecting new malicious queries.

Introduction
• Two stages:
  • Identification: SearchAudit identifies malicious queries.
  • Investigation: SearchAudit analyzes those queries and the attacks of which they are part.

Introduction
• Enhanced detection capability
  • 400 becomes 4 million.
• Low false-positive rate
  • 2%
• Ability to detect new attacks
  • Forum spamming
• Facilitation of attack analysis
  • A series of phishing attacks that lasted more than one year is analyzed.

Related Work
• There is a significant amount of automated Web traffic on the Internet.
• Another study showed that more than 3% of all search traffic may be generated by stealthy search bots.
• What motivates these search bots?
  • Search engine competitors
  • Studying search quality
  • Click fraud for monetary gain
  • Spreading infection (MyDoom, Santy)
  • Identifying victims

Related Work
• Using regular expression patterns
  • Honeycomb
  • Polygraph
  • Hamsa
  • AutoRE (a regular-expression generation method from prior work)

Architecture
• Let attackers be our guides
  • Follow their activities and predict their future attacks.

Architecture
• Platform
  • Dryad/DryadLINQ
• Query Expansion
  • Take a small set of seed queries and expand them.
  • Extract the IPs that issued the seed queries and search the logs again for their queries.
• Regular Expression Generation
  • Signature generation (AutoRE)
• Eliminating Redundancies
• Eliminating Proxies
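
The query-expansion step can be illustrated with a minimal Python sketch. It is not the paper's Dryad/DryadLINQ implementation. It assumes the log fits in memory as (ip, query) pairs, and the function name `expand_queries` is hypothetical.

```python
from typing import Iterable, Set, Tuple


def expand_queries(
    log: Iterable[Tuple[str, str]], seeds: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """One round of query expansion over (ip, query) log records.

    Returns the IPs that issued at least one seed query, together with
    every query those IPs issued (seeds included).
    """
    records = list(log)  # materialize so the log can be scanned twice
    # Pass 1: IPs that issued a seed query.
    ips = {ip for ip, query in records if query in seeds}
    # Pass 2: all queries from those IPs, i.e. the expanded query set.
    queries = {query for ip, query in records if ip in ips}
    return ips, queries
```

For example, with the log `[("1.2.3.4", "seed dork"), ("1.2.3.4", "new dork"), ("5.6.7.8", "weather")]` and the seed set `{"seed dork"}`, the sketch returns the single IP `1.2.3.4` and the expanded set `{"seed dork", "new dork"}`.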

Architecture – Eliminating Redundancies
• Algorithm
  • REGEX_CONSOLIDATE
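
The slide does not spell out the steps of REGEX_CONSOLIDATE. The sketch below substitutes a greedy set-cover heuristic: a regular expression is kept only if it matches some query that no already-kept expression matches. Treat the function `consolidate_regexes` and this rule as hypothetical illustrations of redundancy elimination, not as the paper's algorithm. The sketch also assumes a regex must match the whole query.

```python
import re
from typing import Dict, List, Set


def consolidate_regexes(patterns: List[str], queries: List[str]) -> List[str]:
    """Drop regexes whose matches are already covered by kept regexes.

    Hypothetical greedy heuristic (not the paper's REGEX_CONSOLIDATE):
    visit patterns from largest to smallest coverage and keep a pattern
    only if it matches at least one query that is not yet covered.
    """
    # Queries each pattern matches (whole-query match assumed).
    coverage: Dict[str, Set[str]] = {
        p: {q for q in queries if re.fullmatch(p, q)} for p in patterns
    }
    kept: List[str] = []
    covered: Set[str] = set()
    # sorted() is stable even with reverse=True, so ties keep input order.
    for p in sorted(patterns, key=lambda pat: len(coverage[pat]), reverse=True):
        new_matches = coverage[p] - covered
        if new_matches:
            kept.append(p)
            covered |= new_matches
    return kept
```

With the patterns `dork\d`, `dork[0-9]+`, and `other` over the queries `dork1`, `dork22`, and `other`, the sketch keeps `dork[0-9]+` and `other` and drops `dork\d`, because everything it matches is already covered.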

Architecture – Eliminating Proxies
• Most users in a geographic region have similar query patterns.
• A proxy serves mostly legitimate users, so its queries overlap heavily with the popular queries from the same /16 IP prefix.
• We label an IP as a proxy if its K most popular queries and the K most popular queries from its /16 prefix share at least m queries.
  • K = 100, m = 5
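
A minimal sketch of the proxy test follows, with K and m defaulting to the slide's values. Several assumptions are not stated in the source. Addresses are IPv4. The log is a list of (ip, query) pairs. An IP's own queries are excluded when ranking its prefix's popular queries; otherwise an IP that dominates its /16 would trivially overlap with itself. The names `prefix16` and `find_proxies` are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Dict, List, Set, Tuple


def prefix16(ip: str) -> str:
    """Map an IPv4 address to its /16 prefix, e.g. '10.1.2.3' -> '10.1.0.0/16'."""
    return str(ipaddress.ip_network(f"{ip}/16", strict=False))


def find_proxies(log: List[Tuple[str, str]], k: int = 100, m: int = 5) -> Set[str]:
    """Label an IP as a proxy if its top-k queries share at least m queries
    with the top-k queries of the rest of its /16 prefix."""
    by_ip: Dict[str, Counter] = {}
    by_prefix: Dict[str, Counter] = {}
    for ip, query in log:
        by_ip.setdefault(ip, Counter())[query] += 1
        by_prefix.setdefault(prefix16(ip), Counter())[query] += 1

    proxies: Set[str] = set()
    for ip, counts in by_ip.items():
        # Assumption: ignore this IP's own queries when ranking the prefix.
        others = by_prefix[prefix16(ip)] - counts
        ip_top = {q for q, _ in counts.most_common(k)}
        prefix_top = {q for q, _ in others.most_common(k)}
        if len(ip_top & prefix_top) >= m:
            proxies.add(ip)
    return proxies
```

Counter subtraction keeps only positive counts, so queries seen only at the IP under test disappear from the prefix ranking.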

Data Description and System Setup
• We use 3 months of search logs from the Bing search engine:
  • February 2009 (when it was known as Live Search)
  • December 2009
  • January 2010
• Each month of sampled data contains around 2 billion pageviews.
• The 500 seed malicious queries are obtained from the hacker Web site milw0rm.com.
• Processing the 1.2 TB of sampled data takes about 7 hours.
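
As a rough sanity check on the setup figures, processing 1.2 TB in about 7 hours corresponds to the average throughput below. This assumes decimal units (1 TB = 10^12 bytes) and that the 7 hours cover the whole 1.2 TB.

```latex
\frac{1.2\times10^{12}\ \text{bytes}}{7\times3600\ \text{s}}
  = \frac{1.2\times10^{12}}{25\,200}\ \text{bytes/s}
  \approx 4.8\times10^{7}\ \text{bytes/s}
  \approx 48\ \text{MB/s}
```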

Selection of Regular Expressions
• Cookies are used to identify the malicious queries.
• Benign proxies are eliminated.
• A threshold on the regular expressions' scores decides which ones are kept.
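
The slide does not say how a regular expression's score is computed. One plausible, hypothetical score follows from the later observation that most bot queries carry an empty cookie field: the fraction of a regex's matching queries that were sent without a cookie. The sketch keeps a regex when that fraction reaches a threshold. The 0.8 default, the `Record` type, and both function names are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class Record(NamedTuple):
    query: str
    cookie: str  # empty string when the request carried no cookie


def cookie_score(pattern: str, records: List[Record]) -> float:
    """Fraction of queries fully matching `pattern` that were sent without a cookie."""
    matched = [r for r in records if re.fullmatch(pattern, r.query)]
    if not matched:
        return 0.0
    return sum(1 for r in matched if not r.cookie) / len(matched)


def select_regexes(
    patterns: List[str], records: List[Record], threshold: float = 0.8
) -> List[str]:
    """Keep the patterns whose (hypothetical) cookie score reaches the threshold."""
    return [p for p in patterns if cookie_score(p, records) >= threshold]
```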

Detection Results: Effect of Query Expansion and Regular Expression Matching
• Feeding the 500 malicious seed queries into SearchAudit, we find that 122 of them appear in the February 2009 dataset.
  • 174 IPs issued these queries.
• Feeding the result back into the system yields 800 unique queries from 264 IPs.

Detection Results

Effect of Incomplete Seeds
• Split the 122 seed queries into two sets:
  • 100 queries that were first posted on milw0rm.com before 2009
  • 22 queries that were posted in 2009

Looping Back Seed Queries
• The derived regular expressions are fed back into SearchAudit as new seeds.
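
Looping back can be viewed as rerunning SearchAudit until it stops producing anything new. In the sketch below, `derive` is a hypothetical placeholder for one pass of the pipeline (expansion plus regex generation). The `max_rounds` cap is an added safeguard that the slide does not mention.

```python
from typing import Callable, Set


def loop_back(
    seeds: Set[str],
    derive: Callable[[Set[str]], Set[str]],
    max_rounds: int = 5,
) -> Set[str]:
    """Repeatedly feed derived items back in as seeds.

    Iteration stops when a pass adds nothing new or after max_rounds.
    """
    current = set(seeds)
    for _ in range(max_rounds):
        new_items = derive(current) - current
        if not new_items:
            break  # fixed point reached
        current |= new_items
    return current
```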

Overall Matching Statistics

Verification of Malicious Queries
• Because we lack ground truth about whether a query is malicious, we verify queries in two ways:
  • Check whether the query is reported on any hacker Web sites.
    • For each query q returned by SearchAudit, issue the query "q AND (dork OR vulnerability)" to the search engine and save the results.
  • Check whether the query's behavior matches individual-bot or botnet features.
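
The hacker-site check can be sketched as follows. The slide only says that the results are saved. Deciding that a query is "reported" when some result URL is hosted on a known hacker site (for example milw0rm.com, the seed source) is an assumption, as are the function names.

```python
from typing import Iterable, Set
from urllib.parse import urlparse


def verification_query(q: str) -> str:
    """Build the verification query from the slide: q AND (dork OR vulnerability)."""
    return f"{q} AND (dork OR vulnerability)"


def reported_on_hacker_site(result_urls: Iterable[str], hacker_domains: Set[str]) -> bool:
    """True if any saved result is hosted on (a subdomain of) a listed hacker site."""
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in hacker_domains):
            return True
    return False
```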

Verification of Queries Generated by Individual Bots
• Two features help us distinguish bot queries from human queries:
  • Cookie
    • Most bot queries do not enable cookies, resulting in an empty cookie field.
    • For normal users who do not clear their cookies, all queries carry the old cookies.
  • Link clicked
    • Many bots do not click any link on the result page; instead, they scrape the results off the page.
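
Combining the two features gives a simple per-IP heuristic. The slide presents them as signals rather than a fixed rule. The requirement that both hold, the 0.9 threshold, and the `QueryEvent` fields are therefore hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryEvent:
    query: str
    cookie: str         # empty string when no cookie was sent
    clicked_links: int  # number of result links clicked for this query


def looks_like_bot(events: List[QueryEvent], threshold: float = 0.9) -> bool:
    """Flag an IP's queries as bot-like when most carry no cookie
    and most lead to no clicked result link."""
    if not events:
        return False
    n = len(events)
    no_cookie = sum(1 for e in events if not e.cookie) / n
    no_click = sum(1 for e in events if e.clicked_links == 0) / n
    return no_cookie >= threshold and no_click >= threshold
```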

Verification of Queries Generated by Individual Bots

Verification of Queries Generated by Botnets
• If most of the IPs that issued malicious queries exhibit similar behavior, it is likely that all these IPs were running the same script.
  • User agent: information about the browser and its version
  • Metadata: certain metadata that comes with the request
  • Pages per query: the number of search result pages retrieved per query
  • Inter-query interval: the time between queries issued by the same IP
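
One way to quantify "similar behavior" is to ask, for each feature, what fraction of the IPs share that feature's most common value. The sketch assumes each feature has already been summarized into one discrete value per IP (for instance, the inter-query interval rounded to whole seconds). The threshold and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def dominant_share(values: List[Hashable]) -> float:
    """Fraction of values equal to the most common value (0.0 if empty)."""
    if not values:
        return 0.0
    _, count = Counter(values).most_common(1)[0]
    return count / len(values)


def same_script_likely(
    per_ip: Dict[str, Dict[str, Hashable]],
    features: List[str],
    threshold: float = 0.8,
) -> bool:
    """True if, for every feature, at least `threshold` of the IPs share one value."""
    return all(
        dominant_share([f[name] for f in per_ip.values()]) >= threshold
        for name in features
    )
```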

Verification of Queries Generated by Botnets

Analysis of Detection Results
• Large countries such as the USA, Russia, and China are responsible for almost half of the IPs issuing malicious queries.
• Vulnerable Web sites
  • Attackers try to exploit these Web sites by SQL injection:
    • index.php?content=[^?=#+;&:]{1,10}
  • Attackers try to find particular software with known vulnerabilities:
    • "Powered by"
• Forum spamming
  • "/includes/joomla.php" site:.[a-zA-Z]{2,3}
• Windows Live Messenger phishing
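
The example patterns above can be tried with Python's `re` module. Two assumptions apply. First, a query must match a pattern in full. Second, the literal `.` and `?` in the slide's notation are escaped here, because unescaped they are regex metacharacters in Python: `php?` would make the final `p` optional rather than match a literal question mark.

```python
import re

# Slide patterns rewritten for Python's re syntax; the literal '.' and '?'
# are escaped (an assumption about the notation used on the slide).
SQLI_PATTERN = re.compile(r"index\.php\?content=[^?=#+;&:]{1,10}")
FORUM_SPAM_PATTERN = re.compile(r'"/includes/joomla\.php" site:\.[a-zA-Z]{2,3}')


def matches(pattern: re.Pattern, query: str) -> bool:
    """Return True if the entire query matches the pattern."""
    return pattern.fullmatch(query) is not None


for q in ["index.php?content=about", "index.php?content=", '"/includes/joomla.php" site:.de']:
    print(repr(q), matches(SQLI_PATTERN, q), matches(FORUM_SPAM_PATTERN, q))
```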

Analysis of Detection Results

Identifying Vulnerable Web Sites
• Applications of vulnerability searches
  • Sample 5,000 queries returned by SearchAudit.
  • For every query q, issue the query "q -dork -vulnerability".
  • This yields 80,490 URLs from 39,475 unique Web sites.
  • Compare these Web sites against lists of known phishing or malware sites:
    • PhishTank
    • Microsoft
  • Testing shows that many of these sites do have SQL injection vulnerabilities.
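
The site-collection step can be sketched as below. Two assumptions apply: a "Web site" is taken to be a URL's hostname, and the blacklist is a plain set of hostnames standing in for the PhishTank and Microsoft lists. The function names are hypothetical.

```python
from typing import Iterable, Set
from urllib.parse import urlparse


def exclusion_query(q: str) -> str:
    """Build the slide's follow-up query: q -dork -vulnerability."""
    return f"{q} -dork -vulnerability"


def unique_sites(urls: Iterable[str]) -> Set[str]:
    """Collapse result URLs to their (lower-cased) hostnames."""
    hosts = (urlparse(u).hostname for u in urls)
    return {h for h in hosts if h}


def flagged_sites(urls: Iterable[str], blacklist: Set[str]) -> Set[str]:
    """Sites from the search results that also appear on a known-bad list."""
    return unique_sites(urls) & blacklist
```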

Identifying Vulnerable Web Sites

SQL Injection Vulnerabilities
• For the malicious queries, we look at the search results and crawl every link twice:
  • The first time, we crawl the link as is.
  • The second time, we add a single quote (') to the URL.
• If the two pages are identical, there is no obvious SQL injection vulnerability.
• If the second page contains any kind of SQL error, an SQL injection vulnerability might exist.
• Out of 14,500 URLs, we find 1,760 (12%) that may have an SQL injection vulnerability.
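
The two-crawl test reduces to comparing two responses. The sketch leaves out page fetching; the page bodies are passed in instead. It appends the quote to the end of the URL. The SQL error strings are a few well-known messages from MySQL, PHP, SQL Server, and Oracle, chosen for illustration; they are not the list the authors used.

```python
# Illustrative error fragments (lower-cased); not the authors' list.
SQL_ERROR_MARKERS = (
    "you have an error in your sql syntax",                # MySQL
    "warning: mysql_",                                     # PHP mysql_* functions
    "unclosed quotation mark after the character string",  # SQL Server
    "quoted string not properly terminated",               # Oracle
)


def quoted_url(url: str) -> str:
    """URL for the second crawl: the original with a single quote appended."""
    return url + "'"


def classify(original_page: str, quoted_page: str) -> str:
    """Compare the two crawls of one link, following the rules on the slide."""
    if original_page == quoted_page:
        return "no obvious vulnerability"
    lowered = quoted_page.lower()
    if any(marker in lowered for marker in SQL_ERROR_MARKERS):
        return "possible SQL injection"
    # Pages differ but show no recognizable SQL error.
    return "inconclusive"
```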

Forum-Spamming Attacks
• We manually identified 46 regular expressions associated with forum spamming.


Forum-Spamming Attacks

Applications of Forum-Searching Queries
• Using Project Honey Pot to identify Web spamming

Windows Live MSN Phishing
• What is MSN phishing?
• Example regular expressions:
  • http://[a-zA-Z0-9._]*.<domain-name>/
  • http://<domain-name>?user=[a-zA-Z0-9._]*
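
The two phishing patterns can be instantiated for a concrete domain. Here `example.com` stands in for the slide's `<domain-name>` placeholder. The dot before the domain and the `?` before `user=` are treated as literal characters, and the helper names are hypothetical. The sketch returns the variable part of the URL, which the second pattern labels `user`.

```python
import re
from typing import Optional, Tuple


def phishing_patterns(domain: str) -> Tuple[re.Pattern, re.Pattern]:
    """Instantiate the slide's two URL patterns for a concrete domain."""
    d = re.escape(domain)
    return (
        re.compile(rf"http://([a-zA-Z0-9._]*)\.{d}/"),
        re.compile(rf"http://{d}\?user=([a-zA-Z0-9._]*)"),
    )


def extract_user(url: str, domain: str) -> Optional[str]:
    """Return the variable part of a matching URL, or None if neither pattern matches."""
    for pattern in phishing_patterns(domain):
        m = pattern.fullmatch(url)
        if m:
            return m.group(1)
    return None
```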

Windows Live MSN Phishing

Characteristics of Compromised Accounts

Conclusion
