Automating Analysis of Large-Scale Botnet Probing Events

Presenter: Jun-Yi Zheng, 2010/07/05
Authors: Zhichun Li, Anup Goyal, Yan Chen, and Vern Paxson*
Lab for Internet and Security Technology (LIST), Northwestern University
* UC Berkeley / ICSI
ASIACCS ’09 (March 2009)

Motivation
[Figure: botnets probing the IPv4 space, including an enterprise network and its administrators]
Administrators ask: does this attack specifically target us?
Can we answer this question with only the limited information observed locally in the enterprise?

Motivation
• Can we infer the probing strategy used by botnets?
• Can we infer whether a botnet probing attack specifically targets a certain network, or whether we are just part of a larger, indiscriminate attack?
• Can we extrapolate global properties of the botnet given limited local information?

Agenda
• Motivation
• Basic framework
• Discover the botnet probing strategies
• Extrapolate global properties
• Evaluation
• Conclusions

Botnet Probing Events
Large spikes in the number of probers are mainly caused by botnets.

System Framework (figure)

Discover the Botnet Probing Strategies
Use statistical tests to understand probing strategies.
Leverage existing statistical tests:
• Monotonic trend checking: detect whether bots probe the IP space monotonically
• Uniformity checking: detect whether bots scan the IP range uniformly
Design our own tests:
• Hitlist (liveness) checking: detect whether bots avoid the dark IP space
• Dependency checking: detect whether bots scan independently or are coordinated
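The slides do not name the existing tests that were reused. As an illustration only, monotonic trend checking could be sketched with the Mann-Kendall trend test, applied to the destination addresses (as integers) in arrival order. The choice of test, the function names, and the 2.576 threshold (roughly a 1% two-sided level) are all assumptions, not details from the talk.

```python
import math

def mann_kendall_z(values):
    """Mann-Kendall Z statistic for a sequence (no tie correction)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    s = 0
    # S counts later-minus-earlier sign agreements over all pairs.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

def looks_monotonic(values, z_threshold=2.576):
    """Hypothetical helper: flag a significant upward or downward trend."""
    return abs(mann_kendall_z(values)) > z_threshold

# Destination addresses (as integers) in the order a bot probed them.
sequential = list(range(50))   # strictly increasing sweep
alternating = [0, 1] * 25      # no overall trend
print(looks_monotonic(sequential), looks_monotonic(alternating))
```

The pairwise loop is quadratic in the number of probes, which is acceptable for a sketch but would need sampling or a faster formulation on large events.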

Design Space (figure)

Hitlist Checking
• Configure the sensor to be half darknet and half honeynet
• Use the metric θ = (# sources seen in the darknet) / (# sources seen in the honeynet)
• Threshold: 0.5
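A minimal sketch of the θ metric follows. It assumes that sources are counted as distinct IP addresses and that a θ below the threshold is read as "the bots avoid dark space"; the slide gives the metric and the 0.5 threshold but not these details, and the function names are hypothetical.

```python
def hitlist_theta(darknet_sources, honeynet_sources):
    """theta = distinct sources seen in darknet / distinct sources seen in honeynet."""
    dark = len(set(darknet_sources))
    live = len(set(honeynet_sources))
    if live == 0:
        raise ValueError("no sources observed in the honeynet half")
    return dark / live

def uses_hitlist(darknet_sources, honeynet_sources, threshold=0.5):
    """Assumption: theta below the threshold means dark space is being avoided."""
    return hitlist_theta(darknet_sources, honeynet_sources) < threshold

# Toy example: five bots hit the live half, only two of them also hit the dark half.
live = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
dark = ["10.0.0.1", "10.0.0.2"]
print(hitlist_theta(dark, live), uses_hitlist(dark, live))
```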

Agenda
• Motivation
• Basic framework
• Discover the botnet probing strategies
• Extrapolate global properties: global scan scope, total # of bots, total # of scans, total scan rate for each bot
• Evaluation
• Conclusions

Extrapolate Global Properties: Basic Ideas and Validation
Observe the packet fields that change in a predictable pattern across consecutive probes:
• IPID: a field in the IP header used for fragment reassembly
• Ephemeral port number: the source port used by bots
• Both increment by a fixed amount per scan
Validation:
• IPID continuity: all versions of Windows and MacOS
• Ephemeral port number continuity: botnet source code study (Agobot, Phatbot, Spybot, SDbot, rxBot, etc.)
• Control experiments with NAT

Estimate Global Scan Rate of Each Bot
• Count the changes in the IPID and the ephemeral port number
• Recover from overflow (wrap-around) of the IPID and the ephemeral port number
• Estimate the rate with linear regression, accepting the fit when the correlation coefficient > 0.99
• Counter overestimation: use the lesser of the two estimates
[Figure: IPID plotted against time T]
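The steps above can be sketched as follows. This assumes a 16-bit counter (true for the IPID; the wrap range of ephemeral ports depends on the operating system, so `modulus` would need adjusting there), at most one wrap between consecutive observations, and an ordinary least-squares fit. All function names are hypothetical.

```python
import math

def unwrap_counter(samples, modulus=65536):
    """Undo wrap-around, assuming at most one wrap between consecutive samples."""
    unwrapped = [samples[0]]
    offset = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:  # the counter overflowed and restarted
            offset += modulus
        unwrapped.append(cur + offset)
    return unwrapped

def fit_line(times, counts):
    """Least-squares slope and Pearson correlation of counts against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((c - mean_c) ** 2 for c in counts)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    if sxx == 0 or syy == 0:
        raise ValueError("times and counts must both vary")
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

def estimate_rate(times, raw_counter, modulus=65536, min_r=0.99):
    """Counter increments per time unit, or None if the fit is not linear enough."""
    slope, r = fit_line(times, unwrap_counter(raw_counter, modulus))
    return slope if r > min_r else None

def conservative_rate(*estimates):
    """Take the smaller usable estimate to counter overestimation."""
    usable = [e for e in estimates if e is not None]
    return min(usable) if usable else None

times = [0, 10, 20, 30, 40]
ipids = [65000, 65300, 64, 364, 664]  # wraps once between the 2nd and 3rd probe
print(estimate_rate(times, ipids))
```

The slope is in counter increments per time unit; dividing by the fixed increment per scan would convert it to scans per time unit.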

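Extrapolate Global Scan Scope
[Figure: botnets scanning the IPv4 space; bot i is observed locally n_i = 100 times]
• Total scans from bot i: scan rate R_i * scan time T_i = 100 * 1000 = 100,000
• Local/global ratio: n_i / (R_i * T_i) = 100 / 100,000 = 0.001
• Aggregate over multiple bots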
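The arithmetic on this slide can be sketched as below. Two things are assumptions rather than statements from the slide: that the global scope is the local sensor size divided by the local/global ratio, and that aggregation across bots is done by summing the locally observed scans and the estimated total scans. The sensor size of 2,560 addresses corresponds to ten /24 networks.

```python
def global_scope(local_size, observed, rates, durations):
    """Extrapolated number of addresses scanned globally.

    local_size: number of addresses monitored locally
    observed:   scans seen locally from each bot (n_i)
    rates:      estimated global scan rate of each bot (R_i)
    durations:  scan time of each bot (T_i)
    """
    total_observed = sum(observed)
    total_sent = sum(r * t for r, t in zip(rates, durations))
    if total_observed == 0:
        raise ValueError("no local observations to extrapolate from")
    # local/global ratio = total_observed / total_sent; scope = local_size / ratio
    return local_size * total_sent / total_observed

# Slide example: n_i = 100, R_i = 100, T_i = 1000, on a sensor of ten /24s.
print(global_scope(10 * 256, [100], [100], [1000]))
```

For comparison, a /8 holds 16,777,216 addresses, so a scope of this size would be well below /8.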

Extrapolate Global # of Bots
Idea: similar to mark and recapture.
Assumption: all bots have the same global scan range.
Example:
• First half: m1 = 1000
• Second half: m2 = 1000
• Observed by both: m12 = 250
• Estimated total: M = m1 * m2 / m12 = 4000
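The estimator M = m1 * m2 / m12 can be sketched directly on two sets of observed bot addresses. The function name and the use of integer identifiers in place of IP addresses are illustrative assumptions.

```python
def estimate_total_bots(first_half, second_half):
    """Mark-and-recapture estimate M = m1 * m2 / m12."""
    first = set(first_half)
    second = set(second_half)
    both = len(first & second)
    if both == 0:
        raise ValueError("no overlap between the two halves; estimate undefined")
    return len(first) * len(second) / both

# Slide example: m1 = 1000, m2 = 1000, m12 = 250.
first = range(0, 1000)
second = range(750, 1750)  # shares 250 identifiers with the first half
print(estimate_total_bots(first, second))
```

With no overlap the estimate is undefined, so the sketch raises an error rather than returning infinity.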

Dataset
• Based on a honeynet of ten /24 networks at a national lab (LBNL)
• 293 GB of packet traces over 24 months (2006-07)
• 203 botnet probing events observed in total
• Average number of observed bots per event: 980
• Mainly targeting SMB/WINRPC, VNC, Symantec, MSSQL, HTTP, Telnet
• Size of the system: 13,900 lines: Bro (6,000), Python (4,000), C++ (2,500), R (1,400)

Property Checking Results
• More than 80% of events use uniform scanning
• The results were validated through visualization and found to be highly accurate

Extrapolation Results
Most of the extrapolated global scopes are at /8 size, which means the botnets do not specifically target the enterprise (LBNL).
Validation with DShield data:
• DShield: the largest Internet alert repository
• Find the /8 prefixes in DShield with sufficient source (bot) overlap with the honeynet events
• Due to the incompleteness of the DShield data, 12 events were validated
• Calculate the scan scope in each /8 based on the sensor coverage ratio

Extrapolation Validation
• Define the scope factor as max(DShield/Honeynet, Honeynet/DShield)
• 75% are within 1.35
• All are within 1.5
[Figure: CDF of the scope factor]
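A small sketch of the scope factor and of the "fraction within a bound" summary read off the CDF. The input values below are made up for illustration and are not the validation data from the talk; the function names are hypothetical.

```python
def scope_factor(dshield_scope, honeynet_scope):
    """max(DShield/Honeynet, Honeynet/DShield); 1.0 means perfect agreement."""
    if dshield_scope <= 0 or honeynet_scope <= 0:
        raise ValueError("scopes must be positive")
    return max(dshield_scope / honeynet_scope, honeynet_scope / dshield_scope)

def fraction_within(factors, bound):
    """Share of events whose scope factor does not exceed the bound."""
    return sum(1 for f in factors if f <= bound) / len(factors)

# Hypothetical (DShield, honeynet) scope pairs, for illustration only.
pairs = [(100, 110), (100, 80), (120, 100), (100, 145)]
factors = [scope_factor(d, h) for d, h in pairs]
print(fraction_within(factors, 1.35), fraction_within(factors, 1.5))
```

Taking the maximum of the two ratios makes the factor symmetric, so over- and underestimates of the same relative size score the same.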

Conclusions
• Developed a set of statistical approaches to assess four properties of botnet probing strategies
• Designed approaches to extrapolate the global properties of a scan event from a limited local view
• Through real-world validation based on DShield, we show that our scheme is promisingly accurate
