Dynamic Application-Layer Protocol Analysis

Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection • Holger Dreger, TU München • Anja Feldmann, T-Labs / TU Berlin • Michael Mai, TU München • Vern Paxson, ICSI / LBNL • Robin Sommer, ICSI • Presented by: Jim Spadaro

NIDS: State-of-the-Art • Protocol-specific traffic analysis • Semantic context for (much) better detection quality • How to decide which protocol to analyze? • Relies on well-known port numbers • (As in, HTTP if-and-only-if TCP port 80) • (or perhaps also 8080 and 8000 and …) • And if it’s not on a well-known port? • Perhaps use byte-level signatures to flag what protocol it appears to be

Problem • Applications use arbitrary ports! • Benign reasons • Lack of user privileges, obfuscation, multiple versions • Adversarial applications (maybe not so benign) • e.g. Skype bypassing firewalls • Malicious intent • Evasion of security monitoring • IRC-botnets on ports other than 666x/tcp • Pirate FTP-servers on ports other than 21/tcp • How to distinguish these?

Structure • Prevalence of the problem • Approach for dynamic analysis in NIDS • Applications of new capabilities • Performance evaluation

Prevalence of the Problem • Data • 24 hour full packet trace from MWN • 3.2 TB of data in 6.3 billion pkts, 137M TCP connections • Successful TCP connections: ~78% • Successful TCP connections on unprivileged ports: ~4% • UCB: University of California, Berkeley, 45,000 • MWN: Munich Scientific Network, 50,000 • LBNL: Lawrence Berkeley National Laboratory, 13,000

Existing NIDS Solutions • None known to fully address the problem • Bro, Snort, Dragon, and Intrushield all rely on port-based protocol analysis • Some can use signatures to detect inappropriate protocol use • Such detection is helpful, but has drawbacks • Does not distinguish benign off-port traffic from malicious: • Can only stop BitTorrent completely, not single out illegal file sharing • Can only turn off off-port IRC completely, not single out botnets

Protocol Detection - Alternatives • Statistical approach • E.g., packet size distribution • Suitable for separating interactive/bulk traffic • e.g., distinguish chat from file transfers • Detect protocol patterns • Signatures (already implemented) • Relatively easy to implement: most NIDS have signature-matching infrastructure • e.g., Linux netfilter l7-filter • Very general signatures, not completely accurate • Maybe: Protocol detection by plausibility heuristics
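
The signature approach listed above can be sketched in a few lines of Python. The patterns below are hypothetical and deliberately loose; they are not the signatures used by the authors or by l7-filter. Because both FTP and SMTP servers greet with a 220 reply, the sketch also shows how an overly general signature leads to more than one match for a single connection.

```python
import re

# Hypothetical, deliberately loose signatures (assumptions for illustration).
# Each one is matched against the first payload bytes, regardless of port.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n"),
    "ftp": re.compile(rb"^220[ -]"),   # server greeting
    "smtp": re.compile(rb"^220[ -]"),  # same greeting code as FTP
    "irc": re.compile(rb"^(NICK|USER|PASS) "),
}


def detect_protocols(payload: bytes) -> list:
    """Return every protocol whose signature matches the start of the payload."""
    return sorted(name for name, sig in SIGNATURES.items() if sig.match(payload))


if __name__ == "__main__":
    print(detect_protocols(b"GET /index.html HTTP/1.1\r\nHost: x\r\n"))  # ['http']
    print(detect_protocols(b"220 mail.example.org ready\r\n"))  # ['ftp', 'smtp']
```

The second call returns two candidates, which is the kind of ambiguity that motivates confirming a match with a full protocol analyzer.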

Protocol Detection: Signatures • Most (but not all) successful connections trigger expected signature • FTP: high percentage of false negatives ~ 21.7% • “Other port” matches: needs further investigation

Protocol Signatures: Well-known Ports • Some connections trigger more than one signature • Signature too general • Some inappropriate use of well-known ports

Observations • Imprecision of signatures: • False negatives highlight need for refined signatures and/or more context • False positives (e.g., multiple matches for single connection) highlight limits in discriminating power • Certain protocols are difficult to write signatures for • Telnet: many legitimate initial byte patterns • Problem is real: • Trusting port numbers alone leads to numerous misidentifications

Goals • Detection Scheme Independent • Currently predominantly use signatures • However, flexibility is maintained to allow other approaches, like heuristics • Dynamic Analysis • Some protocol detection schemes need more data than others • Analyzers should be disabled upon detecting a false positive • Modularity • Eases dealing with multiple network substacks • IP-within-IP tunnels • Efficiency • Improvements must retain performance • Customizability • Result must easily adapt to specific needs

Approach for Dynamic Analysis • Dynamic data path enhances flexibility and accuracy • Example: A packet is received on port 80/tcp, but really carries data for an IRC session • A traditional NIDS will still examine the packet as HTTP • Dynamic analysis can change the analysis to IRC even though the analysis was initialized for HTTP • Approach uses a PIA • Protocol Identification Analyzer

Dynamic Data Path • How can this be done? • Associate each connection with a tree structure • Each node represents an analyzer • Links represent data channels, with a parent node’s output channels connecting to its children’s input channels • The PIA instantiates the initial analyzers • Each analyzer can insert or remove other analyzers on its input and output channels • Thus, each analyzer can add further analyzers if it needs additional functionality • If the analyzer cannot determine which analyzer is needed, another PIA can be instantiated • An analyzer that cannot analyze the data it is being given can remove its subtree from the tree • Allows siblings on the tree to be run in parallel
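
A minimal sketch of such a per-connection analyzer tree, in Python rather than in Bro's own code. The class name `Analyzer` and its methods are hypothetical. As a simplification, each node forwards the unchanged input to its children, whereas in the design described above a parent's output channel feeds its children's input channels.

```python
class Analyzer:
    """One node of a per-connection analyzer tree (hypothetical sketch)."""

    def __init__(self, name, accepts=None):
        self.name = name
        # Predicate deciding whether this analyzer can handle the data.
        self.accepts = accepts or (lambda data: True)
        self.parent = None
        self.children = []
        self.seen = []

    def add_child(self, child):
        """Attach another analyzer to this node's output."""
        child.parent = self
        self.children.append(child)
        return child

    def remove_subtree(self):
        """Detach this analyzer, and everything below it, from the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def deliver(self, data):
        """Process data, or remove this subtree if the data cannot be parsed."""
        if not self.accepts(data):
            self.remove_subtree()
            return
        self.seen.append(data)
        for child in list(self.children):  # copy: children may detach
            child.deliver(data)

    def names(self):
        """Names of all analyzers still in this subtree, depth first."""
        return [self.name] + [n for c in self.children for n in c.names()]


if __name__ == "__main__":
    tcp = Analyzer("TCP")
    tcp.add_child(Analyzer("HTTP", lambda d: d.startswith(b"GET ")))
    tcp.add_child(Analyzer("IRC", lambda d: d.startswith(b"NICK ")))
    tcp.deliver(b"NICK bot42\r\n")  # IRC traffic, e.g. seen on port 80/tcp
    print(tcp.names())  # ['TCP', 'IRC']
```

After the first payload arrives, the HTTP node has removed itself and only the IRC analyzer remains below the TCP node.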

Analyzer Tree Example • Example for an analyzer tree for an email connection: • The IP Analyzer determines the connection is TCP • The TCP Analyzer determines the connection looks like email • Analyzers for SMTP, POP, and IMAP are instantiated to analyze the data • Any analyzers that determine that they cannot analyze the data can remove themselves

Technical Issues • Byte Streams vs Packet Streams • Protocols over TCP vs Other • Resolved by giving each analyzer both a byte-stream and a packet input channel • Starting an analyzer mid-connection • Resolved by buffering the start of each stream (Default 4KB)
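
The buffering idea can be illustrated with a small Python sketch. The class `StreamPrefixBuffer` and its method names are assumptions made for this example; only the 4 KB default comes from the slide. The buffer keeps the first bytes of a stream so that an analyzer activated later can still be fed the beginning of the connection.

```python
class StreamPrefixBuffer:
    """Keeps the first `limit` bytes of a stream (hypothetical sketch)."""

    def __init__(self, limit=4096):
        self.limit = limit
        self.data = bytearray()
        self.overflowed = False  # True once some bytes could not be kept

    def feed(self, chunk: bytes):
        """Store as much of the chunk as still fits into the buffer."""
        room = self.limit - len(self.data)
        if len(chunk) > room:
            self.overflowed = True
        self.data += chunk[:room]

    def replay_into(self, analyzer_input):
        """Hand the buffered stream start to a newly activated analyzer."""
        analyzer_input(bytes(self.data))


if __name__ == "__main__":
    buf = StreamPrefixBuffer(limit=8)  # tiny limit to make truncation visible
    buf.feed(b"NICK ")
    buf.feed(b"bot42\r\n")
    received = []
    buf.replay_into(received.append)
    print(received, buf.overflowed)  # [b'NICK bot'] True
```

An analyzer started after the buffer has overflowed sees only a truncated stream start, which is one reason the buffer size is a tunable trade-off.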

Implementation • Implemented in Bro NIDS • New “Protocol Identification Analyzer” (PIA) implements protocol-detection and buffering • Stock Bro has modular design suited to implementing the PIA • Required changing Bro’s notion of one-to-one static binding from transport analyzer to application analyzer(s) • Running in three large environments: • MWN, UCB, and LBNL

Implementation • PIA examines the first few KB of each connection for efficiency • Shown to be sufficient for protocol detection • Can activate analyzers in four ways: • Signatures • Connection port • Each analyzer can register a detection function • Allows arbitrary heuristics • Using a prediction table
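
The four activation mechanisms can be combined as in the following Python sketch. All table contents, the `Conn` type, and the function name are hypothetical and chosen only for illustration; the addresses and the Telnet heuristic (a payload starting with the IAC byte 0xFF) are assumptions, not the detection logic used in Bro.

```python
import re
from typing import NamedTuple


class Conn(NamedTuple):
    src: str
    dst: str
    dst_port: int
    prefix: bytes  # first payload bytes seen on the connection


# 1. Connection port, 2. signatures, 3. per-analyzer detection functions,
# 4. prediction table. All entries below are made-up examples.
PORTS = {80: "http", 21: "ftp"}
SIGNATURES = {"irc": re.compile(rb"^(NICK|USER) ")}
DETECTORS = {"telnet": lambda c: c.prefix.startswith(b"\xff")}
PREDICTED = {("10.0.0.5", "10.0.0.9", 50000): "ftp-data"}


def analyzers_to_activate(conn: Conn) -> set:
    """Collect every analyzer suggested by any of the four mechanisms."""
    chosen = set()
    if conn.dst_port in PORTS:
        chosen.add(PORTS[conn.dst_port])
    chosen |= {n for n, sig in SIGNATURES.items() if sig.match(conn.prefix)}
    chosen |= {n for n, detect in DETECTORS.items() if detect(conn)}
    key = (conn.src, conn.dst, conn.dst_port)
    if key in PREDICTED:
        chosen.add(PREDICTED[key])
    return chosen


if __name__ == "__main__":
    irc_on_80 = Conn("192.0.2.1", "192.0.2.2", 80, b"NICK bot42\r\n")
    print(sorted(analyzers_to_activate(irc_on_80)))  # ['http', 'irc']
```

For IRC traffic on port 80/tcp, both the port-based HTTP analyzer and the signature-based IRC analyzer are started; the HTTP analyzer is then expected to fail on the data and be removed.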

Deployment Trade-Offs • Protocol detection signatures • Loose signatures affordable • false positives fixed later • But too loose means slower • Analyzer is more expensive than pattern-matching • Improve accuracy with bidirectional signatures • Server must respond with the same protocol • Prevents attacker from intentionally triggering slow analyzers
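
A bidirectional signature can be sketched as a pair of patterns, one for each direction of the connection. The HTTP patterns and the function name `confirmed` below are hypothetical; the point is only that a match is accepted when the responder also speaks the expected protocol.

```python
import re

# Hypothetical pair of patterns: (originator side, responder side).
BIDIRECTIONAL = {
    "http": (
        re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]"),
        re.compile(rb"^HTTP/1\.[01] \d{3} "),
    ),
}


def confirmed(proto: str, orig_prefix: bytes, resp_prefix: bytes) -> bool:
    """True only if both directions match the protocol's signature pair."""
    orig_sig, resp_sig = BIDIRECTIONAL[proto]
    return bool(orig_sig.match(orig_prefix) and resp_sig.match(resp_prefix))


if __name__ == "__main__":
    request = b"GET / HTTP/1.1\r\n"
    print(confirmed("http", request, b"HTTP/1.1 200 OK\r\n"))  # True
    print(confirmed("http", request, b"SSH-2.0-OpenSSH\r\n"))  # False
```

In the second call a client sends an HTTP-looking request to a server that answers with a different protocol, so the more expensive HTTP analyzer would not be activated.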

Deployment Trade-Offs • At what point should an analyzer remove itself? • Real-world traffic is not perfect • Implementations can stretch protocol bounds • Should not parse the whole stream • Defeats the purpose of protocol analysis • Resolution: Analyzer should never disable itself • Generate Bro events on protocol violations • Allow user-level policy script to disable analyzer if necessary • E.g., after a certain number of violations
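
The split between analyzer and policy can be sketched as follows. This is Python, not a Bro policy script, and the class `ViolationPolicy`, its handler name, and the threshold are hypothetical. The analyzer only reports violations; the user-level policy decides when the analyzer is disabled for a connection.

```python
from collections import Counter


class ViolationPolicy:
    """User-level policy: disable an analyzer after too many violations."""

    def __init__(self, max_violations=3):
        self.max_violations = max_violations
        self.counts = Counter()
        self.disabled = set()

    def on_protocol_violation(self, conn_id, analyzer, reason):
        """Handle one violation event; return True if it disables the analyzer."""
        key = (conn_id, analyzer)
        if key in self.disabled:
            return False  # already disabled for this connection
        self.counts[key] += 1
        if self.counts[key] >= self.max_violations:
            self.disabled.add(key)
            return True
        return False


if __name__ == "__main__":
    policy = ViolationPolicy(max_violations=2)
    print(policy.on_protocol_violation("c1", "http", "bad request line"))  # False
    print(policy.on_protocol_violation("c1", "http", "bad header"))  # True
```

Keeping the threshold in the policy rather than in the analyzer lets a site tolerate sloppy but benign implementations without changing analyzer code.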

New Capabilities • In summary, can now: • Detect connections on non-standard ports reliably • Includes protocols that use others as transport • i.e., distinguish Kazaa, BitTorrent, SOAP, etc. over HTTP • Inspect payload of FTP transfers • Detect IRC-based bots • This has successfully worked in the field

Reliable Real-Time Protocol Detection on non-Standard Ports • 1 day at UC Berkeley (MWN similar) • Connections on non-standard ports mainly HTTP • UCB: Split between real HTTP (e.g., Apache) and Gnutella • MWN: Similar, but more P2P (BitTorrent), also some FTP • Open HTTP proxies detected and closed • Open SMTP relay detected and closed

Payload Inspection of FTP Data Transfers • FTP data transfers use arbitrary ports • Identify based on prior PORT, PASV • Dynamically added to prediction table • Check connection payload using libmagic • Actual file type == expected file type? • E.g., could find rootkit tarball sent in .jpg • Determined using file analyzer • Extension: • Use same mechanism for SMTP (mail attachments)
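
Both steps can be sketched in Python. The PORT command and the 227 reply to PASV carry six comma-separated numbers (four address bytes and two port bytes, with port = p1 * 256 + p2), as defined for FTP. The function names are hypothetical, and the small table of magic bytes is a simplified stand-in for libmagic, not a call into it.

```python
import re

_HOSTPORT = re.compile(r"(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})")


def expected_data_endpoint(line: str):
    """Extract (ip, port) from a 'PORT ...' command or a '227 ...' PASV reply."""
    if not (line.startswith("PORT ") or line.startswith("227 ")):
        return None
    m = _HOSTPORT.search(line)
    if m is None:
        return None
    nums = [int(g) for g in m.groups()]
    if any(n > 255 for n in nums):
        return None
    return ".".join(str(n) for n in nums[:4]), nums[4] * 256 + nums[5]


# Simplified stand-in for libmagic: leading bytes of two well-known formats.
MAGIC = {".jpg": b"\xff\xd8\xff", ".gz": b"\x1f\x8b"}


def type_matches_extension(filename: str, payload: bytes):
    """True/False if the extension is known, None if it cannot be judged."""
    for ext, magic in MAGIC.items():
        if filename.lower().endswith(ext):
            return payload.startswith(magic)
    return None


if __name__ == "__main__":
    print(expected_data_endpoint("PORT 192,168,1,2,195,80"))  # ('192.168.1.2', 50000)
    print(type_matches_extension("holiday.jpg", b"\x1f\x8b\x08\x00"))  # False
```

The endpoint returned by the first function is what would be added to the prediction table; the second call shows gzip data arriving under a .jpg name, the mismatch the slide describes.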

Detecting IRC Based Botnets • Idea • Botnet communication often uses IRC • Botnet detector on top of IRC analyzer • Check nicknames • Check channel names • Check contact to identified bot-servers • Key consideration: must analyze IRC dialog seen off-port • Because lots of benign IRC runs off-port too… • > 100 bots found at MWN+UCB • MWN employs auto-blocking based on detector • Not as adept at detecting custom protocols
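
The three checks can be sketched as a function that sits on top of whatever the IRC analyzer extracts from the dialog. The nickname pattern, channel name, and server address below are invented placeholders (the address is from a documentation range); the actual indicators used at MWN and UCB are not given in the slides.

```python
import re

# All indicators below are hypothetical placeholders.
NICK_PATTERN = re.compile(r"^[A-Z]{2,3}\|\d{4,}$")  # e.g. country code + number
SUSPICIOUS_CHANNELS = {"#botnet-example"}
KNOWN_BOT_SERVERS = {"192.0.2.10"}


def bot_indicators(nick: str, channel: str, server_ip: str) -> list:
    """Return which of the three checks fire for one IRC session."""
    hits = []
    if NICK_PATTERN.match(nick):
        hits.append("nickname")
    if channel.lower() in SUSPICIOUS_CHANNELS:
        hits.append("channel")
    if server_ip in KNOWN_BOT_SERVERS:
        hits.append("server")
    return hits


if __name__ == "__main__":
    print(bot_indicators("DE|48213", "#chat", "192.0.2.10"))  # ['nickname', 'server']
    print(bot_indicators("alice", "#python", "198.51.100.7"))  # []
```

Because the inputs come from the IRC analyzer rather than from port numbers, the same checks apply to IRC dialogs seen on any port.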

Performance Evaluation

Performance • New framework does not add significant additional overhead • Performance cost is about 13.8% between PIA-Bro-M4K and Stock-Bro • Protocol detection (signature matching on all packets) expensive but doable • Solutions: • Specialized hardware • Load balancing possible

Summary • Network traffic resists classification by port • General framework for dynamic protocol analysis • Use signatures to pre-filter for efficiency • Use application parsing to make high-quality decisions • Accurate enough for auto-blocking of bots at large-scale network • Plus detection of illicit relays and servers • Integrated into Development Release 1.2 of Bro

Questions?
