Explore GT architecture for precise internet traffic analysis and accurate DPI with application association. Study on completeness and accuracy metrics in experimental analysis. Enhance accuracy of DPI by up to 85%.
GT: Picking up the Truth from the Ground for Internet Traffic Francesco Gringoli, Luca Salgarelli, Niccolo' Cascarano, Fulvio Risso, and Kimberly C. Claffy SIGCOMM Comput. Commun. Rev. 2009 Networking Journal Club 28th May 2010
Outline • Introduction and Related Work • The GT Architecture • Testbed Setup and Experimental Analysis • Design choices • Conclusions
Introduction and Related Work • Motivation: • Traffic modeling, intrusion detection… need traces where application (and protocol) is associated with each packet or flow. • Current Approaches: • Manual generation • Problems: bias (lack of human behavior), background applications • DPI • Problems: encrypted traffic, ambiguity (different protocols, similar signatures), port-based is obsolete
Introduction and Related Work • Current Approaches: • Application stamping • Problems: real time, packet size close to the MTU • BLINC • Problems: accuracy 30% • Proposed solution: • GT: • By monitoring host’s kernel • Associates each packet (flow) with the name of its controlling application
GT Architecture • Four parts: • Client daemon • Packet capture engine • Database server • IPClass
GT Architecture: client daemon • Functionality: • Tracks changes in active network sockets, then collects and transmits to the database server the relevant information about the application that owns each socket. • Runs in user space (mirrors the active socket list handled by the kernel); a thread loop periodically synchronizes with the kernel tables (configurable frequency) • Currently compiles and runs on many platforms (Windows Vista/XP/2003, Linux 2.4 and 2.6, Mac OS X 10.4 and 10.5, FreeBSD 5 and 6)
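The synchronization step of the daemon can be pictured as a diff between two successive snapshots of the socket table. The sketch below is only an illustration of that idea, not GT's actual code: the names `SocketEvent` and `diff_snapshots` are hypothetical, and each snapshot is assumed to be a plain dictionary mapping a 5-tuple to the owning application's name.

```python
from typing import Dict, List, NamedTuple, Tuple

# Assumed 5-tuple layout: (protocol, src_ip, src_port, dst_ip, dst_port)
FiveTuple = Tuple[str, str, int, str, int]


class SocketEvent(NamedTuple):
    log_time: float
    flow: FiveTuple
    app_name: str
    event_type: str  # "create" or "destroy"


def diff_snapshots(prev: Dict[FiveTuple, str],
                   curr: Dict[FiveTuple, str],
                   now: float) -> List[SocketEvent]:
    """Compare two socket-table snapshots and emit create/destroy events."""
    events = []
    # Sockets present now but not at the previous poll were created.
    for flow, app in curr.items():
        if flow not in prev:
            events.append(SocketEvent(now, flow, app, "create"))
    # Sockets present at the previous poll but gone now were destroyed.
    for flow, app in prev.items():
        if flow not in curr:
            events.append(SocketEvent(now, flow, app, "destroy"))
    return events
```

A real daemon would call something like this once per polling period and forward the resulting events to the database server.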
GT Architecture: packet capture engine and database server • Packet capture engine: tcpdump • Database server: MySQL • Each entry: • 5-tuple • Log time • Name of the application • Type of log event (create, destroy,…)
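As an illustration of the log entry listed above, the sketch below stores one row per socket event. It is an assumption-laden stand-in: it uses Python's built-in `sqlite3` in place of MySQL, and the table name `socket_log`, the column names, and the helpers `open_log_db` and `log_event` are hypothetical rather than taken from GT.

```python
import sqlite3

# Hypothetical schema: the 5-tuple, the log time, the application name,
# and the type of log event (e.g. "create" or "destroy").
SCHEMA = """
CREATE TABLE socket_log (
    proto      TEXT    NOT NULL,
    src_ip     TEXT    NOT NULL,
    src_port   INTEGER NOT NULL,
    dst_ip     TEXT    NOT NULL,
    dst_port   INTEGER NOT NULL,
    log_time   REAL    NOT NULL,
    app_name   TEXT    NOT NULL,
    event_type TEXT    NOT NULL
)
"""


def open_log_db():
    """Create an in-memory database holding the socket log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def log_event(conn, flow, log_time, app_name, event_type):
    """Insert one log entry; `flow` is (proto, src_ip, src_port, dst_ip, dst_port)."""
    conn.execute(
        "INSERT INTO socket_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (*flow, log_time, app_name, event_type),
    )
    conn.commit()
```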
GT Architecture: IPClass tool • IPClass reconciles the information contained in the database with the captured traffic traces. • For each packet of a flow (with timestamp t_0), IPClass looks in the database for a flow with a log time close to t_0. • If found, the entry unequivocally identifies the application that generated the flow • Associating protocols to flows: • Inspect each application and compile a list of the protocols it uses (l7filter)
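The lookup described above can be sketched as a nearest-in-time search over log entries that share the packet's 5-tuple. This is a minimal sketch under stated assumptions: the function name `label_packet`, the in-memory list standing in for the database, and the `max_skew` tolerance (in seconds) are all hypothetical, since the slides do not say how "close to t_0" is defined.

```python
def label_packet(log, flow, t0, max_skew=5.0):
    """Return the application whose log entry for `flow` is closest in time to t0.

    `log` is a list of (five_tuple, log_time, app_name) entries.
    Returns None when no entry for the flow lies within `max_skew` seconds.
    """
    candidates = [
        (abs(log_time - t0), app_name)
        for entry_flow, log_time, app_name in log
        if entry_flow == flow
    ]
    if not candidates:
        return None
    skew, app_name = min(candidates)  # smallest time difference wins
    return app_name if skew <= max_skew else None
```

The time bound matters because the same 5-tuple can be reused later by a different application, so an entry that is far from t_0 should not be trusted.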
Testbed setup • Campus network environments (upstream): • UNIBS: 6 days, 18 GB, tcpdump on a 2.4 GHz quad-core machine (<100 Mbps) • POLITO: 3 days, 200 GB, Endace card (>100 Mbps) • NTP to maintain synchronization
Experimental Analysis • Two metrics: completeness and accuracy • Completeness: • Relevant parameter: polling time • Too short -> unnecessary overhead • Too long -> missing flows • Polling time vs. % CPU: • 4 s -> CPU<5% • 1 s -> CPU~5% • 125 ms -> CPU 20-50% • (depending on Operating System)
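The "too long -> missing flows" effect can be illustrated with a small model: a socket is seen only if at least one poll happens while it is open. The sketch below assumes, purely for illustration, that polls occur at exact multiples of the polling interval starting from time zero; the function name `is_observed` is hypothetical.

```python
import math


def is_observed(open_time, close_time, poll_interval):
    """True if some poll instant k * poll_interval falls in [open_time, close_time]."""
    # First poll instant at or after the socket was opened.
    first_poll = math.ceil(open_time / poll_interval) * poll_interval
    return first_poll <= close_time


# A socket open from t = 0.3 s to t = 0.9 s:
short_flow = (0.3, 0.9)
seen_at_1s = is_observed(*short_flow, poll_interval=1.0)      # missed
seen_at_125ms = is_observed(*short_flow, poll_interval=0.125)  # caught
```

Under this model a flow shorter than the polling interval can slip between two polls, which is why very short flows are the ones most likely to go untagged.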
Experimental Analysis: Completeness • Tagged flows/bytes: • 99% of TCP bytes • 60-80% of TCP sessions (90% for Mac OS) • >87% of UDP sessions (~100% for Linux) • Untagged flows are mostly very short -> for each one, look for another (unique) tagged flow with the same 5-tuple • -> 99% of flows
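One possible reading of the recovery step for untagged flows is sketched below: an untagged flow inherits a label only when the tagged flows with the same 5-tuple all point to a single application. The function name `fallback_label` and the data layout are hypothetical, and the uniqueness rule is an interpretation of the "(unique)" remark on the slide rather than a confirmed detail.

```python
def fallback_label(flow, tagged):
    """Label an untagged flow from tagged flows that share its 5-tuple.

    `tagged` is a list of (five_tuple, app_name) pairs.
    Returns the application name if it is unambiguous, otherwise None.
    """
    apps = {app_name for tagged_flow, app_name in tagged if tagged_flow == flow}
    if len(apps) == 1:
        return next(iter(apps))
    return None  # no match, or several applications used the same 5-tuple
```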
Experimental Analysis: Accuracy • GT records only application names, but there is normally a relation between applications and protocols. • GT can improve the accuracy of a traffic analyzer (e.g. DPI): • Use only the signatures “relevant” to each flow • GT+DPI improves accuracy by up to 41% (of bytes) and 21% (of flows).
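The idea of using only "relevant" signatures can be sketched as follows: when the application behind a flow is known, the classifier tries only the signatures of protocols that application is known to speak. Everything in this example is illustrative: the regular expressions are simplified stand-ins (not l7filter's real patterns), and the application-to-protocol table and the name `classify` are hypothetical.

```python
import re

# Simplified, illustrative payload signatures keyed by protocol name.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]"),
    "smtp": re.compile(rb"^220 .*SMTP"),
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
}

# Hypothetical per-application protocol lists, compiled by inspection.
APP_PROTOCOLS = {
    "firefox": ["http"],
    "utorrent": ["bittorrent", "http"],
}


def classify(payload, app_name=None):
    """Match `payload` against signatures, restricted by application if known."""
    if app_name in APP_PROTOCOLS:
        candidates = APP_PROTOCOLS[app_name]  # only the relevant signatures
    else:
        candidates = list(SIGNATURES)  # plain DPI: try every signature
    for proto in candidates:
        if SIGNATURES[proto].search(payload):
            return proto
    return None
```

Restricting the candidate set removes the chance that a signature belonging to an unrelated protocol matches by accident, which is the ambiguity problem listed for DPI earlier in the talk.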
Design Choices and Conclusions • Design Choices: • Centralized vs. Distributed • User-space vs. Kernel-space • Conclusions: • Implementation of an open source toolset: • GT assigns application labels to traffic flows, allowing storage of ground truth with the trace itself • Tags 99% of bytes and 95% of flows without affecting CPU load • Improves the accuracy of DPI by up to 85% (UDP Skype) and up to 91% (P2P applications).