Usefulness of the results - a forgotten evaluation metric of traffic identification tools

Tomasz Bujlow (tbu@es.aau.dk), Aalborg University

Agenda • A few words about myself • Motivations for traffic monitoring • Existing methods and tools for traffic monitoring & classification - and why they are far from being excellent • A deep look into Deep Packet Inspection • How to verify the accuracy of classification tools? • Implementation and various applications of VBS

Tomasz Bujlow • University of Southern Denmark (2007 – 2009) Bachelor of Computer Engineering, Computer Engineering • The Silesian University of Technology (2003 – 2008) Master of Science in Engineering, Computer Engineering • Aalborg University (2010 – 2014) Doctor of Philosophy (PhD): Classification and analysis of computer network traffic • Universitat Politecnica de Catalunya (January 2013 – April 2013) Visiting PhD Student, CBA - Broadband Communications Research Group • Cisco Certified Network Professional (2010)

Part I Motivations for traffic monitoring

Why to perform traffic monitoring? • To obtain basic statistical information about different kinds of flows in the network and improve Quality of Service Our interests • content (audio, video) • application (P2P, FTP) • service (YouTube, Facebook)

Why to perform traffic monitoring? • To learn which applications are most frequently used in the network, and to enhance user experience by tuning network parameters or setting up dedicated proxies or servers Our interests • application (Skype, HTTP)

Why to perform traffic monitoring? • To compare users located in the same network and group them into profiled sections Our interests • application (Skype, BitTorrent) • IP protocol (TCP / UDP)

Why to perform traffic monitoring? • To create graphs of traffic flow between different networks and optimize amounts of bandwidth bought from different content providers Our interests • service (YouTube, Facebook)

Why to perform traffic monitoring? • To introduce smart logging of traffic. Logging is now required by law. The ability to recognize the types of transmitted content makes it possible to log only, for example, the text content of websites, but not images or downloaded binary files. That saves resources, especially storage space. Our interests • content (text, audio, video)

Why to perform traffic monitoring? • To create a traffic generator that imitates the traffic generated by particular applications, or the real traffic in the network. That makes it possible to test different solutions before implementing them in the real network, and therefore to minimize the cost. Our interests • IP protocol (TCP / UDP) • application (HTTP, BitTorrent, Skype) • content (audio, video) • service (YouTube, Facebook)

Why to perform traffic monitoring? • To obtain the precise data needed to create fast and accurate traffic classifiers working in the network core, which are based on statistical information (Machine Learning Algorithms). Our interests • IP protocol (TCP / UDP) • application (HTTP, BitTorrent, Skype) • content (audio, video) • service (YouTube, Facebook)

Why to perform traffic monitoring? • To implement smart assessment of QoS in the network at the users' level and in the core of the network Our interests • IP protocol (TCP / UDP) • application (HTTP, BitTorrent, Skype) • content (audio, video) • service (YouTube, Facebook)

Why to perform traffic monitoring? • To understand the behavior of different applications, services, ... Our interests • IP protocol (TCP / UDP) • application (HTTP, BitTorrent, Skype) • content (audio, video) • service (YouTube, Facebook) [Figure panels: Web browsing, YouTube, World of Warcraft, Skype]

Why to perform traffic monitoring? • To detect malicious traffic, such as Botnet traffic Our interests • application (bot?)

Why to perform traffic monitoring? • To detect malicious traffic, such as DDoS attacks

Part II Existing methods and tools for traffic monitoring & classification - and why they are far from being excellent

Traffic classification – overview • Classification by ports • Deep Packet Inspection (DPI) • QoS based (IP precedence, DSCP) • Statistical classification

Port-based classification • Very simple idea, widely used by network administrators to limit unwanted traffic (generated by worms, spam, etc.) • Implemented on almost all layer-3 switches on the market • Can classify only applications operating on fixed port numbers • Very easy to cheat, so unreliable What can we get? • lower application-level protocol (HTTP, POP3) for some old, well-known cases
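A minimal sketch of port-based classification, to make the "very easy to cheat" point concrete. The table, the labels, and the function name are hypothetical; only the port numbers themselves (25, 53, 80, 110) are standard assignments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels and rule table; not taken from any real tool. */
enum port_label { PORT_UNKNOWN = 0, PORT_SMTP, PORT_DNS, PORT_HTTP, PORT_POP3 };

struct port_rule {
    uint16_t port;
    enum port_label label;
};

static const struct port_rule WELL_KNOWN[] = {
    {25, PORT_SMTP}, {53, PORT_DNS}, {80, PORT_HTTP}, {110, PORT_POP3},
};
static const size_t WELL_KNOWN_LEN = sizeof(WELL_KNOWN) / sizeof(WELL_KNOWN[0]);

/* Label a flow by its ports alone: the first rule matching either port wins. */
enum port_label classify_by_port(const struct port_rule *rules, size_t n,
                                 uint16_t src_port, uint16_t dst_port)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (rules[i].port == src_port || rules[i].port == dst_port)
            return rules[i].label;
    }
    return PORT_UNKNOWN;
}
```

Any application that chooses to talk over port 80 is labeled HTTP by this function, and anything on a non-listed port stays unknown, which is exactly the weakness named on the slide.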

Deep Packet Inspection (DPI) • Relies on inspecting the payload on the application layer • Much more convenient to use than the previously described methods • Requires significant amounts of resources • Numerous privacy and confidentiality issues • Encryption makes DPI more difficult • False positives and false negatives due to the statistical methods implemented in DPI tools What can we get? • Everything we want: IP protocol, application, content, service, etc. • But what kinds of results are really produced by the existing DPI tools?

DPI – is it really a consistent means? • Ipoque PACE – application level, content container level [FLASH, WINDOWSMEDIA, QUICKTIME] • OpenDPI – an open-source fork of PACE, the same level of consistency • nDPI – successor of OpenDPI, additionally: service provider level [FACEBOOK, GOOGLE, TWITTER] • Libprotoident – L4 level [TCP / UDP] + the application [BitTorrent], content [Flash_Player], or service provider [YahooError] • NBAR – consistent output on the application level • L7-filter - consistent output on the application level Today accuracy != consistency • accurate tools (PACE, OpenDPI, nDPI) – inconsistent • consistent tools (NBAR, L7-filter) - inaccurate

DPI – results by PACE, OpenDPI, nDPI • applications and application protocols: BITTORRENT, RDP, SMB, NTP, SSH, DNS, PANDO, NETBIOS, EDONKEY, SOPCAST, DIRECT_DOWNLOAD_LINK, FTP, ICMP, QUICKTIME, MAIL_SMTP, MAIL_IMAP, WINDOWSMEDIA, MAIL_POP, PPSTREAM, STUN, STEAM • low-level application protocol: HTTP, SSL • content: FLASH, MPEG • Undetected traffic: UNKNOWN • nDPI adds a few services, such as Facebook, YouTube, and Google

DPI - effects of the consistency aspect • Even if the classification results are consistent on the application level, the other levels are unknown (IP protocol, lower application protocol, content, service). So, the usefulness of such results is very limited. However, they can be used for accounting purposes on the application level. • Mixing the levels of the results makes things even worse: a) it is not possible to account the traffic on any level, as only one chosen level is given each time and the rest is unknown b) as only one level is given, we do not know what is on any other level, so the usefulness of such results is almost NONE! Today accuracy != consistency • accurate tools (PACE, OpenDPI, nDPI) – inconsistent • consistent tools (NBAR, L7-filter) - inaccurate

DPI - reasons for the lack of consistency • Most developers claim that “their tool provides the most detailed result, on whatever level it is” • However, how can we assess which level is more precise? Content (MP4 video), content container (Flash), service (YouTube), or application protocol (HTTP)? • Given that the obtained result is Flash, what is the real flow association? a) TCP → HTTP → Flash → MP4 video → YouTube (regular file download)? b) TCP → RTMP → Flash → Justin.tv (live TV streaming)? c) TCP → FTP → Flash → EXE (executable file inside Flash container transferred by FTP)? Today accuracy != consistency • accurate tools (PACE, OpenDPI, nDPI) – inconsistent • consistent tools (NBAR, L7-filter) - inaccurate

DPI – how to generate useful results? • Structure the results, so all the relevant classification levels are evaluated: a) IP protocol level (TCP / UDP) b) lower application-level protocol, as HTTP, SSL, POP3, etc c) higher application-level protocol or application, as SMTPS, Skype, BitTorrent, Dropbox d) content, as MP4 video, FLV video, MP3 audio, JPG image e) service, as Facebook, YouTube, or Google • Implemented by: a) new version of PACE (partly and in a very limited manner) b) new, development version of nDPI (full implementation)
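The five levels a) to e) can be pictured as one record with a slot per level. This is only an illustrative sketch: the struct, its field names, and `levels_known` are hypothetical and do not reproduce the data structures of PACE or nDPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record with one slot per classification level.
 * NULL means "not determined at this level". */
struct flow_result {
    const char *ip_protocol;  /* a) TCP / UDP */
    const char *lower_proto;  /* b) HTTP, SSL, POP3, ... */
    const char *application;  /* c) SMTPS, Skype, BitTorrent, Dropbox, ... */
    const char *content;      /* d) MP4 video, FLV video, JPG image, ... */
    const char *service;      /* e) Facebook, YouTube, Google, ... */
};

/* Count how many of the five levels carry a value. */
int levels_known(const struct flow_result *r)
{
    assert(r != NULL);
    return (r->ip_protocol != NULL) + (r->lower_proto != NULL) +
           (r->application != NULL) + (r->content != NULL) +
           (r->service != NULL);
}
```

A tool that reports only "Flash" fills one slot out of five, while a result such as TCP, HTTP, WebM, YouTube fills four and leaves the unknown application explicitly empty instead of hiding it.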

DPI – results generated by new PACE BitTorrent:plain:not_detected RDP:no_subprotocols:not_detected unknown:no_subprotocols:not_yet_detected BitTorrent:uTP:not_detected SMB/CIFS:no_subprotocols:not_detected SSH:no_subprotocols:not_detected HTTP:generic:not_detected BitTorrent:encrypted:not_detected DNS:no_subprotocols:not_detected Pando:no_subprotocols:not_detected NETBIOS:no_subprotocols:not_detected Yahoo:webmail:not_detected eDonkey:plain:not_detected HTTP:generic:facebook SSL:generic:not_detected HTTP:generic:not_yet_detected HTTP:generic:youtube HTTP:generic:youtube Socks:socksv5:not_yet_detected PPLIVE:no_subprotocols:not_detected Skype:unknown:not_detected PPSTREAM:no_subprotocols:not_detected Google:encrypted:not_detected unknown:no_subprotocols:not_detected HTTP:media:not_detected FLASH:no_subprotocols:not_detected

DPI – results generated by nDPI-ng • proto: TCP->SSL_with_certificate->POP3S, service: Google → encrypted POP3 session with a Google mail server • proto: TCP->SSL_with_certificate, service: Twitter → encrypted connection to a Twitter server • proto: TCP->FTP_Data, content: JPG → file-transfer FTP session, which carries a JPG image • proto: TCP->SSL_with_certificate->Dropbox, service: Dropbox → encrypted Dropbox session (the application is Dropbox) with the Dropbox server • proto: TCP->SSL_with_certificate, service: Dropbox → encrypted session with a Dropbox server, while the application is unknown (it can be a web browser connection) • proto: TCP->HTTP, content: WebM, service: YouTube → a flow from YouTube, which transports a WebM movie • proto: UDP->DNS, service: Facebook → DNS query about a hostname belonging to Facebook

Using QoS markers • Class of service (CoS): 3-bit field that is present in an Ethernet frame header when 802.1Q VLAN tagging is present • Very easy to cheat – everyone can set it to any value • Most Internet Service Providers do not trust incoming QoS markings from their customers What can we get? • Nothing more than previously set by a user or an application

Using QoS markers • IP packets contain the Type of Service field, which can be used for layer-3 QoS marking What can we get? • Nothing more than previously set by a user or an application – limited to trusted devices in the network

Using QoS markers • Valid values for IP Precedence: 0 - 7 • Valid values for DSCP: 0 – 63
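Both value ranges follow from where the markers sit in the IP header: IP Precedence uses the three most significant bits of the Type of Service byte, and DSCP uses the six most significant bits of the same byte. A small sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* IP Precedence: the three most significant bits of the ToS byte (0-7). */
uint8_t ip_precedence(uint8_t tos)
{
    return (uint8_t)(tos >> 5);
}

/* DSCP: the six most significant bits of the same byte (0-63). */
uint8_t dscp(uint8_t tos)
{
    return (uint8_t)(tos >> 2);
}

/* Build a ToS byte from a DSCP value, leaving the two low bits at zero. */
uint8_t tos_from_dscp(uint8_t value)
{
    assert(value <= 63);
    return (uint8_t)(value << 2);
}
```

For example, a ToS byte of 0xB8 reads as DSCP 46 and, under the older interpretation, as IP Precedence 5. Nothing in the byte says who set it, which is why the marking is only meaningful for trusted devices.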

Statistical classification • Based on rules, which can be written manually (slow and inefficient) or derived automatically by the use of Machine Learning Algorithms (MLAs) • Very broad choice of MLAs: K-Nearest Neighbors, K-Means, Naive Bayes Filter, C4.5, J48, Random Forest, etc. • Achievable detection rate is over 95% • MLAs require a significant amount of good-quality training data • But... the speed is the power! What can we get? • application • content (indirectly) • service (indirectly)

So how can we use the statistical methods? What can we use to classify the traffic by the statistical methods? • IP protocol level → Protocol field from the IP packet header • Application level → statistical classification by packet sizes, ports, TCP flags, flow durations, etc. • Content level → statistical classification by IP addresses • Service provider level → statistical classification by IP addresses What is the real result? • Pretty good accuracy for the cases on which the MLA was trained • Poor accuracy for all the other cases

Identification of service providers • Monitoring of DNS replies delivers the required information • Problem: many service providers use the same IP address

    tcpdump -v -K -n -N -t -i eth0 udp src port 53

    IP (tos 0x0, ttl 46, id 30600, offset 0, flags [none], proto UDP (17), length 102)
        8.8.8.8.53 > 172.26.10.88.58238: 33261 2/0/0 www.facebook.com. CNAME star.c10r.facebook.com., star.c10r.facebook.com. A 31.13.72.17 (74)
    IP (tos 0x0, ttl 46, id 26945, offset 0, flags [none], proto UDP (17), length 181)
        8.8.8.8.53 > 172.26.10.88.46207: 10707 4/0/0 fbstatic-a.akamaihd.net. CNAME fbstatic-a.akamaihd.net.edgesuite.net., fbstatic-a.akamaihd.net.edgesuite.net. CNAME a1168.dsw4.akamai.net., a1168.dsw4.akamai.net. A 95.101.2.73, a1168.dsw4.akamai.net. A 95.101.2.91 (153)
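One way to use such DNS replies is to remember which queried name resolved to which address, and to look the address up when a flow to it appears later. The sketch below is an assumption about how this could be done, not the method of any particular tool; all type and function names are hypothetical, addresses are IPv4 in host byte order, and the cache is a fixed-size array for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64
#define NAME_LEN 64

/* One observed A record: queried name -> IPv4 address. */
struct dns_entry {
    uint32_t addr;
    char name[NAME_LEN];
};

struct dns_cache {
    struct dns_entry entries[CACHE_SLOTS];
    size_t count;
};

/* Remember that `name` resolved to `addr`; new records are dropped when full. */
void dns_cache_add(struct dns_cache *c, const char *name, uint32_t addr)
{
    assert(c != NULL && name != NULL);
    if (c->count >= CACHE_SLOTS)
        return;
    struct dns_entry *e = &c->entries[c->count++];
    e->addr = addr;
    strncpy(e->name, name, NAME_LEN - 1);
    e->name[NAME_LEN - 1] = '\0';
}

/* Count the cached records for `addr`. If `last` is not NULL and at least
 * one record exists, *last points at the most recently added name. */
size_t dns_cache_lookup(const struct dns_cache *c, uint32_t addr,
                        const char **last)
{
    assert(c != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < c->count; i++) {
        if (c->entries[i].addr == addr) {
            hits++;
            if (last != NULL)
                *last = c->entries[i].name;
        }
    }
    return hits;
}
```

A lookup that returns more than one record for an address is the problem case from the slide: an address on a shared content delivery network can stand behind several unrelated names, so the address alone does not identify the service provider.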

Part III A deep look into Deep Packet Inspection

How much information is needed? • It depends on the specific DPI tool • Libprotoident requires only 4 bytes of packet payload in each direction to recognize the traffic. The price: only the IP protocol and application levels can be determined. • Other tools also process the following bytes, looking for specific signatures of a content or a service. • Some signatures can identify the traffic after receiving the first packet with payload (e.g., DNS, NTP, or BitTorrent). Finding the web service or content in an HTTP flow usually requires the first 4 packets. • At most, the first 10 packets in each direction should be sufficient to determine all the flow characteristics.

Which information is used by DPI? Libprotoident: comparison of the first 4 bytes of payload, the packet lengths, and the port numbers:

    if (!match_str_either(data, "\x01\x00\x00\x00")) return false;
    if (!match_chars_either(data, 0x00, 0x00, 0x00, ANY)) return false;
    if (data->payload_len[0] == 4 && data->payload_len[1] == 1) return true;
    if (data->server_port != 53 && data->client_port != 53) return false;

Which information is used by DPI? PACE, OpenDPI, and nDPI perform the same kinds of checks:

    if ((payload_len > 0) && match_first_bytes(packet->payload, "\xe9\x03\x41\x01"))
        NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Found PPLIVE.\n");
    if ((payload_len == 0) || ((payload_len == 2) && (packet->payload[0] == 0x05) && (packet->payload[1] == 0x00)))
        NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Found SOCKS5.\n");
    if ((payload_len == 0) || (payload_len == 49) || (payload_len == 94))
        NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG, "Found PPLIVE.\n");
    if (packet->udp->dest == htons(5041) || packet->udp->source == htons(5041))
        NDPI_LOG(0, ndpi_struct, 0, "Possible PPLIVE ...\n");

Which information is used by DPI? • But the checks are done for each packet separately (in the order in which the packets arrive), so we do not have access to the payload of the previous packet • The detection status is kept in state variables associated with the particular flow

Which information is used by DPI? However, they use a bunch of other methods, such as an IP address check:

    /* Apple (FaceTime, iMessage,...) 17.0.0.0/8 */
    if (((saddr & 0xFF000000 /* 255.0.0.0 */) == 0x11000000 /* 17.0.0.0 */)
        || ((daddr & 0xFF000000 /* 255.0.0.0 */) == 0x11000000 /* 17.0.0.0 */)) {
        flow->ndpi_result_service = NDPI_RESULT_SERVICE_APPLE;
    }

Which information is used by DPI? Or TCP flags:

    if (packet->tcp->psh != 0 && flow->rtmp_bytes == 1537)
        NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG,

Or even the number of processed packets:

    if (flow->packet_counter > 20)
        NDPI_LOG(0, ndpi_struct, NDPI_LOG_DEBUG.....

Which information is used by DPI? • In order to discover web services and types of HTTP content, nDPI parses HTTP headers to find the “host” and “content-type” lines. • The “host” field is compared against domain names associated with the particular service, such as:

    "amazon.com" -> NDPI_RESULT_SERVICE_AMAZON
    "amazonaws.com" -> NDPI_RESULT_SERVICE_AMAZON
    "amazon-adsystem.com" -> NDPI_RESULT_SERVICE_AMAZON
    ".apple.com" -> NDPI_RESULT_SERVICE_APPLE
    ".mzstatic.com" -> NDPI_RESULT_SERVICE_APPLE
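A hedged sketch of how a “host” value could be compared against such a list. It assumes a plain case-sensitive suffix match, which may differ from what nDPI actually does; the enum, the table, and the function names are hypothetical, and only the domain strings come from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum service { SERVICE_UNKNOWN = 0, SERVICE_AMAZON, SERVICE_APPLE };

struct host_rule {
    const char *suffix;
    enum service service;
};

static const struct host_rule HOST_RULES[] = {
    {"amazon.com", SERVICE_AMAZON},
    {"amazonaws.com", SERVICE_AMAZON},
    {"amazon-adsystem.com", SERVICE_AMAZON},
    {".apple.com", SERVICE_APPLE},
    {".mzstatic.com", SERVICE_APPLE},
};

/* True when `host` ends with `suffix` (case-sensitive). */
static int ends_with(const char *host, const char *suffix)
{
    size_t hl = strlen(host), sl = strlen(suffix);
    return hl >= sl && strcmp(host + (hl - sl), suffix) == 0;
}

/* Map the value of an HTTP "host" header to a service; first match wins. */
enum service service_from_host(const char *host)
{
    assert(host != NULL);
    for (size_t i = 0; i < sizeof(HOST_RULES) / sizeof(HOST_RULES[0]); i++) {
        if (ends_with(host, HOST_RULES[i].suffix))
            return HOST_RULES[i].service;
    }
    return SERVICE_UNKNOWN;
}
```

Under this assumption the leading dot matters: ".apple.com" matches "itunes.apple.com" but not the bare "apple.com", while an entry without the dot such as "amazon.com" also matches any unrelated name that merely ends with those characters.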

Which information is used by DPI? • The “content-type” field is compared against predefined values associated with the particular types of content:

    "video/mp4" -> NDPI_RESULT_CONTENT_MPEG
    "video/mpeg" -> NDPI_RESULT_CONTENT_MPEG
    "video/nsv" -> NDPI_RESULT_CONTENT_MPEG
    "misc/ultravox" -> NDPI_RESULT_CONTENT_MPEG
    "audio/ogg" -> NDPI_RESULT_CONTENT_OGG
    "video/ogg" -> NDPI_RESULT_CONTENT_OGG
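The same table-driven idea applies to the “content-type” value. The sketch below is illustrative only: it assumes a case-insensitive comparison of the media type and ignores any parameters after ';', which is a design choice of this example and not necessarily nDPI's behavior. The enum and function names are hypothetical; the media type strings are those listed on the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum content { CONTENT_UNKNOWN = 0, CONTENT_MPEG, CONTENT_OGG };

struct content_rule {
    const char *mime;  /* lowercase media type */
    enum content content;
};

static const struct content_rule CONTENT_RULES[] = {
    {"video/mp4", CONTENT_MPEG}, {"video/mpeg", CONTENT_MPEG},
    {"video/nsv", CONTENT_MPEG}, {"misc/ultravox", CONTENT_MPEG},
    {"audio/ogg", CONTENT_OGG},  {"video/ogg", CONTENT_OGG},
};

/* Case-insensitive match of `mime` against the start of `value`; the media
 * type must be followed by the end of the string, ';', or a space. */
static int mime_matches(const char *value, const char *mime)
{
    size_t i = 0;
    for (; mime[i] != '\0'; i++) {
        if (tolower((unsigned char)value[i]) != mime[i])
            return 0;
    }
    return value[i] == '\0' || value[i] == ';' || value[i] == ' ';
}

/* Map the value of an HTTP "content-type" header to a content class. */
enum content content_from_type(const char *value)
{
    assert(value != NULL);
    for (size_t i = 0; i < sizeof(CONTENT_RULES) / sizeof(CONTENT_RULES[0]); i++) {
        if (mime_matches(value, CONTENT_RULES[i].mime))
            return CONTENT_RULES[i].content;
    }
    return CONTENT_UNKNOWN;
}
```

Every media type missing from the table falls through to the unknown class, so the content level is only as complete as the list of predefined values.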

How to deal with the encrypted traffic? • Encrypted web traffic increased from 20% (in 2011) to 45% (in 2014) of all web traffic • The content is always unknown • The application protocol (HTTPS, POP3S, SMTPS, etc.) is discovered based on ports (e.g., port 443 = HTTPS, port 465 = SMTPS) • The service is discovered based on: a) inspection of the server field in certificates (nDPI) b) matching with services based on cached DNS replies (TSTAT)

Part IV How to verify the accuracy of classification tools?

The origin of the reference data • The reference data (ground truth) are usually obtained in one of the previously described ways, which causes incompleteness and a high misclassification rate • Publicly available databases very often contain incomplete and inaccurate data • So how to provide good quality data?

Monitoring on the user's level • System sockets provide the name of the application associated with each particular stream in the network • Ability to split HTTP streams according to their content • Fast, precise, avoids privacy issues • Avoids the unreliability of port-based or statistical tools

Volunteer-Based System (VBS) • Collects data from clients • Enhanced privacy • Application names are taken from system sockets • Recognizes different types of HTTP contents • Open-source, GPL licensed • Windows (32/64-bit) and Linux • Can be downloaded free of charge from SourceForge: http://vbsi.sourceforge.net

Design of the system • Volunteer-Based System consists of clients installed on users' computers, the server located at Aalborg University, and statistics generators. Each part of the software can be developed independently, and it collaborates with the others through SQL database interfaces

The concept of a flow • Remote end-point: IP address, port • The way of transport: transport protocol • Local end-point: IP address, port
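The three parts of this flow definition map naturally onto a small key structure. The sketch below is an assumption for illustration, not the VBS data model: the type and function names are hypothetical, and addresses are IPv4 only for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct endpoint {
    uint32_t addr;  /* IPv4 address, host byte order */
    uint16_t port;
};

struct flow_key {
    struct endpoint local;
    struct endpoint remote;
    uint8_t transport;  /* IP protocol number: 6 = TCP, 17 = UDP */
};

static bool endpoint_equal(const struct endpoint *a, const struct endpoint *b)
{
    return a->addr == b->addr && a->port == b->port;
}

/* Two packets belong to the same flow when all three parts of the key agree. */
bool same_flow(const struct flow_key *a, const struct flow_key *b)
{
    assert(a != NULL && b != NULL);
    return a->transport == b->transport &&
           endpoint_equal(&a->local, &b->local) &&
           endpoint_equal(&a->remote, &b->remote);
}
```

Because the key is expressed as local and remote end-points as seen from the client, both directions of a conversation share one key, and the direction can be stored per packet.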

Information logged for each flow • Identifier of the client • Start timestamp • Hashed local, global, and remote IP addresses • Local and remote ports • Transport layer protocol • Name of the application • Name of the network

Information logged for each packet • Identifier of the flow • Direction (inbound / outbound) • Size • State of all TCP flags (for TCP flows) • Time elapsed from the previous packet in the flow • Type of the content (for HTTP flows)
