
RelSamp: Preserving Application Structure in Sampled Flow Measurements

Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao


Presentation Transcript


  1. RelSamp: Preserving Application Structure in Sampled Flow Measurements. Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao

  2. A plethora of Internet applications • Objectives • Re-provision networks • Detect undesirable behaviors of applications • Prepare network better against major application trends [Diagram: cycle around the Internet: 1) Emergence of new applications, 2) Measure/Monitor, 3) Characterization]

  3. Monitoring applications at an edge • Goal: Monitoring application behavior • Identify number of flows • Identify number of packets • Current Solution: Sampled NetFlow • Supported by most modern routers • Key limitation: Application session structure gets distorted • Small # of flows per application session • Small # of packets per application session [Diagram: Internet, edge router with Sampled NetFlow, enterprise network]

  4. Preserving application structure in flow measurements • Benefit 1: Enables continuous monitoring of applications • Better understanding about communication patterns • Better understanding of characteristics (# of flows, packets) • Benefit 2: Application classification becomes easier • Statistical machine learning techniques: SVM, C4.5, etc. • Social behavior-based classifier: BLINC • Benefit 3: Detecting undesirable traffic patterns of an application

  5. Contributions • Introduce the notion of related sampling • Flows belonging to the same application session are sampled with higher probability • Propose the RelSamp architecture for realizing related sampling • Uses three stages of sampling to preserve application structure • Show efficacy in preserving application structure • Captures more flows per application session • Significant increase in application classification accuracy

  6. Related sampling • Key idea: Sample more flows from fewer application sessions [Figure: three panels (original application structure, Sampled NetFlow, related sampling) showing the flows of App1, App2, and App3]

  7. Realizing related sampling • Question 1: How to sample an application session? • Question 2: How to sample packets within an application session?

  8. Defining application session • A sequence of packets from an application on a given host with inter-arrival time ≤ τ seconds • Packets may belong to different flows to different destinations • Example 1: BitTorrent connections to several destinations within a short span of time constitute an application session • Example 2: Web connections from a browser several seconds apart constitute different application sessions
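The session definition above can be sketched offline as a gap-based split. This is only an illustration under stated assumptions: packets are grouped by source host alone (the slides give no per-application identifier), and the function name `split_sessions` and the tuple layout `(timestamp, host, flow_key)` are hypothetical, not taken from the talk.

```python
from collections import defaultdict


def split_sessions(packets, tau):
    """Split each host's packets into sessions using an inter-arrival gap.

    packets: iterable of (timestamp_seconds, src_host, flow_key) tuples
             (hypothetical layout chosen for this sketch).
    tau:     maximum inter-arrival time, in seconds, within one session.

    Returns {host: [session, ...]}, where each session is a list of
    (timestamp, flow_key) pairs. Grouping is by host only, which is an
    assumption; the slide's definition is per application on a host.
    """
    sessions = defaultdict(list)
    last_seen = {}  # host -> timestamp of that host's previous packet
    for ts, host, flow in sorted(packets, key=lambda p: p[0]):
        # Start a new session on the first packet or after a gap > tau.
        if host not in last_seen or ts - last_seen[host] > tau:
            sessions[host].append([])
        sessions[host][-1].append((ts, flow))
        last_seen[host] = ts
    return dict(sessions)
```

With τ = 2 seconds, packets from one host at t = 0, 1, and 10 would fall into two sessions: the first holds two packets (possibly from different flows), the second holds one.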

  9. Sampling an application session • One possible approach: Similar to Sampled NetFlow • Sample packets with some probability • Create an application session record if no record exists • Update the application session record • Problem: Hard to do in an online fashion • No application session identifier (like flow key) • Need to know all flows that constitute an application session • DPI-based techniques are both difficult and incomplete

  10. Our approach: sampling hosts • Observation: A host is a superset of an application session • Sample more flows from the same host • Flows originating at the same host close together in time typically belong to few application sessions • About 80% of hosts run fewer than 2 applications in our study • More details in the paper

  11. RelSamp design • Three-stage sampling process consisting of host, flow, and packet selection stages • Host stage: hash-based sampling • No state maintained on a per-application basis • Many application sessions for a given host are possibly sampled • Change hash function periodically to track different hosts • Flow and packet stages: random packet sampling • Controls fraction of flows sampled in an application session and packets sampled in a flow • Post processing: Can separate flow records into application sessions using port-based/statistical classifiers

  12. RelSamp architecture • Tunable parameters: Ph, Pf, Pp • Host-level bias stage: compute H(SrcIP) over the hash space and check it against the selection range; Ph = selection range / hash space • Flow-level bias stage: if (random no. ≤ Pf && no flow record) create a flow record • Pkt-level bias stage: if (random no. ≤ Pp && flow record) update the flow record • Flow records are kept in Flow Memory [Diagram: packet path through the three stages]
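The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The class name `RelSampSketch`, the seeded SHA-256 hash, and the flow-key format are assumptions. The slide also does not say what happens to packets from hosts outside the selection range; this sketch simply drops them, which is an assumption as well.

```python
import hashlib
import random


class RelSampSketch:
    """Hypothetical three-stage sampler: host, flow, then packet stage."""

    def __init__(self, p_h, p_f, p_p, seed=0, rng=None):
        self.p_h, self.p_f, self.p_p = p_h, p_f, p_p
        self.seed = seed  # changing the seed changes which hosts are tracked
        self.rng = rng if rng is not None else random.Random(0)
        self.flow_memory = {}  # flow_key -> sampled packet count

    def _host_hash(self, src_ip):
        # Map H(SrcIP) into [0, 1); SHA-256 is an assumption of this sketch.
        digest = hashlib.sha256(f"{self.seed}:{src_ip}".encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def host_selected(self, src_ip):
        # Host stage: selected if the hash falls in the selection range,
        # whose size relative to the hash space is Ph.
        return self._host_hash(src_ip) < self.p_h

    def rotate_hash(self, new_seed):
        # "Change hash function periodically to track different hosts."
        self.seed = new_seed

    def process(self, src_ip, flow_key):
        if not self.host_selected(src_ip):
            return  # assumption: packets of unselected hosts are dropped
        if flow_key not in self.flow_memory:
            # Flow stage: create a record with probability Pf.
            if self.rng.random() <= self.p_f:
                self.flow_memory[flow_key] = 1
        elif self.rng.random() <= self.p_p:
            # Packet stage: update an existing record with probability Pp.
            self.flow_memory[flow_key] += 1
```

Because the host stage is hash-based, the decision for a given source IP is deterministic until `rotate_hash` is called, so no per-host or per-application state is needed for that stage.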

  13. Exploring parametric space • Router sampling budget Pe = f(Ph, Pf, Pp) • Trade-off between accuracy of flow statistics and # flows/application session • Parameters can be tuned depending on • Objective • Network environment • Examples of tuning parameters by objective • Application classification: low Ph, high Pf, low Pp • Application characterization: lower Ph, high Pf, high Pp • Flow statistics of all flows: Ph = Pf = Pp = Pe

  14. Evaluation goals • Application characterization • Question 1: Is RelSamp effective for sampling more flows in an application session? • Question 2: Can RelSamp estimate statistics of an application session? • Application classification • Question 3: Is sampling more flows in an application session beneficial for application classification?

  15. Experimental setup • Evaluation of effectiveness for capturing more flows • Trace 1: 1-hour packet trace collected at an edge • RelSamp configuration (other settings in paper): Capture more flows of app session from many hosts • , , () • Evaluation of application classification accuracy • Trace 2: 13-hour full-payload trace captured at a dorm network • RelSamp setting: Similar setting, but varies from 0.1 to 1.0 • Classifiers: BLINC [SIGCOMM ’05], SVM, and C4.5 • Ground truth is obtained using a DPI-based classifier (tstat)

  16. Flows per application session [Figure: CDF of #captured flows / #total flows in an app session; annotation: more # of flows per app session]
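The quantity on this slide, #captured flows / #total flows per application session, and its CDF can be computed as below. This is a small sketch assuming the sessions and their flow sets are already known; the function names and the dictionary layout are hypothetical.

```python
def capture_ratios(total_flows, captured_flows):
    """Per-session ratio of captured flows to total flows.

    Both arguments map a session id to a set of flow keys (hypothetical
    layout). Sessions with no flows at all are skipped.
    """
    return {
        sid: len(captured_flows.get(sid, set()) & flows) / len(flows)
        for sid, flows in total_flows.items()
        if flows
    }


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs for plotting a CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]
```

A curve shifted toward higher ratios means the sampler captured a larger share of each session's flows.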

  17. Accuracy of BLINC classifier [Figure: accuracy (%) vs. sampling rate; annotation: ~50% increase] Note: classification results on flows using non-standard ports

  18. Related work • Flow Sampling [ToN ’06] • Samples flows once flow record is created • Flow Slices [IMC ’05] • Focuses on controlling router resources (CPU and memory) • cSamp [NSDI ’08] • Supports sampling of all traffic by coordinating various vantage points in a network • FlexSample [IMC ’08] • Supports monitoring of traffic subpopulations, but needs to maintain extra state for approximate checking of predicates

  19. Summary • Introduced the notion of related sampling • Samples related flows in the same application session with higher probability • Proposed the RelSamp architecture • Preserves application structure in sampled flow records • Effective at preserving application session structure • 5-10x more flows per application session compared to Sampled NetFlow • Up to 50% higher classification accuracy than Sampled NetFlow

  20. Thank you! Questions?

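  21. Evaluation method of classification techniques [Diagram: the packet trace feeds the Tstat DPI-based classifier, which yields the ground truth; the same trace feeds RelSamp (Flow Record 1), Sampled NetFlow (Flow Record 2), and Flow Sampling (Flow Record 3); a classification algorithm (e.g., BLINC, SVM, C4.5) runs on the flow records and its output is compared with the ground truth in a report]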

  22. Comparison with other solutions using BLINC [Figure: # of accurately classified flows vs. sampling rate] Note: classification results on flows using non-standard ports
