
Towards Topology-aware Network Services

Towards Topology-aware Network Services. Jia Wang AT&T Labs – Research http://www.research.att.com/~jiawang/ October 5, 2014. Topology-aware network services. Internet. Server. User. Outline. Popular services Network-aware clustering Methodology Statistical model of clusters

Presentation Transcript

  1. Towards Topology-aware Network Services Jia Wang AT&T Labs – Research http://www.research.att.com/~jiawang/ October 5, 2014

  2. Topology-aware network services Internet Server User

  3. Outline • Popular services • Network-aware clustering • Methodology • Statistical model of clusters • Applications • Traffic engineering - route popularity vs stability • Topology modeling - cluster graph • CDN - proximity routing • P2P

  4. Peer-to-peer (P2P) networks • P2P systems (e.g. Napster, Gnutella, Freenet, KaZaA) • New model of content distribution • End-hosts self-organize and share content among each other • Files are replicated among the application layer overlay network on demand • Significant traffic contribution • P2P file sharing is #1 source of traffic • Large and growing user base • 100+ Million downloads of KaZaA client

  5. Opportunity and challenge • Scale well in terms of the hardware, bandwidth… • For a wide deployment of P2P applications • Need to understand P2P traffic • Need a scalable content location and routing scheme in the application layer • Implications for Security

  6. Popular P2P systems • Proprietary: e.g. FastTrack • Closed/encrypted protocol • Open Source: e.g. Gnutella • Heterogeneous implementation • Specification not well defined • Legacy feature support

  7. Overview: P2P System Search Network B A

  8. Overview: P2P System Search: “Starwars.divx” Search Network B A

  9. Overview: P2P System Search Network B A Response: ”B has Starwars.divx”

  10. Overview: P2P System Search Network B A Get: “Starwars.divx”

  11. Overview: P2P System Search Network B A Resp: “Starwars.divx”

  12. FastTrack vs. Gnutella • FastTrack • Proprietary protocol • Homogeneous network • “2+ Million” users • “2+ PB” of data • Supernode search network • Gnutella • Open protocol • Heterogeneous network • ~150 Thousand(?) users • ~500+GB of data • Partial ultrapeer search network

  13. Gnutella Overview X B A

  14. Gnutella Overview X B A Search: “Starwars.divx”

  15. Gnutella Overview X B A Search: “Starwars.divx”

  16. Gnutella Overview X B A Search: “Starwars.divx”

  17. Gnutella Overview X B A Search: “Starwars.divx”

  18. Gnutella Overview X B A Resp: B has “Starwars.divx”

  19. Gnutella Overview X B A Resp: B has “Starwars.divx”

  20. Gnutella Overview X B A Resp: B has “Starwars.divx”

  21. Gnutella Overview X B A Resp: B has “Starwars.divx”

  22. Gnutella Overview X B A Get: “Starwars.divx”

  23. Gnutella Overview X B A Resp: “Starwars.divx”

  24. Popularity of Gnutella Implementations

  25. P2P traffic analysis • Data • Flow level records from IGRs at a large ISP [SIGCOMM-IMW2002-A] • Both signaling traffic and actual data downloading • Comparable to Web traffic volume • Gnutella traces: signaling traffic [SIGCOMM-IMW2001]

  26. P2P traffic across large network • Methodology challenges • Decentralized system • Transient peer membership • Some popular protocols are closed and proprietary • Large-scale passive measurement • Flow-level data from routers across a large tier-1 ISP backbone • Analyze both signaling and data fetching traffic • 3 levels of granularity: IP, Prefix, AS • P2P protocols (by port) • FastTrack: 1214 (including Morpheus) • Gnutella: 6346/6347 • DirectConnect: 411/412
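
The port-based identification on this slide can be sketched as a simple lookup. This is a minimal illustration, not the authors' tooling: the function name `classify_flow` and the choice to match on either endpoint's port are assumptions; only the port numbers come from the slide.

```python
from typing import Optional

# Default ports listed on the slide; everything else here is illustrative.
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


def classify_flow(src_port: int, dst_port: int) -> Optional[str]:
    """Return the P2P protocol name if either endpoint uses a known port."""
    for port in (src_port, dst_port):
        if port in P2P_PORTS:
            return P2P_PORTS[port]
    return None  # not recognized as P2P by port alone
```

Matching on ports alone misses peers that run on non-default ports, which is one reason flow-level views like this can be incomplete.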

  27. Methodology Discussion • Advantages • Requires minimal knowledge of P2P protocols: port number • Large scale non-intrusive measurement • More complete view of P2P traffic • Allows localized analysis • Limitations • Flow-level data: no application-level details • Incomplete traffic flows • Other issues • DHCP, NAT, proxy • Host ≠ IP • Asymmetric IP routing

  28. Measurements • Characterization • Overlay network topology • Traffic distribution • Dynamic behavior • Metrics • Host distribution • Host connectivity • Traffic volume • Mean bandwidth usage • Traffic pattern over time • Connection duration and on-time

  29. Data cleaning • Invalid IPs • No matched prefixes in routing tables • Invalid AS numbers • > 64512 • Removed 4% of flows
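
A cleaning pass along these lines might look like the sketch below. The `Flow` record layout, the function names, and the decision to check both endpoints are assumptions made for illustration; the slide only states the two criteria (no matching routing-table prefix, AS number above 64512).

```python
import ipaddress
from dataclasses import dataclass

AS_LIMIT = 64512  # the slide treats AS numbers above this value as invalid


@dataclass
class Flow:  # hypothetical record layout
    src_ip: str
    dst_ip: str
    src_as: int
    dst_as: int


def has_matching_prefix(ip, prefixes):
    """True if the address falls inside at least one routing-table prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in prefixes)


def clean(flows, prefixes):
    """Keep flows whose endpoints have valid AS numbers and routable prefixes."""
    kept = []
    for f in flows:
        if f.src_as > AS_LIMIT or f.dst_as > AS_LIMIT:
            continue  # invalid AS number
        if not (has_matching_prefix(f.src_ip, prefixes)
                and has_matching_prefix(f.dst_ip, prefixes)):
            continue  # no matched prefix in the routing table
        kept.append(f)
    return kept
```

Here `prefixes` is expected to be a list of `ipaddress.ip_network` objects built from a routing-table snapshot.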

  30. Overview of P2P traffic • Total 800 million flow records • FastTrack is the most popular one

  31. Host distribution

  32. Host connectivity FastTrack (9/14/2001) • Connectivity is very small for most hosts, very high for few hosts • Distribution is less skewed at prefix and AS levels

  33. Traffic volume distribution FastTrack (9/14/2001) • Significant skews in traffic volume across granularities • Few entities source most of the traffic • Few entities receive most of the traffic

  34. Mean bandwidth usage FastTrack (9/14/2001) • Upstream usage < downstream usage. Possible causes are • Asymmetric available BW, e.g., DSL, cable • Users/ISPs rate-limiting upstream data transfers

  35. Time of day effect FastTrack (9/14/2001 GMT) • Traffic volume exhibits very strong time-of-day effect • Milder time-of-day variation for # hosts in the system

  36. Host connection duration & on-time FastTrack (9/14/2001) thd=30min • Substantial transience: most hosts stay in the system for a short time • Distribution less skewed at the prefix and AS levels • Using per-cluster or per-AS indexing/caching nodes may help
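
One plausible reading of "thd=30min" is an idle threshold: a host that is silent for longer than 30 minutes is treated as having left the system. Under that assumption (the slide does not define the metric precisely), on-time can be sketched as the total length of merged activity sessions. The function name and input format are hypothetical.

```python
def on_time(intervals, idle_threshold=30 * 60):
    """Total seconds a host is considered online.

    intervals: list of (start, end) activity times in seconds.
    Gaps no longer than idle_threshold are bridged into one session.
    """
    if not intervals:
        return 0
    ordered = sorted(intervals)
    total = 0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start - cur_end <= idle_threshold:
            cur_end = max(cur_end, end)  # same session, extend it
        else:
            total += cur_end - cur_start  # close the session
            cur_start, cur_end = start, end
    total += cur_end - cur_start
    return total
```

The same function applies at prefix or AS level if the intervals of all member hosts are pooled first.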

  37. Traffic characterization • The power law • May not be a suitable model for P2P traffic • Relationship between metrics • Traffic volume • Number of IPs • On-time • Mean bandwidth usage

  38. Traffic volume vs. on-time FastTrack (9/14/2001): top 1% hosts (73% volume) • Volume heavy hitters tend to have long on-times • Hosts with short on-times contribute small traffic volumes

  39. Connectivity vs. on-time FastTrack (9/14/2001): top 1% hosts (73% volume) • Hosts with high connectivity have long on-times • Hosts with short on-times communicate with few other hosts

  40. P2P vs Web • Observations • 97% of prefixes contributing P2P traffic also contribute Web traffic • Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic • Prefix stability – the daily traffic volume (in %) from the prefix does not change over days • Experiments: 0.01%, 0.1%, 1%, 10% heavy hitters => 10%, 30%, 50%, 90% of the traffic volume
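
The heavy-hitter figures above (a small top fraction of prefixes accounting for a large share of volume) correspond to a computation like the following. This is a sketch under the assumption that per-prefix volumes are available as a plain list; the function name is illustrative.

```python
def top_share(volumes, fraction):
    """Share of total volume contributed by the top `fraction` of entities."""
    if not volumes:
        return 0.0
    ranked = sorted(volumes, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # always include at least one entity
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

For example, with ten prefixes whose volumes are dominated by one sender, `top_share(volumes, 0.1)` reports how much of the traffic that single prefix carries.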

  41. Traffic stability March 2002 • [Figure panels: top 0.01% prefixes; top 1% prefixes] • P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic

  42. P2P traffic classification • Graph transformation • AS relationship • Flow size

  43. Background • Inter-AS relationship • Provider-customer: customer pays provider • Peer-peer: both sides benefit by exchanging traffic between their respective customers • BGP export rules • An AS can export its own routes and its customers' routes to its providers/peers, but cannot export routes learned from other providers/peers. • An AS can export its own routes, its customers' routes, and routes learned from its providers/peers to its customers.
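
The two export rules can be written as a small predicate. The enum and function names below are illustrative, not taken from any BGP implementation; the logic follows the slide's rules directly.

```python
from enum import Enum


class Rel(Enum):
    SELF = "self"          # route originated by this AS
    CUSTOMER = "customer"
    PROVIDER = "provider"
    PEER = "peer"


def may_export(learned_from: Rel, neighbor: Rel) -> bool:
    """Can a route learned from `learned_from` be exported to `neighbor`?"""
    if neighbor is Rel.SELF:
        raise ValueError("neighbor must be a customer, provider, or peer")
    if neighbor is Rel.CUSTOMER:
        return True  # customers receive all routes
    # Providers and peers only receive own routes and customer routes.
    return learned_from in (Rel.SELF, Rel.CUSTOMER)
```

A consequence of these rules is that an AS never carries traffic between two of its providers or peers, which is what lets AS paths be categorized by relationship.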

  44. Methodology • AS paths • AS categories • ISP • ISP-CUST • ISP-PEER • ISP-CUST-CUST* • ISP-PEER-CUST* • ISP-MH-CUST* • UNKNOWN • [Figure: example AS paths annotated with provider, customer, and peer relationships]

  45. Traffic classification • [Figure: classification pipeline. Labels: P2P traces (Netflow); AS relationship (BGP configuration); P2P traffic flows by AS category (ISP, ISP-PEER, ISP-CUST, ISP-CUST-CUST*, ISP-PEER-CUST*, ISP-MH-CUST*); graph representation; connected components; threshold; signaling traffic; data traffic]

  46. Traffic data • P2P applications: DirectConnect, FastTrack, Gnutella • Flow records from routers across a large IP network • Control traffic: flow size <= threshold • Data traffic: flow size > threshold • Threshold = 4KB • Experiments: 3 weeks, each 5-7 days, 800 million flows in total from AT&T IP backbone
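
The size-based split between control and data traffic reduces to a single comparison. In this sketch the threshold is taken to be 4096 bytes, which is an assumption about what "4KB" means on the slide; the function names are illustrative.

```python
THRESHOLD_BYTES = 4 * 1024  # assumption: "4KB" interpreted as 4096 bytes


def classify_by_size(flow_bytes):
    """Label a flow as control (signaling) or data by its size."""
    return "control" if flow_bytes <= THRESHOLD_BYTES else "data"


def byte_share(flow_sizes):
    """Fraction of total bytes carried by each class of flow."""
    totals = {"control": 0, "data": 0}
    for size in flow_sizes:
        totals[classify_by_size(size)] += size
    grand = sum(totals.values())
    return {k: (v / grand if grand else 0.0) for k, v in totals.items()}
```

Because signaling flows are numerous but tiny, a split like this typically shows most flows in the control class and almost all bytes in the data class.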

  47. Classification results • Gnutella heaviest connected component • 99% IPs, 99% bytes • Signaling traffic: 90% IPs, 95% flows, 0.4% bytes • Data traffic: 50% IPs, 5% flows, 99.6% bytes • Signaling traffic is much less skewed than data traffic • Signaling: top 1% IPs, 25% bytes • Data: top 0.1% IPs, 30% bytes
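
Finding the heaviest connected component can be sketched with a union-find structure over the host graph, where each flow adds an edge between its two endpoints. Ranking components by total bytes is an assumption, as is the `(src, dst, nbytes)` tuple format; the slide does not say how "heaviest" was measured.

```python
from collections import defaultdict


def find(parent, x):
    """Find the component root of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def heaviest_component(flows):
    """flows: list of (src_ip, dst_ip, nbytes). Returns (hosts, total_bytes)."""
    parent = {}
    for src, dst, _ in flows:
        parent.setdefault(src, src)
        parent.setdefault(dst, dst)
        ra, rb = find(parent, src), find(parent, dst)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    hosts = defaultdict(set)
    volume = defaultdict(int)
    for host in parent:
        hosts[find(parent, host)].add(host)
    for src, _, nbytes in flows:
        volume[find(parent, src)] += nbytes
    if not volume:
        return set(), 0
    root = max(volume, key=volume.get)
    return hosts[root], volume[root]
```

Dividing the returned host count and byte total by the overall totals gives percentages comparable to the "99% IPs, 99% bytes" figure.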

  48. Traffic direction • 5% of the traffic is intra-AT&T (ATT and ATT-CUST).

  49. Why topology-aware? • Better traffic engineering • Load balancing • Fault tolerance • Enhance service • Workload • Flash crowds • Denial of Service (DoS) attack • Proximity routing • Lower latency • More bandwidth • Ad tagging • Challenge: An IP address does not inherently contain an indication of location! • Solution: network-aware clustering

  50. Network-aware clustering • Goal: group IP addresses based on their location on the Internet • Network-aware cluster [SIGCOMM2000] • Non-overlapping • Topologically close • Under common administrative control • Clustering requires knowledge that is not available to anyone outside the administrative entities.
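
One way to approximate such clusters from outside the administrative entities is to group addresses by their longest matching prefix in a BGP routing-table snapshot. The sketch below assumes that approach, which this slide does not spell out; the function name, the `"unclustered"` label, and the sample prefixes are illustrative only.

```python
import ipaddress
from collections import defaultdict


def cluster_by_prefix(ips, prefixes):
    """Group IP addresses by their longest matching routing prefix."""
    # Sort most-specific first so the first hit is the longest match.
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: n.prefixlen, reverse=True)
    clusters = defaultdict(list)
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        match = next((n for n in nets if addr in n), None)
        clusters[str(match) if match else "unclustered"].append(ip)
    return dict(clusters)
```

The linear scan keeps the example short; a real routing table would call for a trie or radix tree to make each lookup fast.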
