Deconstructing the KaZaA Network

Deconstructing the KaZaA Network. Matei Ripeanu, joint work with Nathaniel Leibowitz and Adam Wierzbicki. P2P impact: widespread adoption. KaZaA: 200 million downloads (3.5M/week), one of the most popular applications ever.


Presentation Transcript


  1. Deconstructing the KaZaA Network. Matei Ripeanu, joint work with Nathaniel Leibowitz and Adam Wierzbicki

  2. P2P Impact: Widespread adoption • KaZaA: 200 million downloads (3.5M/week), one of the most popular applications ever! • Number of users for file-sharing applications (www.slyck.com, March ’03) • Surveys: 25-30% of all customers at large ISPs use P2P file-sharing systems

  3. P2P Impact (2): Huge traffic • P2P-generated traffic now dominates the Internet load • Internet2 traffic statistics • UChicago estimate (March ’01): Gnutella control traffic is about 1% of all Internet traffic • Cornell.edu (March ’02): 60% P2P

  4. Recent studies. Three recent measurement studies on Kazaa traffic: • Are File Swapping Networks Cacheable? Characterizing P2P Traffic, N. Leibowitz et al. (WCW7, Aug. 2002) • Analyzing Peer-to-Peer Traffic Across Large Networks, S. Sen, J. Wang (IMW, Nov. 2002) • An Analysis of Internet Content Delivery Systems, S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, H. Levy (OSDI, Dec. 2002)

  5. Data collection • Collect traces at border routers • UWashington, Tier 1 ISP (AT&T?), large Israeli ISP • Identify (and log) Kazaa traffic based on: • port number (1214) • content of HTTP request
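The two identification criteria on this slide can be sketched as a small classifier. This is a minimal illustration, not the tooling used in the studies: the function name and the `marker` string are placeholder assumptions, since the slides do not say what the HTTP request check looks for.

```python
KAZAA_PORT = 1214  # port number given on the slide


def is_kazaa_flow(src_port, dst_port, http_payload=b"", marker=b"X-Kazaa-"):
    """Return True if a flow looks like Kazaa traffic.

    `marker` is a placeholder assumption for whatever the HTTP request
    content check looks for; the slides do not specify it.
    """
    # Criterion 1: either endpoint uses the Kazaa port.
    if KAZAA_PORT in (src_port, dst_port):
        return True
    # Criterion 2: inspect the HTTP request content (case-insensitive).
    return marker.lower() in http_payload.lower()
```

Checking the payload as well as the port matters because clients can be configured to use other ports, in which case a port-only filter would miss them.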

  6. Question 1: • What is the overall bandwidth impact?

  7. Bandwidth distribution (UWashington measurements, June 2002; source: Saroiu et al.) • Web = 14% of TCP; P2P = 43% of TCP • P2P now dominates Web in bandwidth consumed

  8. Inbound vs. Outbound traffic • UWashington acts like a huge content server: outbound (served) traffic 7.6 times larger than inbound traffic • Residential ISP: the situation is reversed as inbound traffic is more than 5 times larger than outbound

  9. Question 2: • What do the shared objects look like?

  10. File size characteristics • Possible file size ranges: • 10KB-100KB: pictures • 1MB-5MB: songs • 10-200MB: applications, video clips • > 500MB: movies

  11. Question 3 • What is the file popularity distribution? Terminology: • Download session: downloading one chunk of the file in a single HTTP session • Download cycle: a complete download of a file
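The distinction between sessions and cycles can be made concrete with a short sketch that folds per-session log records into completed cycles. The record layout and the completion rule (a user's sessions for a file add up to at least the file size) are assumptions made for illustration; the slides do not describe how cycles were reconstructed.

```python
from collections import defaultdict


def count_download_cycles(sessions, file_sizes):
    """Count completed download cycles per file.

    sessions: iterable of (user, file_id, bytes_downloaded), one per HTTP session.
    file_sizes: dict mapping file_id -> size in bytes.
    Assumption: a cycle is complete once a user's sessions for a file
    add up to at least the file size (chunk overlap is ignored).
    """
    received = defaultdict(int)
    for user, file_id, nbytes in sessions:
        received[(user, file_id)] += nbytes

    cycles = defaultdict(int)
    for (user, file_id), total in received.items():
        if total >= file_sizes[file_id]:
            cycles[file_id] += 1
    return dict(cycles)
```

For example, two sessions of 60 and 40 bytes by one user on a 100-byte file count as one cycle, while a single 50-byte session by another user counts as none.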

  12. File popularity distribution • 10% most popular files generate 60% of the download cycles • 1% (or about 3,000) most popular files generate 25% of the download cycles
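Figures such as "the 10% most popular files generate 60% of the download cycles" are top-fraction shares of a ranked distribution. A minimal sketch of that computation follows; the function name and input layout are illustrative assumptions.

```python
import math


def top_share(counts, fraction):
    """Share of the total contributed by the top `fraction` of items.

    counts: dict mapping file_id -> download cycles (or bytes).
    fraction: e.g. 0.10 for the 10% most popular files.
    """
    values = sorted(counts.values(), reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    # Always include at least one file, and round partial files up.
    k = max(1, math.ceil(fraction * len(values)))
    return sum(values[:k]) / total
```

Passing bytes transferred instead of download cycles gives the corresponding traffic share with the same function.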

  13. Question 4: • How is the consumed bandwidth distributed among objects?

  14. Traffic distribution - files • The 1% most popular files generate 80% of the traffic • The 0.1% most popular files (about 300) generate 50% of the traffic • Compare to the UWashington traces, where the 1% most popular objects are responsible for ‘only’ 50% of bytes transferred

  15. Costs … Cost to provide access to the most popular object for a month. Assumptions: • OC3 line at $40K/month • 5-day logs extrapolated to one month
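The cost estimate can be sketched from the two stated assumptions. The slide does not give the formula, so the proportional-share model below is an assumption, as are the 30-day month and the fully utilized line; only the $40K/month price and the 5-day log length come from the slide. The line rate is the nominal OC-3 rate of 155.52 Mbit/s.

```python
OC3_BITS_PER_SECOND = 155.52e6   # nominal OC-3 line rate
OC3_COST_PER_MONTH = 40_000.0    # dollars, from the slide
LOG_DAYS = 5                     # log length, from the slide
MONTH_DAYS = 30                  # assumption: 30-day month


def monthly_cost(bytes_in_log):
    """Estimated monthly dollar cost of serving one object.

    Assumption: cost is shared in proportion to bytes carried on a
    fully utilized line, so this is a lower bound on the real cost.
    """
    # Extrapolate the 5-day byte count to a full month.
    monthly_bytes = bytes_in_log * MONTH_DAYS / LOG_DAYS
    # Bytes the line can carry in a month at its nominal rate.
    capacity_bytes = OC3_BITS_PER_SECOND / 8 * MONTH_DAYS * 86_400
    return OC3_COST_PER_MONTH * monthly_bytes / capacity_bytes
```

Under this model an object whose 5-day traffic equals one sixth of the line's monthly capacity would cost the full $40K per month.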

  16. Traffic distribution vs. file size • Large (movie) files account for 60% of the bytes downloaded but only 5% of download cycles

  17. Question 6: • Content dynamics and caching performance

  18. Content dynamics. How many new files does the system see (per day, per hour)?

  19. Content dynamics (2). How stable is the set of most popular files? About 30% of files remain popular over a long period of time

  20. Ideal caching performance

  21. Achieved caching performance. Significant savings: • File hit rates of 30-35% • Byte hit rates of 50-60% • P2P traffic is more cacheable than Web traffic • But it takes a long time (weeks) to warm up caches
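The two metrics quoted here differ only in what is counted: requests or bytes. The sketch below computes both for a request trace, assuming an unbounded cache that never evicts; that is a simplification chosen for illustration, not the cache configuration used in the study.

```python
def hit_rates(requests):
    """File and byte hit rates for an unbounded cache.

    requests: iterable of (file_id, size_bytes) in arrival order.
    Assumption: the cache never evicts, so this is an upper bound.
    Returns (file_hit_rate, byte_hit_rate).
    """
    cached = set()
    hits = hit_bytes = total = total_bytes = 0
    for file_id, size in requests:
        total += 1
        total_bytes += size
        if file_id in cached:
            # Served from the cache: counts toward both metrics.
            hits += 1
            hit_bytes += size
        else:
            # First request is a miss; the file is cached afterwards.
            cached.add(file_id)
    if total == 0 or total_bytes == 0:
        return 0.0, 0.0
    return hits / total, hit_bytes / total_bytes
```

When the repeated requests are for large files, the byte hit rate exceeds the file hit rate, which matches the direction of the numbers on the slide.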

  22. Question 7: • Virtual relationships between users (outliers filtered out)

  23. Small-world data-sharing graph [Figure: the data-sharing graph compared with other networks: food web, LANL coauthors, film actors, power grid, Web, Internet, word co-occurrences] Data-sharing graph: • Nodes == Kazaa users • Link two users that have similar activities (download the same files)
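The graph definition on this slide can be sketched directly from a list of (user, file) download records. The `min_shared` threshold is a hypothetical parameter added for illustration; the slide only says that users who download the same files are linked.

```python
from collections import defaultdict
from itertools import combinations


def data_sharing_graph(downloads, min_shared=1):
    """Build the data-sharing graph as a set of undirected edges.

    downloads: iterable of (user, file_id) pairs.
    min_shared: hypothetical threshold on the number of common files.
    """
    users_by_file = defaultdict(set)
    for user, file_id in downloads:
        users_by_file[file_id].add(user)

    # Count, for each pair of users, how many files both downloaded.
    shared = defaultdict(int)
    for users in users_by_file.values():
        for u, v in combinations(sorted(users), 2):
            shared[(u, v)] += 1

    return {pair for pair, n in shared.items() if n >= min_shared}
```

Each edge is stored once as a sorted pair, so the result can be fed to any graph library to measure clustering and path lengths.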

  24. Future questions • What savings can be realized without caching data, only by redirecting requests to local users? • What can one say about the overall characteristics of the network (number of users, number of files, distributions) knowing only the data logged by one ISP? Constraint: • Lawmakers may cause P2P traffic to vanish • However, this will lead to a new research question: How will the sudden disappearance of 60% of Internet traffic affect the Internet?

  25. Your questions • Thank you

  26. Goals High-level questions: • What is the impact of these new content delivery systems on the Internet and on ISPs? • What are the characteristics of the Kazaa traffic?
