YouTube Traffic Characterization: A View From the Edge

Presentation Transcript

  1. YouTube Traffic Characterization: A View From the Edge • Phillipa Gill¹, Martin Arlitt²,¹, Zongpeng Li¹, Anirban Mahanti³ • ¹Dept. of Computer Science, University of Calgary, Canada • ²Enterprise Systems & Software Lab, HP Labs, USA • ³Dept. of Computer Science and Engineering, IIT Delhi, India

  2. Introduction • The way people use the Web is changing. • Creation and sharing of media: • Fast, easy, cheap! • Volume of data associated with extremely popular online media.

  3. What is Web 2.0? • User generated content • Text: Wordpress, Blogspot • Photos: Flickr, Facebook • Video: YouTube, MySpace • Social Networking • Facebook, MySpace • Tagging • Flickr, YouTube

  4. YouTube: Facts and Figures • Founded in February 2005 • Enabled users to easily share movies by converting them to Flash • Largest video sharing Website on the Internet [Alexa2007] • Sold to Google for $1.65 billion in November 2006

  5. How YouTube Works (1/2) • GET /watch?v=wQVEPFzkhaM → OK (text/html) • GET /vi/fNaYQ4kM4FE/2.jpg → OK (img/jpeg)

  6. How YouTube Works (2/2) • GET swfobject.js → OK (application/x-javascript) • GET /p.swf → OK (application/shockwave-flash) • GET /get_video?video_id=wQVEPFzkhaM → OK (video/flv)
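
The requests on slides 5 and 6 can be grouped by top-level content type, which is how the later results are organized (video, image, text, application). A minimal sketch, assuming the content-type labels exactly as they appear on the slides; the function name `content_category` is hypothetical and not from the talk:

```python
def content_category(mime_type: str) -> str:
    """Return the top-level type of a MIME string, e.g. 'video' for 'video/flv'."""
    top_level = mime_type.split("/", 1)[0].strip().lower()
    # The slides abbreviate the image type as "img"; treat it as "image".
    if top_level == "img":
        return "image"
    return top_level


# Requests for a single page view, in the order listed on the slides.
page_view = [
    ("/watch?v=wQVEPFzkhaM", "text/html"),
    ("/vi/fNaYQ4kM4FE/2.jpg", "img/jpeg"),
    ("swfobject.js", "application/x-javascript"),
    ("/p.swf", "application/shockwave-flash"),
    ("/get_video?video_id=wQVEPFzkhaM", "video/flv"),
]

categories = [content_category(mime) for _, mime in page_view]
```

Only the final request carries the video itself; the other four fetch the page, a thumbnail, and the player.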

  7. Our Contributions • Efficient measurement framework • One of the first extensive characterizations of Web 2.0 traffic • File properties • File access patterns • Transfer properties • Implications for network and content providers

  8. Outline • Introduction & Background • Contributions • Methodology • Results • Implications • Conclusions

  9. Our View Points • Edge (University Campus) • 28,000 students • 5,300 faculty & staff • /16 address space • 300 Mb/s full-duplex network link • Global • Most popular videos

  10. Campus Data Collection • Goals: • Collect data on all campus YouTube usage • Gather data for an extended period of time • Protect user privacy • Challenges: • YouTube’s popularity • Monitor limitations • Volume of campus Internet usage

  11. Our Methodology • Identify servers providing YouTube content • Use bro to summarize each HTTP transaction in real time • Restart bro daily and compress the daily log • Map visitor identifier to a unique ID
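
The last step, mapping each visitor identifier to a unique ID, could be sketched as follows. This assumes a keyed hash (HMAC-SHA256) with a secret that never appears in the logs; the talk does not say which mapping was actually used, and `anonymize_visitor` and `SECRET_KEY` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be random and stored apart from the logs.
SECRET_KEY = b"replace-with-a-random-secret"


def anonymize_visitor(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map a visitor identifier to a stable pseudonym that does not reveal it."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # A 16-hex-character prefix keeps log lines short; the length is an arbitrary choice.
    return digest[:16]
```

The same visitor always maps to the same ID, so per-user statistics remain possible without storing the original identifier.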

  12. Categories of Transactions • Complete – the entire transaction was parsed successfully • Interrupted – TCP connection was reset • Gap – monitor missed a packet • Failure – transaction could not be parsed
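
The four categories can be expressed as a small classifier. This is a sketch only: the three boolean inputs and the order in which they are checked are assumptions, since the slide does not say how a transaction that matches several conditions is labeled.

```python
from enum import Enum


class TransactionCategory(Enum):
    COMPLETE = "complete"        # entire transaction parsed successfully
    INTERRUPTED = "interrupted"  # TCP connection was reset
    GAP = "gap"                  # monitor missed a packet
    FAILURE = "failure"          # transaction could not be parsed


def categorize(parsed_ok: bool, connection_reset: bool,
               packet_missed: bool) -> TransactionCategory:
    """Assign one category per transaction (precedence order is assumed)."""
    if packet_missed:
        return TransactionCategory.GAP
    if connection_reset:
        return TransactionCategory.INTERRUPTED
    if parsed_ok:
        return TransactionCategory.COMPLETE
    return TransactionCategory.FAILURE
```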

  13. Categories of Transactions (2)

  14. Our Traces

  15. HTTP Response Codes

  16. Global Data Collection • Crawling all videos is infeasible • Focus on the top 100 most popular videos • Four time frames: daily, weekly, monthly, and all time • Two-step data collection: • Retrieve pages of most popular videos • Use YouTube API to get details on these videos

  17. Outline • Introduction & Background • Contributions • Methodology • Results • Implications • Conclusions

  18. Results • Campus Usage Patterns • File Properties • File Access Patterns • Transfer Properties

  19. Campus Usage Patterns • Figure annotation: Reading Break

  20. Results • Campus Usage Patterns • File Properties • File Access Patterns • Transfer Properties

  21. Unique File Sizes • Video data is significantly larger than the other content types

  22. Time Since Modification • Videos and images rarely modified • Text and application data modified more frequently

  23. Video Durations • Spike around 3 minutes, likely music videos • Campus videos are relatively short: mean μ = 3.3 min

  24. Summary of File Properties • Video content is much larger than other content types • Image and video content is more static than application and text content • Video durations are relatively short • Videos viewed on campus tend to be more than 1 month old

  25. Results • Campus Usage Patterns • File Properties • File Access Patterns • Transfer Properties

  26. Relative Popularity of Videos • Video popularity follows a weak Zipf distribution (β = 0.56) • Possibly due to the edge network point of view
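
A Zipf exponent such as β can be estimated from per-video request counts by fitting a straight line to log(count) against log(rank). The sketch below uses ordinary least squares; the talk does not state how β was fitted, so this method and the name `zipf_exponent` are assumptions.

```python
import math


def zipf_exponent(request_counts):
    """Estimate beta in count ~ C / rank**beta via a log-log least-squares fit."""
    counts = sorted((c for c in request_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two videos with non-zero counts")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a Zipf-like curve; beta is its magnitude.
    return -sxy / sxx
```

A value well below 1, as on this slide, means requests are spread more evenly across videos than a classic Zipf curve would predict.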

  27. Commonality of Videos • ~10% commonality between consecutive days during the week • ~5% commonality between consecutive days on the weekend
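
One way to compute day-to-day commonality is the fraction of a day's distinct videos that were also requested the day before. The slide does not give the exact definition, so this formula is an assumption, and the video IDs in the usage line are made up.

```python
def commonality(previous_day, current_day) -> float:
    """Fraction of the current day's distinct videos also seen on the previous day."""
    previous, current = set(previous_day), set(current_day)
    if not current:
        return 0.0
    return len(previous & current) / len(current)


# Hypothetical video IDs: one of the two videos seen today was also seen yesterday.
overlap = commonality(["a", "b", "c"], ["c", "d"])
```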

  28. Summary of File Referencing • Zipf distribution is weak when observed from the edge of the network • There is some overlap between videos viewed on consecutive days • Significant amount of content viewed on campus is non-unique

  29. Results • Campus Usage Patterns • File Properties • File Access Patterns • Transfer Properties

  30. Transfer Sizes • Figure annotations: Flash player (p.swf, player2.swf); JavaScript files

  31. Transfer Durations • Video transfers have significantly longer durations than other content types

  32. Summary of Transfer Properties • JavaScript and Flash objects have an impact on the size of files transferred • Video transfers have significantly larger sizes and durations

  33. Outline • Introduction & Background • Contributions • Methodology • Results • Implications • Conclusions

  34. Implications for Network Providers • Web 2.0 poses challenges to caching • Larger multimedia files • More diversity in content • Meta data may be used to improve caching efficiency

  35. Implications for Content Providers • Multimedia content is large! • 65,000 videos/day × 10 MB/video = 650 GB/day ≈ 19.5 TB/month (30-day month) • Long tail effect → much of the content will be unpopular • Cheap storage solutions • Longer transfer durations for video files • More CPU cycles required for transfers
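
The storage estimate on this slide can be reproduced with a few lines of arithmetic. The sketch assumes a 30-day month and decimal units (1 TB = 1,000,000 MB), which is what makes the slide's figure come out to 19.5; the function name is hypothetical.

```python
VIDEOS_PER_DAY = 65_000   # from the slide
MB_PER_VIDEO = 10         # from the slide
DAYS_PER_MONTH = 30       # assumption
MB_PER_TB = 1_000_000     # assumption: decimal units


def monthly_upload_tb(videos_per_day: int, mb_per_video: float,
                      days: int = DAYS_PER_MONTH) -> float:
    """New video data per month, in terabytes."""
    return videos_per_day * mb_per_video * days / MB_PER_TB


monthly_tb = monthly_upload_tb(VIDEOS_PER_DAY, MB_PER_VIDEO)
```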

  36. Conclusions • Multimedia content has much larger transfer sizes and durations than other content types • From the edge of the network, video popularity follows a weak Zipf distribution • Web 2.0 facilitates diversity in content which poses challenges to caching • New approaches are needed to efficiently handle the resource demands of Web 2.0 sites

  37. Questions? Contact psessini@ucalgary.ca