
I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System

ACM IMC 2007-10-24. I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System. Meeyoung Cha (Intern at Telefonica Research / KAIST). Why the study of “bite-size bits for high-speed munching” [Wired mag. Mar 2007]


Presentation Transcript


  1. ACM IMC 2007-10-24 • I Tube, You Tube, Everybody Tubes… Analyzing the World’s Largest User Generated Content Video System • Meeyoung Cha (Intern at Telefonica Research / KAIST)

  2. Why the study of “bite-size bits for high-speed munching” [Wired mag. Mar 2007] • Plethora of YouTube clones • UGC is very different. How different?

  3. UGC vs. Non-UGC • Massive production scale: YouTube needs 15 days to produce 120 years' worth of movies in IMDb! • Extreme publishers: 1000 uploads over a few years vs. 100 movies over 50 years • Short video length: 30 sec–5 min vs. 100-min movies in LoveFilm • The rest: consumption patterns

  4. Goals and Data • Goals: popularity distribution, popularity evolution, P2P scalable distribution, content duplication • Data: crawled YouTube and other UGC systems; metadata: video ID, length, views; 1.6M Entertainment and 250K Science videos

  5. Part 1: Popularity Distribution • Static popularity characteristics • Underlying mechanism

  6. Pareto Principle • The 10% most popular videos account for 80% of total views • Other online VoD systems show a smaller skew! [Figure: fraction of aggregate views vs. normalized video ranking]
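
The skew measure on this slide can be checked on any list of per-video view counts. The following is a minimal sketch, not the authors' analysis code: the function name `top_share` and the sample view counts are hypothetical, chosen only for illustration.

```python
def top_share(views, top_fraction=0.10):
    """Fraction of all views captured by the most popular `top_fraction` of videos."""
    ranked = sorted(views, reverse=True)
    # Always count at least one video, even for very short lists.
    k = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical view counts with a rough 1/rank decay (not data from the study).
sample_views = [1000 // rank for rank in range(1, 101)]
print(f"top 10% of videos hold {top_share(sample_views):.0%} of views")
```

Sweeping `top_fraction` from 0 to 1 yields the points of a curve like the one on the slide (fraction of aggregate views against normalized video rank).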

  7. Dominant Power-Law Behavior • Richer-get-richer principle: if a video has K views, then users will watch the video with rate K • Also seen in: word frequency, citations of papers, scale of earthquakes, web hits [Figure: frequency (log) vs. city population (log)]
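
The richer-get-richer rule can be illustrated with a small simulation. This sketch assumes a Simon-style process in which new videos keep arriving with probability `p_new`; that arrival rule, the parameter values, and the function name are assumptions added here for illustration, not something the slides specify.

```python
import random


def rich_get_richer(steps, p_new=0.1, seed=0):
    """Simulate view counts where an existing video is watched with rate
    proportional to its current views (assumed Simon-style process)."""
    rng = random.Random(seed)
    views = [1]    # views[i] is the view count of video i
    tickets = [0]  # one entry per view, so a uniform pick is proportional to views
    for _ in range(steps):
        if rng.random() < p_new:
            # A new video is uploaded and enters with a single view.
            views.append(1)
            tickets.append(len(views) - 1)
        else:
            # An existing video is watched; popular videos hold more tickets.
            vid = rng.choice(tickets)
            views[vid] += 1
            tickets.append(vid)
    return views


counts = rich_get_richer(20000)
median = sorted(counts)[len(counts) // 2]
print(f"videos: {len(counts)}, max views: {max(counts)}, median views: {median}")
```

A few early videos end up with view counts far above the median, which is the heavy-tailed behavior the slide refers to.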

  8. UGC Video Distribution • Straight-line waist, truncated at both ends

  9. Focusing on Popular Videos • Why do popular videos deviate from the power law? • Fetch-at-most-once [SOSP2003] • Behavior of fetching immutable objects once, cf. visiting popular web sites many times

  10. Simulation on Various Parameters • Number of videos (V), users (U), avg. requests per user (R) • Fetch-at-most-once: the tail is more truncated for larger R and smaller V [Figure: complementary cumulative videos (log) vs. views (log); labels: power-law behavior, U=1000, R=10, R=20, R=50, V=100]
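
A simulation of this kind can be sketched as follows. The Zipf popularity with exponent `alpha`, the rejection-sampling approach, and the function name `simulate_views` are assumptions made for this illustration; the slides give only the parameters V, U, and R.

```python
import random
from itertools import accumulate


def simulate_views(V, U, R, fetch_at_most_once, alpha=1.0, seed=0):
    """Per-video view counts for U users issuing R requests each over V videos.

    Video popularity is assumed to follow Zipf with exponent `alpha`.
    With fetch_at_most_once=True a user never watches the same video twice.
    """
    rng = random.Random(seed)
    cum_weights = list(accumulate(1.0 / rank ** alpha for rank in range(1, V + 1)))
    video_ids = range(V)
    views = [0] * V
    per_user = min(R, V)  # a user cannot fetch more distinct videos than exist
    for _ in range(U):
        if fetch_at_most_once:
            seen = set()
            # Redraw until the user has per_user distinct videos
            # (simple, but slow when R approaches V).
            while len(seen) < per_user:
                vid = rng.choices(video_ids, cum_weights=cum_weights)[0]
                if vid not in seen:
                    seen.add(vid)
                    views[vid] += 1
        else:
            for vid in rng.choices(video_ids, cum_weights=cum_weights, k=R):
                views[vid] += 1
    return views


once = simulate_views(V=100, U=1000, R=10, fetch_at_most_once=True)
repeat = simulate_views(V=100, U=1000, R=10, fetch_at_most_once=False)
print("top video views, fetch-at-most-once:", max(once))
print("top video views, repeated fetches:  ", max(repeat))
```

Under fetch-at-most-once no video can exceed U views, so the most popular videos are capped while the same requests spill over to less popular ones.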

  11. Why the Unpopular Tail Falls Off • Natural shape is curved • Sampling bias or pre-filters: publishers tend to upload interesting videos • Information filtering or post-filters: search results or suggestions favor popular items

  12. Impact of Post-Filters • Videos exposed longer to the filtering effect appear more truncated [Figure axis: video rank]

  13. Is it Naturally Curved? • Matlab curve fitting for Science videos [Figure legend: Zipf, Zipf + exp cutoff, exponential, log-normal]

  14. Is it Naturally Curved? • Matlab curve fitting for Science videos • Zipf is scale-free, while exponential is scaled: the underlying mechanism is Zipf and the truncation is due to bottlenecks [Figure legend: Zipf, Zipf + exp cutoff, exponential, log-normal]
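
The slides report a Matlab fit against four candidate shapes. As a much simpler stand-in, the sketch below compares only two of them, Zipf and exponential, by ordinary least squares in log space. The helper names and the synthetic data are assumptions for illustration and do not reproduce the authors' fitting procedure.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def compare_zipf_vs_exponential(views_by_rank):
    """Fit log(views) against log(rank) (Zipf) and against rank (exponential).

    `views_by_rank` must be positive view counts ordered from rank 1 downward.
    """
    ranks = range(1, len(views_by_rank) + 1)
    log_views = [math.log(v) for v in views_by_rank]
    zipf = linear_fit([math.log(r) for r in ranks], log_views)
    exponential = linear_fit([float(r) for r in ranks], log_views)
    return zipf, exponential


# Synthetic, exactly Zipf-shaped data (hypothetical, not from the study).
synthetic = [1000.0 * rank ** -0.8 for rank in range(1, 201)]
zipf_fit, exp_fit = compare_zipf_vs_exponential(synthetic)
print("Zipf slope:", round(zipf_fit[0], 3), "SSE:", zipf_fit[2])
print("Exponential slope:", round(exp_fit[0], 5), "SSE:", exp_fit[2])
```

The model with the smaller squared error in log space describes the data better. Fitting Zipf with an exponential cutoff or a log-normal would need a multi-parameter fit, which this sketch leaves out.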

  15. Implication of Our Findings • “Latent demand for products that is suppressed by bottlenecks in the system” [Chris Anderson, The Long Tail] • Entertainment: 40% additional views! • How? Personalized recommendation, enriched metadata, abundant videos [Figure axes: views, rankings]

  16. Part 2: Popularity Evolution • Relationship between popularity and age

  17. Popularity Evolution • So far, we focused on static popularity • Now focus on popularity dynamics • How are requests on any given day distributed across video ages? • 6-day daily trace of Science videos • Step 1: group videos requested at least once by age • Step 2: count request volume per age group
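
The two steps on this slide can be sketched as follows, assuming each trace record is a pair of video age in days and requests received on the day in question. The record layout and the function name are hypothetical; the slides do not describe the trace format.

```python
from collections import defaultdict


def request_volume_by_age(videos):
    """Sum one day's requests per video age.

    `videos` is an iterable of (age_in_days, requests_today) pairs (assumed layout).
    """
    volume = defaultdict(int)
    for age_days, requests in videos:
        if requests >= 1:                  # Step 1: keep videos requested at least once
            volume[age_days] += requests   # Step 2: add to the volume of that age group
    return dict(sorted(volume.items()))


# Hypothetical one-day trace: (age in days, requests that day).
trace = [(1, 5), (1, 0), (30, 2), (30, 3), (400, 1)]
print(request_volume_by_age(trace))
```

Grouping by exact age in days is the simplest choice; coarser age groups (weeks or months) would only require mapping `age_days` to a bucket before summing.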

  18. Request Volume Across Age 1. Viewers mildly more interested in new videos

  19. Request Volume Across Age 2. User preference relatively insensitive to age (80% of requests are for old videos)

  20. Request Volume Across Age 3. Daily top hits mostly come from new videos

  21. Request Volume Across Age 4. Some old videos get significant requests

  22. Part 3: P2P Scalable Distribution • Potential savings from P2P (against client-server model) • Optimistic upper bound

  23. Peer-assisted VoD • 50-200 Gb/s estimated serving capacity • Bandwidth, hardware, power consumption • Stream from VoD servers or from peers • Varying user lifetime • P2P when possible [Figure: video server, users A, B, and C, movie1 and movie2]

  24. Number of Beneficiary Videos • P2P viable when at least 2 online users share a video • Very few videos benefit, but they benefit a lot [Figure: estimated number of online users per video at any moment]
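
One rough way to estimate the number of concurrent viewers per video is Little's law: average concurrency equals arrival rate times time spent in the system. The slides do not say how the estimate was derived, so the estimator below, the function names, and the assumption that a user stays online exactly for the video's length are all illustrative assumptions.

```python
SECONDS_PER_DAY = 86400


def expected_online_users(daily_requests, video_length_sec):
    """Average number of users watching a video at any moment (Little's law),
    assuming requests arrive evenly over the day and each user stays online
    for the full video length."""
    arrival_rate_per_sec = daily_requests / SECONDS_PER_DAY
    return arrival_rate_per_sec * video_length_sec


def p2p_viable(daily_requests, video_length_sec, min_users=2):
    """Viability rule from the slide: at least 2 online users share the video."""
    return expected_online_users(daily_requests, video_length_sec) >= min_users


# Hypothetical videos: (daily requests, length in seconds).
for daily_requests, length_sec in [(500, 180), (5000, 180), (100000, 60)]:
    users = expected_online_users(daily_requests, length_sec)
    print(daily_requests, length_sec, round(users, 2), p2p_viable(daily_requests, length_sec))
```

Because UGC videos are short, a video needs a high request rate before two viewers overlap, which is consistent with the slide's point that very few videos benefit.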

  25. Server Workload Savings in P2P • Potential for significant savings due to skewed and temporal request patterns [Figure label: P2P-assisted]
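
An optimistic upper bound on the savings can be sketched by replaying a request trace and sending a request to the server only when no other user is online with the same video. The trace layout, the fixed user lifetime, and the assumption that a single online peer can always serve the whole video are simplifications made for this illustration, not the authors' model.

```python
def server_load_fraction(requests, lifetime_sec):
    """Fraction of requests the server must still serve in a peer-assisted system.

    `requests` is a list of (start_time_sec, video_id) pairs sorted by start time.
    A user is assumed to stay online, able to serve the video, for `lifetime_sec`
    seconds after starting it, with unlimited upload capacity.
    """
    online_until = {}  # video_id -> latest time at which some peer is still online with it
    from_server = 0
    for start, video_id in requests:
        if online_until.get(video_id, float("-inf")) < start:
            from_server += 1  # no peer currently online with this video
        online_until[video_id] = max(
            online_until.get(video_id, float("-inf")), start + lifetime_sec
        )
    return from_server / len(requests) if requests else 0.0


# Hypothetical trace: three closely spaced requests for "a", one late request, one for "b".
sample_requests = [(0, "a"), (0, "b"), (10, "a"), (20, "a"), (500, "a")]
print(server_load_fraction(sample_requests, lifetime_sec=60))
```

Longer user lifetimes or burstier request patterns lower the server's share, which is why skewed and temporally clustered requests favor peer assistance.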

  26. Part 4: Content Duplication • Level of duplication • Birth of duplicates

  27. Content Duplication • Alias: identical or similar copies of the same content • Aliases dilute the popularity of a single event: views are distributed across multiple copies • Difficulty in recommendation & ranking systems • Test with 51 volunteers: find aliases using keyword search • Identified 1,224 aliases for 184 original videos

  28. The Level of Popularity Dilution • Popularity diluted by up to 2 orders of magnitude
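
One way to quantify dilution is the ratio of views collected by all aliases to the views of the original video. The slide does not define its metric, so this ratio, the function name, and the numbers below are assumptions for illustration only.

```python
import math


def dilution_ratio(original_views, alias_views):
    """Total alias views divided by the original's views (assumed metric)."""
    if original_views <= 0:
        raise ValueError("original_views must be positive")
    return sum(alias_views) / original_views


# Hypothetical example: an original with 1,000 views and two aliases with 50,000 each.
ratio = dilution_ratio(1000, [50000, 50000])
print(f"aliases hold {ratio:.0f}x the original's views "
      f"({math.log10(ratio):.0f} orders of magnitude)")
```

A ratio near 100 corresponds to the two orders of magnitude mentioned on the slide; a ratio below 1 means the original still holds most of the views.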

  29. How Late Do Aliases Appear? • Significant aliases appear within one week

  30. Contribution • The first detailed study on UGC video popularity: power-law waist, truncation at popular/non-popular videos • Analyzed popularity dynamics using daily trace: relationship between popularity and age • Explored potential for P2P distribution • Showed difficulty in video ranking due to aliases

  31. Dataset available at http://an.kaist.ac.kr/traces/IMC2007.html Meeyoung Cha meeyoung.cha@gmail.com Questions?
