
Research Trends in Multimedia Content Services

Presentation Transcript


  1. Research Trends in Multimedia Content Services. Data Mining and Web Search Group, Computer and Automation Research Institute, Hungarian Academy of Sciences. András A. Benczúr

  2. Web 2.0, 3.0 …? • Platform convergence (Web, PC, mobile, television) – information vs. recreation • Emphasis on social content (blogs, Wikipedia, photo and video sharing) • From search towards recommendation (query free, profile based, personalized) • From text towards multimedia • Glocalization (language, geography) • Spam

  3. A sample service (diagram: RSS, Web 2.0, recommender engine, client software) • Small-screen browsing • Recommendation based on the user profile (avoids query typing) • Read blogs, view media, …

  4. The user profile • History stored for each user: • Known ratings, preferences, opinions (scarce!) • Items read, weighted by time spent • Details seen, scrolling, back-button use • Top list of terms from the documents read, weighted by tf.idf • User language, region, current location and known sociodemographic data • Multimedia!
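
The profile combines two weighted signals: items read, weighted by time spent, and a tf.idf-weighted top list of terms. The Python sketch below shows one way the two could be merged into a single term list. Multiplying each document's tf.idf scores by the seconds spent reading it is our assumption, and `idf_table` / `profile_terms` are hypothetical helper names, not part of the described system.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Inverse document frequency of each term in a list of tokenized documents."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def profile_terms(read_docs, idf, top_k=10):
    """Top-k profile terms from (tokens, seconds_spent) pairs.

    Each document adds tf * idf * seconds_spent to every term it contains;
    the time weighting is an illustrative assumption.
    """
    scores = Counter()
    for tokens, seconds in read_docs:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            scores[term] += tf * idf.get(term, 0.0) * seconds
    return [term for term, _ in scores.most_common(top_k)]


corpus = [["football", "match", "goal"],
          ["election", "vote", "goal"],
          ["football", "transfer"]]
idf = idf_table(corpus)
history = [(["football", "match", "goal"], 120),  # long read
           (["election", "vote", "goal"], 10)]    # quick glance
print(profile_terms(history, idf, top_k=2))  # ['match', 'goal']
```

The long read dominates, so the profile is led by its rarest term; the short glance still nudges the shared term "goal" above "football".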

  5. Same item—multiple source

  6. Information vs recreation: Do not mix the two?

  7. Spam is increasingly annoying

  8. Distribution of categories: Reputable 70.0% • Spam 16.5% • Non-existent 7.9% • Ad 3.7% • Weborg 0.8% • Unknown 0.4% • Empty 0.4% • Alias 0.3%

  9. Effect of search result position • Time spent viewing each result position • Time taken to reach a result

  10. Multimedia Information Retrieval

  11. Segmentation • Similar objects

  12. ImageCLEF Object Retrieval Task (diagram labels: class of query image, pre-classified images, VOC2007, query images, original training set)

  13. Networked relation • spam • social network analysis • churn

  14. Social networks (diagram labels: ADSL, ADSL, home, business)

  15. Insurance fraud in networks

  16. Stacked graphical learning • Predict the churn probability p(v) of each node v • For a target node u, aggregate p(v) over its neighbors v to form a new feature f(u) • Rerun the classification with feature f(.) added • Iterate (diagram: target node u with neighbors v1, v2, v7)
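
A toy Python sketch of the stacked loop, assuming a logistic-regression base classifier and the neighbor mean as the aggregate f(u); both choices and the names `fit_logreg`, `predict`, `stacked_churn` are illustrative, not the authors' implementation. For brevity it scores the same nodes it trains on; a careful implementation would use held-out (cross-validated) predictions for p(v) to avoid overfitting.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, epochs=500, lr=0.5):
    """Batch gradient-descent logistic regression; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    # zip stops at len(x), so the bias w[-1] is added separately
    return sigmoid(sum(wj * v for wj, v in zip(w, x)) + w[-1])


def stacked_churn(features, labels, neighbors, rounds=2):
    """features: node -> list[float]; labels: node -> 1 (churned) or 0;
    neighbors: node -> list of adjacent nodes. Returns node -> p(v)."""
    nodes = list(features)
    y = [labels[v] for v in nodes]

    def fit_and_score(rows):
        w = fit_logreg([rows[v] for v in nodes], y)
        return {v: predict(w, rows[v]) for v in nodes}

    p = fit_and_score(features)  # first pass: each node's own features only
    for _ in range(rounds):
        # f(u): mean predicted churn p(v) over the neighbors v of u
        f = {u: (sum(p[v] for v in neighbors[u]) / len(neighbors[u])
                 if neighbors[u] else 0.0) for u in nodes}
        # rerun classification with f(u) appended to the original features
        p = fit_and_score({u: features[u] + [f[u]] for u in nodes})
    return p
```

In a graph of two cliques (churners and non-churners) where one node in each clique has a misleading own feature, the neighbor feature lifts the misleading churner above a non-churner with the same own feature.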

  17. Why social networks are hard to analyze • Subgraphs of social networks • Tentacles induce noise • Medium-size dense communities attract much algorithmic work

  18. Mapping into the 2D plane: spectral and semidefinite
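
Only the spectral side of the mapping is sketched here; a semidefinite embedding would need an SDP solver instead. Assuming a connected undirected graph given as neighbor lists, each node is placed at its entries in the Laplacian (L = D - A) eigenvectors with the second and third smallest eigenvalues, found by power iteration on the shifted matrix cI - L. The function name, shift and iteration count are hypothetical choices.

```python
import random


def spectral_2d(adj, iters=500, seed=0):
    """adj[i] lists the neighbors of node i in a connected undirected graph.
    Returns one (x, y) point per node: the node's entries in the Laplacian
    eigenvectors with the 2nd and 3rd smallest eigenvalues."""
    n = len(adj)
    deg = [len(nb) for nb in adj]
    shift = 2.0 * max(deg)  # >= largest Laplacian eigenvalue, so cI - L is PSD

    def apply_shifted(v):
        # (cI - L) v with L = D - A
        return [(shift - deg[i]) * v[i] + sum(v[j] for j in adj[i]) for i in range(n)]

    def orthonormalize(v, basis):
        for b in basis:  # Gram-Schmidt against the vectors already found
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v]

    rng = random.Random(seed)
    basis = [[1.0 / n ** 0.5] * n]  # constant vector: Laplacian eigenvalue 0
    for _ in range(2):
        v = orthonormalize([rng.random() - 0.5 for _ in range(n)], basis)
        for _ in range(iters):
            # power iteration: converges to the top remaining eigenvector of cI - L
            v = orthonormalize(apply_shifted(v), basis)
        basis.append(v)
    return list(zip(basis[1], basis[2]))
```

On two triangles joined by a single edge, the first coordinate (the Fiedler vector) separates the two triangles by sign.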

  19. Research highlights • Recommenders: KDD Cup 2007 Task 1, first prize: predict the probability that a user rated a movie in 2006, based on training data up to 2005 • Spam filtering: Web Spam Challenge 1, first place • Churn prediction: method presented at the KDD Cup 2009 Workshop, Task XXXX

  20. Netflix: lessons learned and differences • Netflix Prize: ratings of 1 to 5 stars • Predict an unseen rating • Evaluation: RMSE • 0.8572 wins $1,000,000 • Current leader: 0.8650 • Oct/07: 0.8712 • KDD Cup 2007: same data set • Predict the existence of a rating

  21. Results of two separate tasks • BellKor team report [Bell, Koren 2007]: low-rank approximation, Restricted Boltzmann Machine, nearest neighbor • KDD Cup 2007: predict the probability that a user rated a movie in 2006 • Given a list of 100,000 user–movie pairs • Users and movies drawn from the Netflix Prize data set • Winner report [K, B, and our colleagues 2007]

  22. Evaluation and Issue 1 • For user–movie pairs (i, j) in the evaluation set T: $\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(i,j)\in T}\left(r_{ij} - \hat{r}_{ij}\right)^2}$, where $r_{ij}$ is the true value and $\hat{r}_{ij}$ is the predicted value • KDD Cup example: • Our RMSE: 0.256 • First runner-up: 0.263 • All-zeroes prediction: 0.279 (places 10 to 13) • But why do we use RMSE and not precision/recall? • RMSE prefers correct probability guesses for the majority of infrequently visited items • The presence of the recommender changes usage
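
The RMSE definition takes a few lines of Python (`rmse` is a hypothetical helper). Assuming the KDD Cup ground truth is 0/1 (rated or not), an all-zeroes prediction scores the square root of the fraction of rated pairs, so the 0.279 on the slide corresponds to roughly 0.279² ≈ 7.8% of the pairs being rated.

```python
import math


def rmse(truth, pred):
    """Root-mean-squared error over paired lists of targets and predictions."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


# With 0/1 targets, the all-zeroes prediction scores sqrt(fraction of ones).
truth = [1, 0, 0, 0]
print(rmse(truth, [0.0] * 4))  # 0.5, i.e. sqrt(1/4)
```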

  23. Method overview • Probability by naive user–movie independence • Item frequency estimation (time series) • User frequency estimation • Reaches RMSE 0.260 by itself (still first place) • Data mining • SVD • Item–item similarities • Association rules • Combination (we used linear regression)
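
Two pieces of the overview can be sketched under stated assumptions. The independence estimate is read here as the expected cell count n_u·n_m/N of a user–movie contingency table (n_u and n_m the predicted numbers of ratings of user u and movie m, N the predicted total), capped at 1; the linear-regression combination is read as ordinary least squares with an intercept over the predictor columns. All names are hypothetical and this is not the team's published code.

```python
def independence_prob(n_user, n_item, total):
    """P(user rated item) if users and items meet independently:
    the expected cell count n_u * n_m / N, capped at 1."""
    return min(1.0, n_user * n_item / total)


def fit_blend(columns, target):
    """Least-squares weights [intercept, w_1, ..., w_m] for combining
    m predictor columns into one prediction."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(target))]
    k = len(rows[0])
    # augmented normal equations [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[piv] = a[piv], a[c]
        for row in range(k):
            if row != c:
                factor = a[row][c] / a[c][c]
                a[row] = [x - factor * y for x, y in zip(a[row], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Toy predictor columns and a target that is an exact linear mix of them
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 0.0, 1.0, 3.0]
target = [0.5 + 2 * a - b for a, b in zip(x1, x2)]
print(fit_blend([x1, x2], target))  # approx. [0.5, 2.0, -1.0]
```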

  24. Time series prediction: interest remains over a long time range (several years)
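
Because interest persists for years, an item's next-year frequency can be extrapolated from its yearly counts. The slides do not say which time series model was used; the sketch swaps in the simplest option, a least-squares linear trend clipped at zero. `extrapolate_next` is a hypothetical name.

```python
def extrapolate_next(counts):
    """Fit count = a + b * t by least squares over t = 0..n-1 and
    predict t = n, clipped at zero."""
    n = len(counts)
    if n == 1:
        return float(counts[0])
    t_mean = (n - 1) / 2
    c_mean = sum(counts) / n
    num = sum((t - t_mean) * (c - c_mean) for t, c in enumerate(counts))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return max(0.0, c_mean + slope * (n - t_mean))


print(extrapolate_next([10, 20, 30]))  # 40.0
```

A linear trend suits slowly changing movie interest; it would be a poor fit for the news items on the next slide, which peak and vanish within days.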

  25. Short lifetime of online items • Very different behavior over time: news articles (Origo) • Publication day, next-day usage peak, third day and gone … • http://www.origo.hu/filmklub/20060124kiolte.html

  26. SVD (diagram labels: user, movie, news item) • K-dim SVD: noise filtering, keeps the essence of the matrix, optimizes the ℓ2 error • SVD explains ratings as the effect of a few linear factors • RMSE (ℓ2 error) with 10–30 dimensions: 0.93 • Issue: too many news items • 18K Netflix movies vs. a potentially infinite set of items • So it may recommend a data source but not the item
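
On sparse rating data a plain SVD is undefined for the missing entries, so a common stand-in (often loosely called "SVD" in the Netflix Prize literature) is a low-rank factorization fitted by stochastic gradient descent on the observed entries only. The sketch assumes (user, item, value) triples; the rank k, learning rate and epoch count are illustrative, not the settings behind the 0.93 RMSE on the slide.

```python
import random


def factorize(ratings, n_users, n_items, k=2, epochs=2000, lr=0.01, reg=0.0, seed=0):
    """ratings: list of (user, item, value). Learns P (users x k) and
    Q (items x k) so that value ~ dot(P[u], Q[i]), by SGD on observed entries."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def train_rmse(ratings, P, Q):
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


# Fully observed rank-1 toy matrix: value = u[a] * v[b]
u = [1.0, 2.0, 3.0]
v = [1.0, 0.5, 2.0, 1.5]
ratings = [(a, b, u[a] * v[b]) for a in range(3) for b in range(4)]
P, Q = factorize(ratings, 3, 4)
print(train_rmse(ratings, P, Q))
```

Because the model scores only (user, item) pairs it has factors for, a new news item has no Q row until it collects ratings, which is the slide's point about recommending the source rather than the item.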

  27. Lessons learned • Content similarity might be the key feature • Relative success of trivial estimates on KDD Cup! • Data mining techniques overlap and apparently catch similar patterns • Precision/recall is more important than RMSE • The solution must make heavy use of time
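
A minimal top-k evaluation in Python makes the precision/recall point concrete. `precision_recall_at_k` is a hypothetical helper; choosing k as the number of items a small screen can show is one natural option, not something the slides specify.

```python
def precision_recall_at_k(ranked, relevant, k):
    """ranked: recommended items, best first; relevant: set of items the
    user actually consumed. Returns (precision@k, recall@k)."""
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, 2))  # (0.5, 0.333...)
```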

  28. Future plans and ideas • New partners and application fields: network infrastructure, new-generation services, bioinformatics, …? • Scaling our solutions to multi-core architectures • Using our search (cross-lingual, multimedia, etc.) and recommender system capabilities in major solutions: mobile, new-generation platforms, etc. • Expanding our European-level collaboration, e.g. KIC participation

  29. Questions ? benczur@sztaki.hu http://datamining.sztaki.hu Andras A. Benczur

More Related