
Multiple Domain User Personalization

Multiple Domain User Personalization. Deepak Agarwal Yahoo! Research. Yucheng Low Carnegie Mellon University. Alexander J. Smola Yahoo! Research. Information Flood. Personalization. Golf Reader. Tech. Reader. Can we provide personalization to new users?. One Domain Cold-Start.


Presentation Transcript


  1. Multiple Domain User Personalization Deepak Agarwal Yahoo! Research Yucheng Low Carnegie Mellon University Alexander J. Smola Yahoo! Research

  2. Information Flood

  3. Personalization Golf Reader Tech. Reader Can we provide personalization to new users?

  4. One Domain Cold-Start Movies User 1 User 2 Impossible when you have only one domain. Best you can do is to have a good baseline.

  5. Multiple Domains Cold Start Music Movies News Possible when you have many domains.

  6. Personalization across all domains. Option 1: combine tokens from all spaces, ignoring the source domain. Option 2: expand the token space to include the source domain. Example: a user reads golf news and watches MTV. Option 1 gives Golf, Tiger, Music, Song; option 2 gives Golf:1, Tiger:1, Music:2, Song:2. Either bag is passed to Your Favorite Personalization Algorithm.

  7. Personalization across all domains. Option 1: combine tokens from all spaces, ignoring the source domain. Option 2: expand the token space to include the source domain. Example: a user reads golf news and watches MTV, giving Golf, Tiger, Music, Song or Golf:1, Tiger:1, Music:2, Song:2 as input to Your Favorite Personalization Algorithm. • Domains with more observations will swamp out all other domains. • What is a good personalization algorithm that will work for all domains?
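
The two ways of combining tokens on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the `events` structure and the `domain_ids` mapping are assumptions made for the example, not part of the talk.

```python
from collections import Counter

def combine_ignoring_domain(events):
    """Option 1: pool tokens from every domain into one bag, dropping the domain."""
    return Counter(token for _domain, tokens in events for token in tokens)

def combine_with_domain(events, domain_ids):
    """Option 2: tag each token with its source domain id, e.g. 'Golf:1'."""
    return Counter(
        f"{token}:{domain_ids[domain]}"
        for domain, tokens in events
        for token in tokens
    )

# Hypothetical user from the slide: reads golf news, watches MTV.
events = [("news", ["Golf", "Tiger"]), ("music", ["Music", "Song"])]
domain_ids = {"news": 1, "music": 2}

pooled = combine_ignoring_domain(events)
tagged = combine_with_domain(events, domain_ids)
```

In both cases the result is a single bag of tokens, so a domain that contributes far more tokens dominates the counts, which is the swamping problem the slide points out.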

  8. Solution Meta-Profile • Isolates each domain: Prevents larger domains from swamping out smaller domains. User Meta Profile User Music Profile User News Profile Personalized Music Personalized News

  9. Solution Meta-Profile • Extensible: domains can be added/removed easily User Meta Profile User Music Profile User News Profile User Movie Profile

  10. Latent Dirichlet Allocation. Example topics: (Basketball, NBA, hoop, 3-point, Train); (Machine, Learning, Neural, Network, Train); (Golf, Tiger, Woods, Club, Green, Hole-in-one). Example document: "Michael I. Jordan trains a Neural Network to play golf". The figure assigns words of the document, such as Network and Golf, to Topics 1, 2 and 3.

  11. Latent Dirichlet Allocation. A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic. Plate labels: N, Document.

  12. Latent Dirichlet Allocation. A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic. Plate labels: N, Document, Document.

  13. Latent Dirichlet Allocation. A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic. Sample From: Plate labels: N, Document, Document.

  14. Latent Dirichlet Allocation. A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic. Plate labels: N, Document. Topic 1: Basketball, Michael, Jordan. Topic 2: Golf, Tiger, Woods, Club, Green. Topic 3: Machine, Learning, Neural.

  15. Latent Dirichlet Allocation. A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic. Plate labels: N, Document. Annotations: words which make up each topic; topics which make up each document.
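
The generative story on the LDA slides (draw a topic mixture per document, then for each word draw a topic and then a word from that topic) can be sketched as follows. This is a minimal standard-library illustration; the toy topics, the prior `alpha` and the function names are assumptions made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document: topic mixture, then (topic, word) per token."""
    theta = sample_dirichlet(alpha, rng)  # the document's mixture over topics
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # draw a topic
        vocab, probs = zip(*topics[z].items())
        w = rng.choices(vocab, weights=probs)[0]  # draw a word from the topic
        assignments.append(z)
        words.append(w)
    return theta, assignments, words

# Toy topics (word -> probability), loosely following the slide's example.
topics = [
    {"basketball": 0.4, "michael": 0.3, "jordan": 0.3},
    {"golf": 0.4, "tiger": 0.3, "woods": 0.3},
    {"machine": 0.4, "learning": 0.3, "neural": 0.3},
]
rng = random.Random(0)
theta, assignments, words = generate_document(topics, [1.0, 1.0, 1.0], 10, rng)
```

Inference runs this story in reverse: given only the words, it recovers the topics that make up each document and the words that make up each topic.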

  16. Single Domain Personalization. A user’s interaction with a domain is a bag of words. A topic is a mixture of words. • Each user has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic. Plate labels: N, User. Annotations: words which make up each topic; topics each user is interested in.

  17. Multiple Domain Personalization. A user’s interaction with a domain is a bag of words. A topic is a mixture of words. Each user has a meta-profile. Each domain has a latent matrix. User’s prior interest in a domain is derived from the user’s meta-profile and the domain’s latent matrix. Plate labels: N, user u’s interaction with domain d, User.
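
The formulas on this slide are not available in the transcript, so the following is only a sketch of one plausible way a shared meta-profile and a per-domain latent matrix could produce a per-domain topic prior. The exponentiated linear map `exp(M_d z_u)` and all names are assumptions made for illustration, not a statement of the authors' exact model.

```python
import math

def domain_prior(domain_matrix, meta_profile):
    """Map a user's meta-profile to positive per-topic prior weights.

    ASSUMPTION: the prior is exp(M_d z_u), where M_d has one row per topic
    of the domain and z_u is the user's meta-profile vector.
    """
    return [
        math.exp(sum(m * z for m, z in zip(row, meta_profile)))
        for row in domain_matrix
    ]

# Hypothetical 2-dimensional meta-profile shared across domains.
meta_profile = [1.0, -0.5]

# Hypothetical latent matrices: 3 music topics and 2 news topics.
music_matrix = [[0.5, 0.0], [0.0, 1.0], [-1.0, 0.0]]
news_matrix = [[1.0, 1.0], [0.0, 0.0]]

music_prior = domain_prior(music_matrix, meta_profile)
news_prior = domain_prior(news_matrix, meta_profile)
```

The point of the sketch is the structure: one vector per user, one matrix per domain, and a different prior for each domain even though the user vector is shared. This is also what would let a user who is new to one domain inherit a prior from activity in the others.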

  18. Solution Meta-Profile User Meta Profile User Music Profile User Movie Profile User News Profile

  19. Users Music Topic->word table News Topic->word table Movies Topic->word table

  20. Gibbs Sampling. Plate labels: N, user u’s interaction with domain p. The per-domain part of the model is LDA.

  21. Gibbs Sampling. Step 1: sample using the LDA sampler, holding the other two sets of variables constant.

  22. Gibbs Sampling. Step 1: sample. Step 2: sample using Langevin diffusion, holding the other two sets of variables constant.

  23. Gibbs Sampling. Step 1: sample. Step 2: sample. Step 3: optimize using LBFGS, holding the other two sets of variables constant.
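
As an illustration of the first step only, here is a minimal sketch of a collapsed Gibbs sweep for LDA over one user's tokens in one domain, with the per-topic prior `alpha` held constant. The standard collapsed update is used; the variable names and toy data are assumptions, and the Langevin and LBFGS steps are not shown.

```python
import random

def gibbs_sweep(words, z, doc_topic, topic_word, topic_total,
                alpha, beta, vocab_size, rng):
    """Resample the topic of every token once, updating counts in place."""
    n_topics = len(alpha)
    for i, w in enumerate(words):
        old = z[i]
        # Remove the token's current assignment from the counts.
        doc_topic[old] -= 1
        topic_word[old][w] -= 1
        topic_total[old] -= 1
        # p(z_i = k | rest) is proportional to
        # (n_dk + alpha_k) * (n_kw + beta) / (n_k + V * beta).
        weights = [
            (doc_topic[k] + alpha[k])
            * (topic_word[k][w] + beta)
            / (topic_total[k] + vocab_size * beta)
            for k in range(n_topics)
        ]
        new = rng.choices(range(n_topics), weights=weights)[0]
        z[i] = new
        doc_topic[new] += 1
        topic_word[new][w] += 1
        topic_total[new] += 1

rng = random.Random(0)
words = [0, 1, 1, 2, 3, 3, 0, 2]   # token ids for one user in one domain
n_topics, vocab_size = 2, 4
alpha = [0.5, 1.5]                  # stand-in for the user's domain prior
beta = 0.1                          # symmetric topic-word smoothing

# Random initial assignments and the matching count tables.
z = [rng.randrange(n_topics) for _ in words]
doc_topic = [0] * n_topics
topic_word = [[0] * vocab_size for _ in range(n_topics)]
topic_total = [0] * n_topics
for w, k in zip(words, z):
    doc_topic[k] += 1
    topic_word[k][w] += 1
    topic_total[k] += 1

for _ in range(20):
    gibbs_sweep(words, z, doc_topic, topic_word, topic_total,
                alpha, beta, vocab_size, rng)
```

Because the prior enters only through `alpha`, the same sampler can be reused for every user and domain once the prior has been computed from the variables that are held constant in this step.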

  24. Experiments

  25. Experiments @ Yahoo! • 2-domain dataset: Frontpage and News clicks of 5.6 million users. Frontpage/News: article text for each click. • 3-domain dataset: Frontpage, News and MyYahoo clicks of 5.6 million users. MyYahoo: only article IDs for each click, with no text, so its tokens are not semantically meaningful. All user information was anonymized.

  26. Test Protocol. Hold out a proportion of the users who see more than one domain. Hide one of those domains and try to predict its words. Prediction metric is cosine similarity. Baseline is “mean prediction”.
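
A small sketch of the evaluation metric may help. It computes cosine similarity between a predicted and an observed word-count vector, and builds a "mean prediction" baseline. The slide does not define that baseline further, so taking it to be the average word-count vector over training users is an assumption, as are the toy numbers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_prediction(train_vectors):
    """ASSUMED baseline: average word-count vector over the training users."""
    n = len(train_vectors)
    return [sum(column) / n for column in zip(*train_vectors)]

# Toy word-count vectors over a 3-word vocabulary for the hidden domain.
train = [[2, 0, 1], [0, 2, 1], [1, 1, 1]]
held_out_truth = [3, 0, 1]

baseline = mean_prediction(train)
baseline_score = cosine_similarity(baseline, held_out_truth)
```

A personalized model would be scored the same way, by replacing `baseline` with its predicted word vector for the hidden domain of each held-out user.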

  27. Implementation • Distributed implementation in C++ using Memcached for communication. • Alex Smola, Shravan Narayanamurthy, “An Architecture for Parallel Topic Models”, VLDB 2010 • Distributed LBFGS line search: implement standard MPI-like primitives in Memcached: • Broadcast • Reduce • Barrier • Takes 2-3 days for 500 iterations on 30 machines
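
To make the idea of MPI-like primitives on top of a key-value store concrete, here is a single-process sketch. The `KeyValueStore` class is a hypothetical in-memory stand-in with `set`, `get` and `incr` operations. It is not a Memcached client, and the key layout and function names are assumptions for illustration rather than the authors' C++ implementation.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for a shared key-value store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def incr(self, key, delta=1):
        self._data[key] = self._data.get(key, 0) + delta
        return self._data[key]

def broadcast(store, name, value):
    """Root publishes a value that every worker can read."""
    store.set(f"bcast/{name}", value)

def receive(store, name):
    """Worker reads a broadcast value (None if not yet published)."""
    return store.get(f"bcast/{name}")

def contribute(store, name, rank, partial):
    """Worker publishes its partial result, e.g. a local gradient."""
    store.set(f"reduce/{name}/{rank}", partial)

def reduce_sum(store, name, n_workers):
    """Sum all partial vectors; returns None until every worker has contributed."""
    parts = [store.get(f"reduce/{name}/{rank}") for rank in range(n_workers)]
    if any(p is None for p in parts):
        return None
    return [sum(column) for column in zip(*parts)]

def arrive(store, name):
    """Worker signals arrival at a barrier; returns the arrival count."""
    return store.incr(f"barrier/{name}")

def barrier_passed(store, name, n_workers):
    """True once all workers have arrived."""
    return (store.get(f"barrier/{name}") or 0) >= n_workers

store = KeyValueStore()
broadcast(store, "step_size", 0.5)
contribute(store, "grad", 0, [1.0, 2.0])
pending = reduce_sum(store, "grad", 2)   # worker 1 has not contributed yet
contribute(store, "grad", 1, [3.0, 4.0])
total = reduce_sum(store, "grad", 2)
arrive(store, "iter0")
arrive(store, "iter0")
```

In a distributed setting each worker would poll `reduce_sum` or `barrier_passed` until the result is ready, which is the kind of coordination a distributed line search needs between gradient evaluations.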

  28. 2 Property Sanity Check

  29. 2 Property

  30. 3 Property

  31. 3 Property

  32. Frontpage -> News Science Celebrity bacteria, fight, super, struggling, developed, doctors, resistant, lethal, virtually, drugs, antibiotic, competitors, chad, film, movie, movies, films, director, story, avatar, james, time, hollywood, big, make, hes, star, sandra, oscar, oscars, red, carpet, bullock, golden, gown, bullocks, nominee, bestactress, sparkles, stunning, vienna, bachelor, jake, pavelka, giraldi, finale, show, stars, dancing, love, season, time, abc, Entertainment Science Fiction

  33. News -> Frontpage Politics Devices health, care, bill, obama, president, rep, house, republican, senate, news, sen, democrats, fox, congress, reform drafts, player, nfl, scouts, team, riskiest, peril, bryant, dez, pick, talented, nba, james, news, iphone, apple, app, apps, ipod, google, store, apples, android, mac, mobile, touch, ipad, device, phone, college, year, earn, years, 000, bestpaid, average, 129, colleges, graduates, ten, alums, schools, actor, likes, home, bank, facing, ceo, gomez, eviction, rosalina, bought, cleaning, foreclosed, client, janitor, offices, surprising, video, captured, inside, mountain, terrorist, observers, impresses, alqaidas, complexity, base, features, hideout, size, special, secret, struck, College

  34. Extension User Meta Profile User Music Profile User News Profile User Movie Profile Latent Dirichlet Allocation Latent Dirichlet Allocation Latent Dirichlet Allocation

  35. Extension • Flexible: Allows a different algorithm for each domain User Meta Profile User Music Profile User News Profile User Movie Profile fLDA Matrix Factorization Linear Model

  36. It Is How You Use It Use the Meta Profile for Initialization. User Meta Profile User Music Profile Personalized with Algorithm X

  37. It Is How You Use It Periodically Update the Meta Profile and Domain Latent Matrix User Meta Profile User Music Profile Personalized with Algorithm X

  38. Conclusion • A generic, extensible model for combining domain personalization schemes. • Scalable inference procedure that extends to millions of users. • Demonstrates strong predictive performance on a large real-world dataset.

  39. Questions?
