Parallelizing Random Walk with Restart for Large-Scale Query Recommendation

Meng-Fen Chiang, Tsung-Wei Wang, and Wen-Chih Peng. Department of Computer Science, National Chiao Tung University (R.O.C.).


Presentation Transcript


  1. Parallelizing Random Walk with Restart for Large-Scale Query Recommendation • Meng-Fen Chiang, Tsung-Wei Wang, and Wen-Chih Peng • Department of Computer Science, National Chiao Tung University (R.O.C.)

  2. Outline • Introduction • Related Work • Problem Definition • Parallel RWR • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

  3. Introduction • Yahoo! Asia Knowledge Plus (AKP) Question Answer

  4. Introduction (contd.) • User access log • Consider a QA pair as an item • A sequence of items clicked by a user • Typically, the items a user looks for within a short period share certain topics • E.g., within 4 min 18 sec: “Upload photos to Facebook”

  5. Introduction (contd.) • Random Walk with Restart (RWR) • Compute relevance scores of a set of nodes for a query node • (Figure: example graph with nodes 1 to 12, annotated with relevance scores.)
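
The slide does not spell out how the relevance scores are defined. As a reference point only, RWR is commonly written as the fixed point below; all symbols are assumptions introduced here, not notation from the slides.

```latex
% Assumed notation (not from the slides):
%   \mathbf{r}_q  relevance score vector for query node q
%   \tilde{W}     column-normalized adjacency matrix of the graph
%   c             restart probability, 0 < c < 1
%   \mathbf{e}_q  indicator vector with 1 at node q and 0 elsewhere
\mathbf{r}_q = (1 - c)\,\tilde{W}\,\mathbf{r}_q + c\,\mathbf{e}_q
```

Read as a walk: at each step the walker follows an outgoing edge with probability 1 - c and jumps back to the query node with probability c, so nodes close to q accumulate higher scores.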

  6. Outline • Introduction • Related Work • Problem Definition • Parallel RWR • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

  7. Related Work • Random Walk with Restart (RWR) • Off-line mode • Pre-compute required information off-line • Pros : fast on-line recommendation for a query • Cons : prohibitive storage consumption • On-line mode • Compute recommendation for a query on-line • Pros : less storage consumption • Cons : longer response time • Fast RWR • Less storage consumption • Fast on-line response time for a query

  8. Related Work (contd.) • Scalable recommendation • SmartMiner • Identify user sessions • Mine frequent navigation patterns • Personalized community recommendation • 312 K active users, 109 K popular communities • Training time ~ 14 mins (200 nodes) • Personalized news recommendation • Handle streaming content • No explicit runtime analysis of off-line training and on-line recommendation

  9. Outline • Introduction • Related Work • Problem Definition • Parallel RWR • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

  10. Problem Definition • Goal • Given user click logs and a query item I • Recommend relevant items w.r.t. I • Requirements • Effectiveness • Mine frequent navigation patterns from click logs • Scalability • Efficiently process large-scale click logs within a few hours • Parallelization of RWR • Parallelization of RWR for multiple query nodes

  11. Outline • Introduction • Related Work • Problem Definition • A framework for scalable recommendation • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

  12. System Architecture • Input: user access logs; parameters (window size, bin size); query items (Item 1, Item 2, . . .) • Off-line computation: Temporal Following Pattern Mining, Recommendation Graph Construction, Random Walk with Restart • Storage: Item ID : <Item List> . . .

  13. Mining Temporal Following Patterns in Parallel • (System architecture diagram, as on slide 12.)

  14. Temporal Following Relation • Frequent QA browsing behaviors of users within a pre-defined time window • E.g., window size = 150 sec. • User click stream: Item 1 (t = 0), Item 2 (t = 30), Item 3 (t = 70), Item 4 (t = 160) • Temporal following relation: <Item 1, Item 2> : dt = 30 • <Item 1, Item 3> : dt = 70 • <Item 1, Item 4> : dt = 160 • . . .
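
A minimal sketch of how the pairs on this slide could be extracted from one user's click stream. The function name and the rule that a pair is dropped once dt exceeds the window are assumptions made for illustration; the slide only lists the dt values.

```python
def temporal_following_pairs(clicks, window):
    """Return (source, follower, dt) triples for one user's click stream.

    clicks: list of (item, timestamp_in_seconds), sorted by timestamp.
    window: maximum allowed gap in seconds (assumed to be inclusive).
    """
    pairs = []
    for i, (src, t_src) in enumerate(clicks):
        for dst, t_dst in clicks[i + 1:]:
            dt = t_dst - t_src
            if dt > window:
                # Clicks are sorted, so every later click is also too far away.
                break
            pairs.append((src, dst, dt))
    return pairs


# Click stream from the slide, with a 150-second window.
stream = [("Item 1", 0), ("Item 2", 30), ("Item 3", 70), ("Item 4", 160)]
pairs = temporal_following_pairs(stream, window=150)
for src, dst, dt in pairs:
    print(f"<{src}, {dst}> : dt = {dt}")
```

Under this assumed rule, <Item 1, Item 4> with dt = 160 falls outside the window and is not emitted, while pairs starting from Item 2 and Item 3 are.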

  15. Temporal Following Pattern Mining • Input: user click logs, parameters • Mappers 1 to N: emit temporal following pairs for each item • Temporal following relations: <Itemi, Itemj:cntij> • Reducers 1 to N: aggregate temporal following relations for each item • Temporal following patterns: <Itemi, <Itemj:cntij, …, Itemz:cntiz>>
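
The map and reduce steps on this slide can be sketched as a single-process simulation. This is an illustration under assumptions, not the authors' implementation: the function names, the in-memory shuffle, and the choice to skip self-pairs are all hypothetical.

```python
from collections import Counter, defaultdict


def map_session(clicks, window):
    """Mapper: emit (item_i, item_j) for each follower inside the window."""
    for i, (src, t_src) in enumerate(clicks):
        for dst, t_dst in clicks[i + 1:]:
            if t_dst - t_src > window:
                break  # clicks are sorted by time
            if dst != src:  # assumption: an item does not follow itself
                yield src, dst


def reduce_item(src, followers):
    """Reducer: aggregate follower counts for one source item."""
    return src, dict(Counter(followers))


def mine_patterns(sessions, window):
    """Run map, shuffle (group by source item), and reduce in memory."""
    shuffled = defaultdict(list)
    for clicks in sessions:
        for src, dst in map_session(clicks, window):
            shuffled[src].append(dst)
    return dict(reduce_item(src, dsts) for src, dsts in shuffled.items())


# Two hypothetical user sessions: (item, timestamp in seconds).
sessions = [
    [("a", 0), ("b", 10), ("c", 20)],
    [("a", 0), ("b", 50), ("c", 400)],
]
patterns = mine_patterns(sessions, window=150)
print(patterns)
```

Each entry of `patterns` has the shape of the slide's output record: a source item mapped to its followers and their counts.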

  16. Recommendation Graph Construction • (System architecture diagram, as on slide 12.)

  17. Recommendation Graph Construction • Goal • Transform discovered temporal following patterns to a recommendation graph • E.g., temporal following patterns: <Item 1, <Item 2:cnt12, Item 3:cnt13>> and <Item 4, <Item 3:cnt43>> • Recommendation graph: nodes n1, n2, n3, n4 with edges n1 to n2 (cnt12), n1 to n3 (cnt13), n4 to n3 (cnt43)
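
One way to turn such patterns into a weighted graph is sketched below. The slide labels edges with raw counts; normalizing each node's outgoing counts into transition probabilities is an assumption added here because a random walk needs them, and `build_graph` is a hypothetical name.

```python
def build_graph(patterns):
    """Build an adjacency map {node: {neighbor: transition_probability}}.

    patterns: {item_i: {item_j: cnt_ij}} as produced by pattern mining.
    """
    graph = {}
    for src, followers in patterns.items():
        total = sum(followers.values())
        # Assumption: edge weight = count / total outgoing count of src.
        graph[src] = {dst: cnt / total for dst, cnt in followers.items()}
    # Make sure nodes that only appear as followers are present as well.
    for followers in patterns.values():
        for dst in followers:
            graph.setdefault(dst, {})
    return graph


# Patterns shaped like the slide's example, with made-up counts.
patterns = {"n1": {"n2": 3, "n3": 1}, "n4": {"n3": 2}}
graph = build_graph(patterns)
print(graph)
```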

  18. Parallelizing Random Walk with Restart • (System architecture diagram, as on slide 12.)

  19. Parallelizing Random Walk with Restart • With a single query • (Figure: the 12-node example graph before and after RWR, with relevance scores for the query node.)

  20. Parallelizing RWR With Single Query • Input: user click logs, q (an item), parameters • Initialization: Machines 1 to N set the initial score for q • RWR: Machines 1 to N calculate the relevance score for each item • Convergence: Machines 1 to N calculate the difference of relevance score vectors • Converged? Yes: stop. No: repeat the RWR step
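
The initialize, iterate, and check-convergence loop on this slide can be sketched on a single machine as follows. The restart probability, the tolerance, the L1 difference used as the convergence test, and the handling of nodes without outgoing edges are all assumptions; the slides do not give them.

```python
def rwr(graph, query, restart=0.15, tol=1e-8, max_iter=500):
    """Single-query random walk with restart by power iteration.

    graph: {node: {neighbor: transition_probability}}.
    Returns {node: relevance score w.r.t. query}.
    """
    # Initialization: all score mass starts on the query node.
    scores = {n: 0.0 for n in graph}
    scores[query] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in graph}
        for src, out in graph.items():
            if not out:
                # Assumption: a node with no out-edges returns its mass to q.
                nxt[query] += (1 - restart) * scores[src]
                continue
            for dst, p in out.items():
                nxt[dst] += (1 - restart) * scores[src] * p
        nxt[query] += restart  # restart step
        # Convergence: L1 difference between consecutive score vectors.
        diff = sum(abs(nxt[n] - scores[n]) for n in graph)
        scores = nxt
        if diff < tol:
            break
    return scores


# Small hypothetical graph with normalized edge weights.
graph = {"a": {"b": 0.5, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
scores = rwr(graph, "a")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In a distributed setting the inner loop over nodes is the part that would be split across machines; the convergence check then needs a global sum of the per-machine differences.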

  21. Parallelizing Random Walk with Restart • With multiple queries • (Figure: the 12-node example graph before and after RWR, shown side by side for two query nodes with their relevance scores.)

  22. Parallelizing RWR With Multiple Queries • Input: user click logs, Q (items), parameters • Initialization: Machines 1 to N set the initial score for Q • RWR: Mappers 1 to N calculate the diffusion score for each item w.r.t. each q • Reducers 1 to N sum up the diffusion scores for each item w.r.t. q • Repeat until the maximum iteration • Record format: <Itemi, <q1:rs1i, …, qz:rs1z> <adjacent list>>

  23. Paralleling RWR With Multiple Queries • Diffusion score for each item w.r.t. q • Sum up diffusion scores for each item w.r.t. q

  24. Outline • Introduction • Related Work • Problem Definition • Parallel RWR • Temporal following pattern mining • Recommendation graph construction • Random walk with restart for multiple queries • Experimental Results • Conclusion

  25. Experimental Setup • Yahoo! Asia Knowledge Plus (AKP) • Duration : 1 week in July 2009 • #clicks : 90 M • #items : 4 M • #users : 2 M • Performance evaluation • Quality study • Scalability study • Case study

  26. Quality Study • User access logs • Train 80% • Test 20% • Ground truth • For each item I clicked by user U • The set of items clicked by U after I within T sec. • Measure the similarity with historical user click logs • Item-precision • Item-recall
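
The slide names item-precision and item-recall without defining them. The sketch below assumes the usual set-based definitions (hits over recommended items, hits over ground-truth items); the function name and the sample data are hypothetical.

```python
def item_precision_recall(recommended, ground_truth):
    """Set-based precision and recall for one test item.

    recommended: items returned by the recommender for item I.
    ground_truth: items the user clicked after I within T seconds.
    """
    rec, truth = set(recommended), set(ground_truth)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


precision, recall = item_precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(precision, recall)
```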

  27. Quality Study (contd.) • Top-k hot items in the category of test item (HC) • Temporal following pattern (TFP) • RWR based on temporal following pattern (RWRTFP) • Higher precision & recall

  28. Scalability Study • Temporal following pattern (TFP) • 4.1M items • 40 sec. • RWR based on temporal following pattern (RWRTFP) • #sizes of input data • #computing nodes

  29. Scalability Study (contd.) • Computational cost is significantly reduced as the number of machines increases • The more queries, the more cost-effective the computation • 0.74 sec. (2K queries) → 0.49 sec. (10K queries)

  30. Case Study • Query Item • “What can I do if I do not have Word?”

  31. Conclusion • Proposed a parallel RWR for multiple-query recommendation • Parallelize mining of frequent navigation behavior • Parallelize RWR • Compute RWR for multiple queries in parallel • The recommender system • General • Content-agnostic

  32. Q & A

  33. Temporal Following Pattern Mining • Input: user click logs, parameters • Mappers 1 to N: emit temporal following pairs for each item • Temporal following relations: <Itemi, Itemj:dtij> • Reducers 1 to N: aggregate temporal following relations for each item • Temporal following patterns: <Itemi, <Itemj:dtij, …, Itemz:dtiz>>
