Context-Aware Recommendation
Dongjoo Lee
Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea
Center for E-Business Technology, Seoul National University, Seoul, Korea
Introduction
• Traditional recommendation methods
  • Content-based recommendation
    • Which features can describe the item?
  • Collaborative filtering (CF)
    • Item-based CF
    • User-based CF
    • Hybrid CF
• Issues in using context information in recommendation
  • What is context information?
  • How should context information be used?
  • Is context information really useful in recommendation?
[Figure: users and items]
Recommendation
Music Recommender
• Collaborative Filtering
  • Input: User1, User DB, Song DB, User Listen Log
  • Rule: (User1 hasSimilarTasteWithf User2) (User2 likesf Song2) (User1 notListened Song2) => Recommendf Song2 to User1
  • Output: Sorted Song List
• Content-based Recommendation
  • Input: User1, User DB, Song DB, User Listen Log
  • Rule: (User1 likesf Song1) (Song2 isSimilarWithf Song1) (User1 notListened Song2) => Recommendf Song2 to User1
  • Output: Sorted Song List
• Vocabularies: Recommendf, notListened, likesf, hasSimilarTasteWithf, isSimilarWithf
• Interpreter assumptions
  • If there is no listen log, the user has not listened to the song before.
  • If a user listens to a song frequently, the user likes it.
  • If user1 and user2 gave similar scores, they have similar taste.
  • If music1 and music2 have similar feature values, they are similar.
[Figure: graph built from the logs in which users U1 and U2 are connected to songs m1 to m7 by Listen (when, where, …), HasFeatureValue, and Recommend edges]
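The two rules above can be sketched as follows. This is a minimal illustration, not the interpreter from the slides: it assumes the `f` suffix on the vocabulary marks a fuzzy degree in [0, 1], it assumes `min` as the fuzzy AND, and all data values and function names are hypothetical.

```python
def recommend_collaborative(user, similar_taste, likes, listened):
    """(user hasSimilarTasteWith other) AND (other likes song) AND (user notListened song)."""
    scores = {}
    for other, taste_degree in similar_taste.get(user, {}).items():
        for song, like_degree in likes.get(other, {}).items():
            if song in listened.get(user, set()):
                continue  # the notListened condition fails
            degree = min(taste_degree, like_degree)  # assumed fuzzy AND
            scores[song] = max(scores.get(song, 0.0), degree)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def recommend_content_based(user, likes, similar_songs, listened):
    """(user likes liked_song) AND (song isSimilarWith liked_song) AND (user notListened song)."""
    scores = {}
    for liked_song, like_degree in likes.get(user, {}).items():
        for song, similarity in similar_songs.get(liked_song, {}).items():
            if song in listened.get(user, set()):
                continue  # the notListened condition fails
            degree = min(like_degree, similarity)  # assumed fuzzy AND
            scores[song] = max(scores.get(song, 0.0), degree)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data
likes = {"User1": {"Song1": 0.9}, "User2": {"Song1": 0.6, "Song2": 0.7}}
similar_taste = {"User1": {"User2": 0.8}}
similar_songs = {"Song1": {"Song2": 0.5, "Song3": 0.2}}
listened = {"User1": {"Song1"}}

cf_list = recommend_collaborative("User1", similar_taste, likes, listened)
cb_list = recommend_content_based("User1", likes, similar_songs, listened)
```

Both functions return the sorted song list that the slide names as output; only the source of evidence differs.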
Recommendation (cont’d)
• Recommendation
• Context-Aware Recommendation
  • Context-Aware Collaborative Filtering
  • Context-Aware Content-based Recommendation
    • Context abstraction
    • Context grouping
    • Item abstraction
    • Item grouping
    • User profiling
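1. Context-Aware Collaborative Filtering
• Separate each user's preferences by context, and perform collaborative filtering for each context.
• Find the context that corresponds to the active context (the current situation for which a recommendation is needed) and recommend accordingly.
• Problem: there are too many probable contexts.
[Figure: user, item, and context dimensions]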
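2. Context-Aware Content-based Recommendation
[Figure: Last.fm user listen logs connect context groups (Context Group 1, Context Group 2) to music groups (Music Group 1, Music Group 2, built from Last.fm music) with weights such as 0.9 and 0.3. The songs in the linked music group can be recommended.]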
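2. Context-Aware Content-based Recommendation
• Active context
• User-model-based recommendation
[Figure: items are grouped into music groups (mg1, mg2, …) and contexts into context groups (cg1, cg2, …, cgm); the active context c is connected through the context groups and music groups to items in three numbered steps (1, 2, 3)]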
Abstraction in Music Domain
• Semantic annotation = abstraction with domain concepts
• Context abstraction: context can be obtained from users' listen logs
• Learn the user's preferences from listen logs
[Figure: data model of the users' listen logs. Entities: User, Song, Tag, Album, Artist, Context Concept. Relations: listen, tagged, belongs, trackOf, songBy. Attributes shown: id, name, gender, age, country, time, location, occasion, count, Tag_count, Track_count.]
1) Context Abstraction
[Figure: pipeline in which each Sensor produces sensed data, a Filter turns it into filtered data, and a fuzzy membership function maps the resulting context to a context concept (example: Cool)]
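1) Context Abstraction – Fuzzy Join
• Inputs: context data and abstract context concepts
• Fuzzy join result: the product of the two relations, scored by the fuzzy join functions
• An α-cut may improve query performance
[Figure: fuzzy membership functions for Cold, Cool, and Hot; horizontal axis Temperature, vertical axis Fuzziness]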
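A minimal sketch of the fuzzy join with an α-cut, assuming trapezoidal membership functions. The breakpoints for Cold, Cool, and Hot and the names `trapezoid` and `fuzzy_join` are illustrative assumptions, not values from the slides.

```python
def trapezoid(a, b, c, d):
    """Membership function: 0 outside (a, d), 1 on [b, c], linear in between."""
    def membership(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return membership


# Hypothetical temperature concepts in degrees Celsius
CONCEPTS = {
    "Cold": trapezoid(-100, -99, 5, 15),
    "Cool": trapezoid(5, 12, 18, 24),
    "Hot": trapezoid(22, 30, 99, 100),
}


def fuzzy_join(readings, concepts, alpha=0.0):
    """Pair every reading with every concept and keep pairs with degree >= alpha."""
    rows = []
    for value in readings:
        for name, membership in concepts.items():
            degree = membership(value)
            if degree > 0.0 and degree >= alpha:  # the alpha-cut prunes weak pairs
                rows.append((value, name, degree))
    return rows


all_rows = fuzzy_join([10, 20], CONCEPTS)
strong_rows = fuzzy_join([10, 20], CONCEPTS, alpha=0.6)
```

Raising `alpha` shrinks the join result, which is the sense in which an α-cut can reduce the work of later query steps.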
1) Context Abstraction – Fuzzy Equi-Join
• Normal equi-join
  SELECT T1.*, T2.* FROM table1 T1 JOIN table2 T2 ON T1.a = T2.b
• Fuzzy equi-join
  SELECT T1.*, T2.*, FuzzyValue FROM table1 T1 JOIN table2 T2 ON T1.a ≈ T2.b WHERE FuzzyValue > THETA
  • The most important part is the fuzzy function (≈) that compares two values
  • It yields the fuzzy membership degree
• Performance improvement
  • Sort-merge join using the partial order of fuzzy similarity
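The fuzzy equi-join query can be sketched as a nested-loop join. The linear `fuzzy_eq` function and its `tolerance` parameter are assumptions standing in for the ≈ operator, and the sort-merge optimization mentioned on the slide is not implemented here.

```python
def fuzzy_eq(x, y, tolerance=10.0):
    """Degree in [0, 1] to which x ≈ y: 1 when equal, falling linearly to 0 at `tolerance`."""
    return max(0.0, 1.0 - abs(x - y) / tolerance)


def fuzzy_equi_join(table1, table2, key1, key2, theta):
    """Nested-loop version of: ... ON T1.key1 ≈ T2.key2 WHERE FuzzyValue > theta."""
    result = []
    for row1 in table1:
        for row2 in table2:
            degree = fuzzy_eq(row1[key1], row2[key2])
            if degree > theta:
                # Column names are assumed not to collide between the two tables
                result.append({**row1, **row2, "FuzzyValue": degree})
    return result


# Hypothetical tables
table1 = [{"a": 20}]
table2 = [{"b": 22}, {"b": 40}]
joined = fuzzy_equi_join(table1, table2, "a", "b", theta=0.5)
```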
1) Context Abstraction – Periodic Membership Function
Because temporal values are periodic, a periodic function is appropriate for calculating the membership degree of a time value in a temporal concept.
• Dawn, Morning, Noon, Afternoon, Evening, Night, Midnight
• Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
• Spring, Summer, Autumn, Winter
• New Year’s Day, Valentine’s Day, White Day, Children’s Day, Parents’ Day, Christmas
Modified cosine function, with x in minutes (1440 minutes per day, 525600 minutes per 365-day year):
• Dawn: $f(x) = \max(\min(10.0 \cos(2\pi (x - 150) / 1440) - 8.5,\ 1),\ 0)$
• Midnight: $f(x) = \max(\min(7.0 \cos(2\pi (x - 60) / 1440) - 5.5,\ 1),\ 0)$
• Spring: $f(x) = \max(\min(4.0 \cos(2\pi (x - 172800) / 525600) - 2.4,\ 1),\ 0)$
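A minimal sketch of the modified cosine function. It assumes the slide's formulas are read as a scaled cosine minus an offset, clipped to [0, 1]; the parameter names `center`, `period`, `gain`, and `offset` are introduced here for illustration.

```python
import math


def periodic_membership(x, center, period, gain, offset):
    """Clip gain * cos(2*pi*(x - center) / period) - offset to the range [0, 1]."""
    value = gain * math.cos(2 * math.pi * (x - center) / period) - offset
    return max(min(value, 1.0), 0.0)


def dawn(minute_of_day):
    return periodic_membership(minute_of_day, 150, 1440, 10.0, 8.5)


def midnight(minute_of_day):
    return periodic_membership(minute_of_day, 60, 1440, 7.0, 5.5)


def spring(minute_of_year):
    return periodic_membership(minute_of_year, 172800, 525600, 4.0, 2.4)
```

Under this reading, `gain` and `offset` together control how wide the plateau and the slopes are: a larger gain with a proportionally larger offset gives a narrower concept.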
2) Context Grouping
• Atomic context concepts
  • Assume the concepts are independent
• Clustering
  • K-means
  • Fuzzy C-means
  • Hierarchical clustering
  • Mixture of Gaussians
  • http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/index.html
• Group meaningful contexts
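As one illustration of the clustering options listed above, here is a small k-means sketch over vectors of concept membership degrees. The deterministic initialization (the first k points) and the two-dimensional (morning, night) vectors are assumptions made to keep the example reproducible.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (assumed, for determinism)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(column) / len(members) for column in zip(*members)]
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
    return labels, centroids


# Hypothetical contexts described by (morning, night) membership degrees
contexts = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(contexts, k=2)
```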
3) Music Abstraction
• Modeling: use annotations
  • S = {(t, w) | t ∈ T, w ∈ R and 0 ≤ w ≤ 1}
  • Sleeping Beauty: {(rock, w1), (90s, w2)}
  • Rose: {(rock, w1), (indie, w2)}
• Weight calculation
  • Tf-idf
  • BM25
[Figure: songs (Sleeping Beauty, My Fist Your Face, Rose, Thinking of You, I've Got to See You Again) linked to annotations (rock, alternative, seen live, indie, 90s, electro, romance, jazz)]
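A sketch of tf-idf weighting for tags, treating each song as a document and each tag count as a term frequency. The smoothed idf, the per-song normalization that keeps every weight within [0, 1], and the tag counts are all assumptions for illustration.

```python
import math


def tag_weights(tag_counts):
    """Return {song: {tag: w}} with 0 <= w <= 1, using a smoothed tf-idf."""
    song_count = len(tag_counts)
    document_frequency = {}
    for tags in tag_counts.values():
        for tag in tags:
            document_frequency[tag] = document_frequency.get(tag, 0) + 1

    weights = {}
    for song, tags in tag_counts.items():
        total = sum(tags.values())
        raw = {
            tag: (count / total) * math.log(1 + song_count / document_frequency[tag])
            for tag, count in tags.items()
        }
        largest = max(raw.values())
        # Divide by the largest weight so the strongest tag of each song gets w = 1
        weights[song] = {tag: value / largest for tag, value in raw.items()}
    return weights


# Hypothetical tag counts
tag_counts = {
    "Sleeping Beauty": {"rock": 10, "90s": 5},
    "Rose": {"rock": 8, "indie": 4},
}
weights = tag_weights(tag_counts)
```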
4) Music Grouping
• Similar to context grouping
5) User Profiling
[Figure: the user's listen log (context c, music m) and the abstracted context feed a user profile that links context groups (cg1, cg2, …, cgm) to item groups (mg1, mg2, …)]
5) User Profiling – Fuzzy Join and Aggregation
• Fuzzy-equivalent join (time): user listen logs with context, joined with context concepts and their fuzzy functions
• Equivalent join (music): listen logs joined with music and its annotations
• Context grouping and item grouping
• Aggregation
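The join-and-aggregate pipeline can be sketched as below. It assumes the listen log holds (minute of day, song) pairs, that each context group exposes a membership function, and that each song already belongs to one music group; these structures and all names are hypothetical.

```python
from collections import defaultdict


def build_user_profile(listen_log, context_groups, song_to_music_group):
    """Aggregate fuzzy-joined log entries into weights per (context group, music group)."""
    profile = defaultdict(lambda: defaultdict(float))
    for minute_of_day, song in listen_log:
        music_group = song_to_music_group.get(song)  # equivalent join on music
        if music_group is None:
            continue
        for context_group, membership in context_groups.items():
            degree = membership(minute_of_day)  # fuzzy join on time
            if degree > 0.0:
                profile[context_group][music_group] += degree

    normalized = {}
    for context_group, row in profile.items():
        total = sum(row.values())
        normalized[context_group] = {mg: value / total for mg, value in row.items()}
    return normalized


# Hypothetical crisp context groups and toy log
context_groups = {
    "morning": lambda m: 1.0 if 360 <= m < 720 else 0.0,
    "night": lambda m: 1.0 if m >= 1200 or m < 240 else 0.0,
}
song_to_music_group = {"s1": "mg1", "s2": "mg2", "s3": "mg2"}
listen_log = [(400, "s1"), (500, "s1"), (450, "s2"), (1300, "s3")]
profile = build_user_profile(listen_log, context_groups, song_to_music_group)
```

Each row of the resulting profile sums to 1, so it can be read as how strongly the user prefers each music group in that context group.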
Contribution
• Model-based context-aware recommendation
• Does not depend on ambiguous relationships among concepts, users, and items
  • Not derived from names or descriptions
  • But from semantic annotations (tags)
• Abstracts context concepts by using fuzzy membership functions
• Distinguishes context concepts from domain concepts
  • There is no reason to put them together
  • Even when they share the same name, they must be treated as different.
  • Domain concepts are meaningful only within their own domain; the same concept may have a different meaning in a different domain.
How to Evaluate?
• How can the effect of context be evaluated?
  • Divide the logs into a training set and a test set
  • Give the same information to the path that uses no context and the path that uses context, and compare the results
  • A recommendation counts as correct if the recommended song list contains the song
  • Compare the top-k recommendation results
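The top-k check can be sketched as a hit rate. The two recommenders and the test cases below are hypothetical placeholders, not results from the experiments.

```python
def hit_rate_at_k(recommend, test_cases, k):
    """Fraction of test cases whose song appears in the top-k list for its context."""
    hits = sum(1 for context, song in test_cases if song in recommend(context)[:k])
    return hits / len(test_cases)


# Hypothetical recommenders: one ignores the context, one uses it
def no_context(context):
    return ["s1", "s2", "s3"]


def with_context(context):
    return {"morning": ["s1", "s2"], "night": ["s3", "s4"]}[context]


# Hypothetical held-out (context, song) pairs from the test set
test_cases = [("morning", "s1"), ("night", "s3"), ("night", "s4")]
baseline = hit_rate_at_k(no_context, test_cases, k=2)
contextual = hit_rate_at_k(with_context, test_cases, k=2)
```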
Experiments
• Two domains
  • Music domain: Last.fm
  • Movie domain: IMDb
• They have different characteristics
Publication Schedule
• Target conference
  • The 2009 IEEE/WIC/ACM International Conference on Web Intelligence (WI ’09)
  • Info: 15-18 September 2009, Milan, Italy
  • Due date: April 10, 2009
  • Notification: June 3, 2009
  • Format: IEEE 2-column format, max 8 pages
Deep Research Topic
• What are the important features of context and music?
• What is the optimal model?
[Figure: music and context]
Additional Issues
• Crawling
• Data sampling
• Relationship extraction
• Approximate string matching
Crawling Last.fm
• 735,000 users
  • South Korea, North Korea, Japan, United Kingdom, USA
• 5,855,000 tracks
  • includes tracks duplicated multiple times
  • 913,720/3,322,000 …… still crawling
• 69,725,000 user recent-track listens
  • 69,000,000 listened tracks of thousands of users
• 6,659,000 user loved tracks
• 2,311,000 user tags
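Data sampling
• Test only on roughly the top 100 users with United States nationality who listened to the most music
  select * from lfm_user where country = 'United States' and track_count2 > 1000 order by track_count2 desc
• Select the songs that these top 100 or so users listened to most
  select * from lfm_rel_user_track_2 where user_id = 'thetasteofink'
• Abstract the music using the tags on the songs that these top 100 or so users listened to most
• How to use artist and album is deferred for now
• Consider how to handle album names and song titles that do not match.
• Applying approximate string matching is a separate problem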
Approximate String Matching
• The link data is not exact, so approximate string matching is needed
• Songs are recorded under the names that users entered, so the same song has multiple names.
• Last.fm does not enforce strict foreign key constraints, so records have to be matched with fuzzy matching methods
[Figure: users u1 to u4 linked to songs m1 to m4]
Approximate String Matching
• Approximate string search
  • Levenshtein distance (edit distance)
    • Calculates the minimum number of insertions, deletions, and substitutions necessary to convert one string into another.
    • http://www.merriampark.com/ld.htm
  • Gestalt
  • SoundEx
    • Groups letters that sound alike, then converts the name into a series of numbers that represents the name
  • Jaccard similarity
    • http://en.wikipedia.org/wiki/Jaccard_Similarity_Coefficient
  • Cosine similarity
    • http://en.wikipedia.org/wiki/Cosine_similarity
  • Dice similarity
    • http://en.wikipedia.org/wiki/Dice%27s_coefficient
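Three of the measures listed above can be sketched in a few lines. Computing Jaccard and Dice over character bigrams is an assumption made here; the slides do not say which token unit is used.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def bigrams(text):
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over character bigrams (assumed token unit)."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|) over character bigrams (assumed token unit)."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))
```

For song titles, a distance such as `levenshtein` can be turned into a match decision by comparing it with a threshold relative to the title length.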
Packages
• FLAMINGO Package
  • http://flamingo.ics.uci.edu/releases/2.0/
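Additional Research Topics
• Approximate string matching for Korean
  • Because a Hangul syllable is divided into an initial consonant, a medial vowel, and a final consonant, text needs to be processed at this phoneme (jamo) level rather than at the character level.
  • Vowels with the same pronunciation, such as ‘ㅔ’ and ‘ㅐ’, and final consonants such as ‘ㄷ’ and ‘ㅌ’ need special consideration.
• What can it be used for?
  • Search query suggestion and spelling correction
  • Data mining over data that contains Korean text
  • Data on the web contains many typos and spelling errors, so these must be taken into account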
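Phoneme-level processing can start from the arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3). The sketch below decomposes each syllable into compatibility jamo; the function names are illustrative, and mapping similar-sounding jamo to one another is left out.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"                      # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"                    # 21 medial vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals, first is "none"


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def to_jamo(text):
    """Replace every Hangul syllable with its jamo; keep other characters unchanged."""
    parts = []
    for char in text:
        if 0xAC00 <= ord(char) <= 0xD7A3:
            parts.extend(decompose(char))
        else:
            parts.append(char)
    return "".join(parts)
```

After this step, an edit distance over jamo strings sees ‘게’ and ‘개’ as differing in one vowel rather than as two unrelated characters.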