
Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems

Latin American Web Conference, IEEE Computer Society, 2008. Presenter: Ying-Ying Chen.


Presentation Transcript


  1. Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems. Latin American Web Conference, IEEE Computer Society, 2008. Presenter: Ying-Ying Chen

  2. Outline • Introduction • Related Works • Approach • Hybrid User Profiles • Content-Based User Profiles • Linking Tags to User Interests • Experimental Results • Conclusions and Comments

  3. Introduction • User profiling is an essential component of personal information agents and of recommendation systems in general. • Content-based recommendation approaches rely on profiles collected by observing the user's browsing history or the documents the user read.

  4. Introduction • Recently, collaborative or social tagging sites have achieved widespread success on the Web. On these sites, users annotate resources using a freely chosen set of keywords, or tags; the resulting structure is commonly known as a folksonomy. • The activities users carry out in social tagging systems, including posting resources and assigning tags to resources, have become a novel source of information about user interests.

  5. Introduction • This paper proposes integrating content-based profiles, which represent long-term user interests gathered by recommenders through observation of browsing activities, with tag-based profiles acquired by capturing the user's interaction with one or more collaborative tagging systems. • Hybrid profiles can be exploited to assist users in finding resources, people, or tags within social tagging systems.

  6. Related Works • Vector of weighted tags • A vector of weighted tags is obtained from the frequency of occurrence of tags in the resources a user tagged, and it is applied to rank Web search results according to their similarity with this tag vector. • TBProfile • It uses a weighted vector of tags to represent user interests, but tag weights are based on inverse user frequency.
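
The two related-work profiles can be sketched as follows. This is a minimal illustration, not the cited systems' code: the input format (a dict from resource id to the user's tags), the function names, and the log-based inverse user frequency in `iuf_vector` are assumptions. Cosine similarity is used for ranking because the slide only says "similarity".

```python
import math
from collections import Counter

def tag_frequency_vector(user_posts):
    """Weight each tag by the number of the user's resources that carry it.

    user_posts maps a resource id to the tags the user assigned to it
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for tags in user_posts.values():
        counts.update(set(tags))
    return dict(counts)

def iuf_vector(user_posts, users_per_tag, num_users):
    """TBProfile-style weights: tag frequency scaled by inverse user frequency.

    users_per_tag[t] is the number of users who applied tag t. The
    log(num_users / users_per_tag[t]) form is an assumed IUF definition.
    """
    tf = tag_frequency_vector(user_posts)
    return {t: f * math.log(num_users / users_per_tag[t]) for t, f in tf.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_results(profile, results):
    """Order search results (id -> tag vector) by similarity to the profile."""
    return sorted(results, key=lambda rid: cosine(profile, results[rid]), reverse=True)

posts = {"r1": {"python", "ml"}, "r2": {"python", "web"}, "r3": {"ml"}}
profile = tag_frequency_vector(posts)        # python: 2, ml: 2, web: 1
results = {"a": {"web": 1.0}, "b": {"python": 1.0, "ml": 1.0}}
print(rank_results(profile, results))        # ['b', 'a']
```

With IUF weighting, a tag used by every user gets weight zero, which is one way to address the loss of specificity of very frequent tags mentioned on the next slide.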

  7. Related Works • Using a single vector of weighted tags has some drawbacks: • More frequent tags lose specificity. • A single vector or tag cloud cannot capture diverse interests spanning different domains. • Graph-based clustering [Au Yeung et al.] • Multiple tag clouds

  8. Related Works • A number of problems result from the free-form nature of tagging: • Ambiguity • Synonymy • Solution: contextualizing tags based on knowledge of the user's information preferences.

  9. Approach – Hybrid User Profiles • Folksonomies are the primary structure underlying collaborative tagging systems. • A folksonomy can be defined as a tuple $F := (U, T, R, Y, \prec)$ • $U$: users, $T$: tags, $R$: resources • $Y \subseteq U \times T \times R$: the user-based assignment of tags to resources, given as a ternary relation • $\prec \subseteq U \times T \times T$: a user-specific sub-tag/super-tag relation

  10. Approach – Hybrid User Profiles • The collection of all tag assignments of a single user constitutes a personomy $P_u := (T_u, R_u, I_u, \prec_u)$, with $I_u := \{(t, r) \in T \times R \mid (u, t, r) \in Y\}$, $T_u := \pi_1(I_u)$, $R_u := \pi_2(I_u)$, and $\prec_u := \{(t_1, t_2) \in T \times T \mid (u, t_1, t_2) \in \prec\}$, where $\pi_1$ and $\pi_2$ project $I_u$ onto its tag and resource components.
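
A minimal sketch of both structures, assuming $Y$ and $\prec$ are stored as sets of triples; the class and method names are hypothetical. The personomy is obtained by filtering the triples for one user and projecting the remaining (tag, resource) pairs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Folksonomy:
    """F := (U, T, R, Y, ≺), with U, T and R left implicit in the triples."""
    assignments: frozenset            # Y: (user, tag, resource) triples
    subtag: frozenset = frozenset()   # ≺: (user, sub_tag, super_tag) triples

    def personomy(self, user):
        """Return P_u = (T_u, R_u, I_u, ≺_u) for a single user."""
        i_u = frozenset((t, r) for (u, t, r) in self.assignments if u == user)
        t_u = frozenset(t for (t, _) in i_u)   # projection of I_u onto tags
        r_u = frozenset(r for (_, r) in i_u)   # projection of I_u onto resources
        prec_u = frozenset((t1, t2) for (u, t1, t2) in self.subtag if u == user)
        return t_u, r_u, i_u, prec_u

f = Folksonomy(frozenset({("alice", "python", "r1"), ("alice", "ml", "r1"),
                          ("bob", "python", "r2")}))
t_u, r_u, i_u, _ = f.personomy("alice")
print(sorted(t_u), sorted(r_u))   # ['ml', 'python'] ['r1']
```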

  11. Approach Overview

  12. Approach – Content-Based User Profiles • WebDCC (Web Document Conceptual Clustering) • Input: Web pages • Output: a hierarchy of concepts, i.e., the user profile • Instances are represented using the bag-of-words approach • It builds a hierarchy of concepts • Each node is a concept, and the leaves are clusters • A category is any set of instances, and a concept is the internal representation of a category.

  13. Approach – Content-Based User Profiles • User Profile

  14. Approach – Content-Based User Profiles • Agents capture experiences regarding user interests, such as Web pages the user read or bookmarked for future reading, news the user read, etc. • Experiences are vector representations of information items based on the vector space model: $D_i = \{(t_1, w_1), \ldots, (t_m, w_m)\}$, where $w_k$ is the weight of term $t_k$.
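
As an illustration of an experience vector $D_i$, the sketch below turns raw text into term-weight pairs. The tokenizer and the max-normalized term-frequency weighting are assumptions, and `experience_vector` is a hypothetical name; the slides do not specify how WebDCC weights terms.

```python
import re
from collections import Counter

def experience_vector(text):
    """Build D_i = {(t_1, w_1), ..., (t_m, w_m)} from raw text.

    Weights are term frequencies divided by the count of the most frequent
    term; this weighting is an assumption made for illustration.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    if not counts:
        return {}
    top = max(counts.values())
    return {term: n / top for term, n in counts.items()}

print(experience_vector("Tagging and social tagging systems"))
# {'tagging': 1.0, 'and': 0.5, 'social': 0.5, 'systems': 0.5}
```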

  15. Approach – Content-Based User Profiles • The concept hierarchies produced by this algorithm are classification trees. • Root → most general concept • Terminal concepts → clusters • WebDCC integrates classification and learning by sorting each experience through the concept hierarchy while simultaneously updating it.
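
Sorting an experience through the hierarchy can be sketched as a greedy descent from the root. This is a simplified, hypothetical version: it compares the experience with a per-concept prototype using cosine similarity, stops when no child reaches `min_sim`, and leaves out the simultaneous hierarchy update that WebDCC performs. `Concept` and `sort_experience` are illustrative names.

```python
import math
from dataclasses import dataclass, field

def cosine(a, b):
    """Cosine similarity between sparse term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

@dataclass
class Concept:
    name: str
    prototype: dict                               # term -> weight
    children: list = field(default_factory=list)

def sort_experience(root, experience, min_sim):
    """Descend from the most general concept, moving to the most similar
    child while its similarity reaches min_sim; return the visited path."""
    path, node = [root.name], root
    while node.children:
        best = max(node.children, key=lambda c: cosine(c.prototype, experience))
        if cosine(best.prototype, experience) < min_sim:
            break
        path.append(best.name)
        node = best
    return path

root = Concept("root", {}, [Concept("sport", {"goal": 1.0, "match": 1.0}),
                            Concept("politics", {"vote": 1.0})])
print(sort_experience(root, {"goal": 1.0}, 0.5))   # ['root', 'sport']
```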

  16. Approach – Content-Based User Profiles • The hierarchy consists of a set of concepts $C = \{c_1, c_2, \ldots, c_n\}$. • To automatically assign experiences to concepts, each concept is described by a set of weighted terms $c_i = \{(t_1, w_1), \ldots, (t_m, w_m)\}$, where $w_k$ is the weight associated with term $t_k$ in category $c_i$. • This description constitutes a linear classifier for the category.

  17. Approach – Content-Based User Profiles • WebDCC aims at obtaining a hierarchical set of linear classifiers, each based on a set of relevant features. • This goal is achieved by combining • a feature selection algorithm that chooses the appropriate terms at each node in the tree • a supervised learning algorithm that constructs a classifier for that node

  18. Feature Selection Algorithm • A feature selection threshold is defined in the [0, 1] range, such that a feature is selected only if its weight is higher than this threshold. • A simple and effective approach to weighting terms is document frequency, denoted $\mathrm{DF}(t_k)$, which is the number of instances in which the term $t_k$ occurs.
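
Assuming DF is normalized by the number of instances at the node (so it can be compared with a threshold in [0, 1]), feature selection reduces to a simple filter. The function name and input format are hypothetical.

```python
from collections import Counter

def select_features(instances, threshold):
    """Return the terms whose normalized document frequency exceeds threshold.

    instances: one term set per instance sorted into this node.
    DF(t_k) is divided by the number of instances so that it falls in the
    same [0, 1] range as the threshold; this normalization is an assumption.
    """
    if not instances:
        return set()
    df = Counter(term for terms in instances for term in set(terms))
    n = len(instances)
    return {term for term, count in df.items() if count / n > threshold}

docs = [{"tag", "folksonomy"}, {"tag", "user"}, {"tag", "folksonomy", "web"}]
print(sorted(select_features(docs, 0.5)))   # ['folksonomy', 'tag']
```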

  19. Supervised Learning Algorithm • Each node in the hierarchy acts as a linear classifier that is compared with the resource to be classified. • $p_{c_i}$: the prototype of category $c_i$ • $d$: the documents belonging to category $c_i$ • A resource is classified into a category if its similarity to the category prototype exceeds a minimum similarity.
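
Prototype-based classification can be sketched as below, assuming the prototype $p_{c_i}$ is the mean of the vectors $d$ of the documents in $c_i$ and that similarity is measured with the cosine; both choices are assumptions, and the function names are hypothetical. The scores correspond to the (category, similarity) pairs returned by the Classify function in the sport/politics example of slide 21.

```python
import math

def cosine(a, b):
    """Cosine similarity between sparse term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def prototype(docs):
    """Prototype p_{c_i} as the mean of the category's document vectors (assumed)."""
    proto = {}
    for d in docs:
        for t, w in d.items():
            proto[t] = proto.get(t, 0.0) + w / len(docs)
    return proto

def classify(resource, prototypes):
    """Similarity of the resource to every category prototype."""
    return {c: cosine(p, resource) for c, p in prototypes.items()}

def accepted(scores, min_sim):
    """Categories whose similarity exceeds the minimum similarity."""
    return {c for c, s in scores.items() if s > min_sim}

prototypes = {
    "sport": prototype([{"goal": 1.0, "match": 1.0}, {"goal": 1.0}]),
    "politics": prototype([{"vote": 1.0}]),
}
scores = classify({"goal": 1.0}, prototypes)
print(accepted(scores, 0.5))   # {'sport'}
```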

  20. Approach – Content-Based User Profiles • Given the cluster $s_j^i$ belonging to category $c_i$, which is composed of the vector representations of a set of documents, the centroid vector $p_{s_j^i}$ is defined as follows:
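
A centroid is conventionally the arithmetic mean of its member vectors. Assuming that convention, with $\vec{d}$ ranging over the document vectors in the cluster, $p_{s_j^i}$ would be:

```latex
p_{s_j^i} = \frac{1}{|s_j^i|} \sum_{\vec{d} \in s_j^i} \vec{d}
```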

  21. Approach – Content-Based User Profiles • As a result of comparing the resource with the prototypes, the resource is assigned to the cluster with the closest centroid below category $c_i$. • For example, if $C = \{C_{\mathit{sport}}, C_{\mathit{politics}}\}$, the Classify function applied to each of them might return the following result: $\{(C_{\mathit{sport}}, 0.97), (C_{\mathit{politics}}, 0.14)\}$

  22. Approach – Content-Based User Profiles • This assignment is made provided that the similarity is higher than a minimum similarity threshold δ. Experiences that are not similar enough to any existing centroid according to this threshold cause the creation of new singleton clusters.
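
The assignment rule on slides 21 and 22 can be sketched as follows, assuming cosine similarity and arithmetic-mean centroids. `assign` and its arguments are hypothetical names, and `delta` plays the role of δ.

```python
import math

def cosine(a, b):
    """Cosine similarity between sparse term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(cluster):
    """Arithmetic-mean centroid of a list of sparse vectors."""
    c = {}
    for d in cluster:
        for t, w in d.items():
            c[t] = c.get(t, 0.0) + w / len(cluster)
    return c

def assign(experience, clusters, delta):
    """Place an experience in the cluster whose centroid is most similar,
    provided the similarity exceeds delta; otherwise open a singleton cluster.
    Returns the index of the cluster that received the experience."""
    best, best_sim = None, -1.0
    for i, cluster in enumerate(clusters):
        sim = cosine(centroid(cluster), experience)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim > delta:
        clusters[best].append(experience)
        return best
    clusters.append([experience])
    return len(clusters) - 1

clusters = [[{"goal": 1.0}], [{"vote": 1.0}]]
print(assign({"goal": 1.0, "match": 1.0}, clusters, 0.3))   # 0 (joins first cluster)
print(assign({"recipe": 1.0}, clusters, 0.3))               # 2 (new singleton)
```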

  23. Approach – Content-Based User Profiles • Cluster cohesiveness • $n_r$: the size of cluster $s_r$ • If the cohesiveness value is higher than a threshold φ, a new concept is created; otherwise, the hierarchy is not updated.
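
The slide names the cohesiveness measure without giving its form, so the sketch below assumes one common choice: the average cosine similarity of the $n_r$ members of $s_r$ to their centroid. The threshold test against φ follows the slide; the `min_size` guard is a hypothetical addition, since a singleton trivially scores 1.0 under this definition.

```python
import math

def cosine(a, b):
    """Cosine similarity between sparse term -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(cluster):
    """Arithmetic-mean centroid of a list of sparse vectors."""
    c = {}
    for d in cluster:
        for t, w in d.items():
            c[t] = c.get(t, 0.0) + w / len(cluster)
    return c

def cohesiveness(cluster):
    """Average similarity of the n_r members to their centroid (assumed form)."""
    c = centroid(cluster)
    return sum(cosine(d, c) for d in cluster) / len(cluster)

def should_create_concept(cluster, phi, min_size=2):
    """True when the cluster is cohesive enough (above phi) to become a new
    concept; min_size is a hypothetical guard, not taken from the slides."""
    return len(cluster) >= min_size and cohesiveness(cluster) > phi

print(should_create_concept([{"a": 1.0}, {"a": 1.0}], 0.8))   # True
print(should_create_concept([{"a": 1.0}, {"b": 1.0}], 0.8))   # False
```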

  24. Approach – Linking Tags to User Interests • To build hybrid profiles, the categories representing user interests in content-based profiles are populated with the tags users frequently associate with resources in those categories. • Tagged resources must therefore first be categorized according to the current representation of user interests given by the interest hierarchy.

  25. Approach – Linking Tags to User Interests • For each cluster in the hierarchy, a set of the most frequently used tags is extracted to represent the corresponding tag assignment preferences for the experiences or resources belonging to this cluster. • The set of tags related to a cluster $s_j^i$ within category $c_i$ can be defined within the personomy $P_u$ as follows: $T_{s_j^i} = \{t \in T \mid (t, r) \in I_u \wedge r \in s_j^i\}$

  26. Approach – Linking Tags to User Interests • The tag frequency of a tag $t$ in $T_{s_j^i}$ is the number of times the tag was used to tag resources belonging to the cluster: $|\{r \in R \mid (t, r) \in I_u \wedge r \in s_j^i\}|$
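
Given the user's tag assignments $I_u$ and the resources of one cluster, the tag set $T_{s_j^i}$, its frequencies, and the most frequently used tags can be computed as below. The function names are hypothetical, and `k` (how many tags represent a cluster) is an assumed parameter.

```python
from collections import Counter

def cluster_tag_frequencies(i_u, cluster):
    """Frequency of each tag in T_{s_j^i}: how many of the cluster's resources
    the user tagged with it. i_u is the user's set of (tag, resource) pairs,
    so each resource is counted at most once per tag."""
    return Counter(t for (t, r) in i_u if r in cluster)

def top_tags(i_u, cluster, k):
    """The k most frequently used tags for the cluster."""
    return [t for t, _ in cluster_tag_frequencies(i_u, cluster).most_common(k)]

i_u = {("python", "r1"), ("ml", "r1"), ("python", "r2"), ("web", "r3")}
print(top_tags(i_u, {"r1", "r2"}, 1))   # ['python']
```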

  27. Experimental Results • Experiments were performed using data collected from the del.icio.us social bookmarking system.

  28.

  29. Experimental Results • For a given user $u \in U$ and a given resource $r \in R$, a tag recommender system tries to find a set of tags $\tilde{T}(u, r) \subseteq T$ for the user to annotate the resource. • Training set: 80% of the total tagged bookmarks • Test set: the remaining 20%
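
A minimal sketch of the 80%/20% split, assuming a random, seeded shuffle; how bookmarks were actually assigned to each set is an assumption here, and `split_bookmarks` is a hypothetical name.

```python
import random

def split_bookmarks(bookmarks, train_fraction=0.8, seed=0):
    """Shuffle tagged bookmarks and cut them into training and test sets."""
    items = list(bookmarks)
    random.Random(seed).shuffle(items)      # fixed seed for reproducibility
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

train_set, test_set = split_bookmarks(range(10))
print(len(train_set), len(test_set))   # 8 2
```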

  30. Experimental Results • The quality of a given list of top-N recommendations was evaluated by the number of hits. • The number of hits is the number of tag assignments in the test set that were also present in the top-N recommended tags. • N is the total number of recommendations.

  31. Experimental Results • High hit-rate values indicate that the algorithm was able to predict the assignments in the test sets of the corresponding users. • $\tilde{T}(u, r)$: the set of recommended tags • $\mathit{tags}(u, r)$: the set of actual tags assigned by the user to the resource
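
Hits and hit-rate as described on slides 30 and 31 can be sketched as follows. The recommender interface, the function name, and the normalization of hits by the number of held-out tag assignments are assumptions.

```python
def hit_rate(test_posts, recommend, n):
    """Share of held-out tag assignments that appear in the top-n list.

    test_posts: iterable of (user, resource, actual_tags) from the test set.
    recommend(user, resource) must return a ranked tag list; both this
    interface and the normalization are assumptions of this sketch.
    """
    hits = total = 0
    for user, resource, actual in test_posts:
        top_n = set(recommend(user, resource)[:n])   # the top-N recommended tags
        hits += len(top_n & set(actual))             # assignments also in top-N
        total += len(set(actual))
    return hits / total if total else 0.0

posts = [("alice", "r9", {"python", "web"}), ("alice", "r10", {"ml"})]
ranked = lambda user, resource: ["python", "ml", "data"]
print(hit_rate(posts, ranked, 2))   # 2 hits out of 3 assignments
```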

  32. Experimental Results • The F-measure was used to combine precision and recall values:
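
Assuming the usual set-based definitions, precision is $|\tilde{T}(u,r) \cap \mathit{tags}(u,r)| / |\tilde{T}(u,r)|$, recall is the same overlap divided by $|\mathit{tags}(u,r)|$, and the F-measure is their harmonic mean $F = 2PR/(P+R)$. A sketch for one (user, resource) pair, with a hypothetical function name:

```python
def precision_recall_f(recommended, actual):
    """Precision, recall and F1 for one (user, resource) pair."""
    rec, act = set(recommended), set(actual)
    overlap = len(rec & act)
    precision = overlap / len(rec) if rec else 0.0
    recall = overlap / len(act) if act else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

print(precision_recall_f(["python", "ml", "web"], ["python", "web", "java", "code"]))
# precision 2/3, recall 1/2, F about 0.571
```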

  33. Experimental Results • Precision increases as the similarity threshold δ grows, since clusters become smaller and recommendations are based on fewer but highly similar resources. • Conversely, recall tends to decrease, since smaller clusters offer less tag diversity. • The best hit-rate values are found in the interval 0.1 ≤ δ ≤ 0.3, within which the best trade-off between precision and recall is also attained for most users.

  34. Experimental Results • Hybrid profiles were compared with tag recommendation based on two approaches commonly used in folksonomies: • Most popular tags by user (MPTU): tags are sorted according to their frequency of occurrence in the user's resources, and the top-N tags are then used to make recommendations. This corresponds to a tag-based profile consisting of a single vector of tags. • Most popular tags by resource (MPTR): based on collective knowledge rather than on the individual user alone.
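
Both baselines reduce to frequency counts over the tag assignments $Y$, stored here as (user, tag, resource) triples. A minimal sketch with hypothetical function names; tie-breaking is not specified in the slides, and `Counter.most_common` simply keeps first-seen order for equal counts.

```python
from collections import Counter

def mptu(assignments, user, n):
    """Most popular tags by user: the user's n most frequent tags."""
    counts = Counter(t for (u, t, r) in assignments if u == user)
    return [t for t, _ in counts.most_common(n)]

def mptr(assignments, resource, n):
    """Most popular tags by resource: the n tags all users most often gave it."""
    counts = Counter(t for (u, t, r) in assignments if r == resource)
    return [t for t, _ in counts.most_common(n)]

y = [("alice", "python", "r1"), ("alice", "python", "r2"), ("alice", "ml", "r1"),
     ("bob", "web", "r1"), ("carol", "web", "r1"), ("bob", "python", "r3")]
print(mptu(y, "alice", 1), mptr(y, "r1", 1))   # ['python'] ['web']
```

MPTU ignores the resource being tagged, while MPTR ignores the user; the hybrid profiles instead use the tags of the user's cluster that the resource falls into.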

  35. Experimental Results • Recommendations based on hybrid profiles consistently reached higher hit-rates than the approaches based on tag popularity.

  36. Experimental Results • The differences in performance between hybrid profiles and MPTU and MPTR, tested with a paired two-tailed t-test, were statistically significant at α = 0.05, with p-values of 0.0119 and 0.0001, respectively.
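
The test statistic behind the paired comparison can be sketched as follows, assuming the paired samples are per-user scores (for example, per-user hit-rates) for the two methods; `paired_t` is a hypothetical name and the numbers are toy inputs. The sketch computes only t; the two-tailed p-value additionally needs the CDF of Student's t distribution with n - 1 degrees of freedom, which SciPy exposes through `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t(scores_a, scores_b):
    """t statistic of a paired t-test on matched per-user scores.

    The two-tailed p-value is not computed here; it requires the CDF of
    Student's t distribution with n - 1 degrees of freedom.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Toy per-user scores for two methods (illustrative numbers only).
print(paired_t([0.5, 0.6, 0.7], [0.4, 0.4, 0.6]))   # about 4.0
```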

  37. Conclusions and Comments • Experimental results show that hybrid profiles are able to outperform two commonly used recommendation methods based on tag popularity. • Future work • Non-obviousness • Discriminating power • Comments • The experimental sample is too small • The possibility of tag-based profiles
