Using Hierarchical Clustering for Learning the Ontologies used in Recommendation Systems • Vincent Schickel-Zuber, Boi Faltings [SIGKDD’07] • Reporter: Che-Wei, Liang • Date: 2008/04/10
Outline • Introduction • Background • Collaborative Filtering • Ontology Filtering • Learning the Ontologies • Clustering Algorithms • Learning Hierarchical Ontologies • Experiments • Conclusion
Introduction • Recommender system • Helps people find the most relevant items based on their own preferences and those of others. • Item-based collaborative filtering (CF) • Recommends items based on the experience of the user as well as of other, similar users.
Ontology • What is an ontology? • A multi-inheritance graph structure • Edges represent features • Each item is an instance of at least one concept
Ontology Filtering • Infer preference ratings of items based on the ratings of known items and the relative position in an ontology.
Background • Users $U = \{u_1, \dots, u_m\}$ • Items $I = \{i_1, \dots, i_n\}$ • $R_{u,i}$ = the rating assigned to item $i$ by user $u$
Collaborative Filtering (1/4) • Collaborative Filtering • Find similar items • Combine similar items into a recommendation list • Assumption: similar users like similar items
Collaborative Filtering (2/4) • Top-N recommendation strategy 1. Compute pairwise item similarities from matrix R 2. Predict the rating of an item i using the k items most similar to i (i’s neighborhood) 3. Select the best N items
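The three-step top-N strategy can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes cosine similarity between item columns of R, 0 meaning "unrated", and a similarity-weighted average as the prediction. The slides specify neither the similarity measure nor the prediction formula, and the names `cosine` and `top_n` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two rating vectors (0 = unrated contributes nothing)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_n(R, user, k=2, n=1):
    """Return the N unrated items with the highest predicted rating for `user`.

    R[u][i] is the rating of item i by user u; 0 means "not rated" (assumption).
    """
    n_items = len(R[0])
    columns = [[row[i] for row in R] for i in range(n_items)]
    # Step 1: pairwise item-item similarities computed from R.
    sim = [[cosine(columns[i], columns[j]) for j in range(n_items)]
           for i in range(n_items)]
    rated = [j for j in range(n_items) if R[user][j] > 0]
    predictions = {}
    for i in range(n_items):
        if R[user][i] > 0:
            continue
        # Step 2: the k rated items most similar to i form i's neighborhood.
        neighborhood = sorted(rated, key=lambda j: sim[i][j], reverse=True)[:k]
        weight = sum(abs(sim[i][j]) for j in neighborhood)
        if weight > 0:
            predictions[i] = sum(sim[i][j] * R[user][j] for j in neighborhood) / weight
    # Step 3: keep the N best-scoring items.
    return sorted(predictions, key=predictions.get, reverse=True)[:n]


R = [[5, 0, 0, 1],
     [5, 5, 1, 1],
     [1, 1, 5, 5]]
print(top_n(R, user=0, k=2, n=2))  # prints [1, 2]
```

User 0 liked item 0 and disliked item 3; item 1 co-occurs with item 0, so it is ranked first. Because each prediction only looks at i's k nearest neighbors, the result depends directly on the choice of k.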
Collaborative Filtering (4/4) • Reduces the search space! • But • The search space remains huge and unconstrained • Requires users to rate many items to find highly correlated neighbors • Results are greatly influenced by the size of the item’s neighborhood
Ontology Filtering (1/3) • Two inputs: • Users’ historical data R • An ontology modeling the domain • How the ontology is defined is usually not made explicit • e.g., should wine be organized by color (white vs. red) or by taste?
Ontology Filtering (2/3) 1. Compute the a-priori score APS(c), where $n_c$ is the number of descendants of concept c 2. Infer the rating by $\alpha(y, lca)\,\beta(x, lca)$, where lca is the lowest common ancestor of x and y • OSS: find the closest concept x to any given y
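A minimal sketch of the two ontology-filtering steps. The APS formula itself was not preserved in this transcript, so the code assumes APS(c) = 1/(n_c + 2). It also assumes an OSS-style propagation in which moving up from x to the lca multiplies the score by APS(lca)/APS(x), and moving down from the lca to y adds APS(y) − APS(lca); the argument order of α and β on the slide may differ. Treat both formulas, the dictionary encoding of the ontology, and the function names as assumptions rather than the authors' exact definitions.

```python
def descendants(children, concept):
    """All concepts reachable below `concept` in a multi-inheritance graph."""
    seen, stack = set(), list(children.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # a concept with several parents is counted once
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def aps(children, concept):
    # ASSUMPTION: APS(c) = 1 / (n_c + 2), with n_c the number of descendants of c.
    return 1.0 / (len(descendants(children, concept)) + 2)


def infer_score(children, x, y, lca, score_x):
    """Infer a score for concept y from the known score of concept x via their lca.

    ASSUMPTION: upward factor alpha = APS(lca) / APS(x);
    downward offset beta = APS(y) - APS(lca).
    """
    alpha = aps(children, lca) / aps(children, x)
    beta = aps(children, y) - aps(children, lca)
    return alpha * score_x + beta


# Hypothetical wine ontology organized by color.
wine = {
    "wine": ["red", "white"],
    "red": ["merlot", "syrah"],
    "white": ["chardonnay"],
}
print(aps(wine, "red"))  # prints 0.25
print(infer_score(wine, "merlot", "syrah", "red", 0.8))
```

Concepts deeper in the ontology (fewer descendants) get a higher a-priori score, so the downward step from a general concept to a more specific one raises the inferred score.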
Clustering algorithms • Families of clustering algorithms: fuzzy clustering, nearest-neighbor clustering, hierarchical clustering, artificial neural networks for clustering, statistical clustering • Hierarchical algorithms • Distance-based clustering • Concept-based clustering
Hierarchical algorithm [Figure: dendrogram]
Distance-based Clustering • Distance-based clustering • Agglomerative clustering • Bottom-up • Compute all pairwise similarities: $O(n^2)$ • Partitional clustering • Top-down • Low complexity
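Bottom-up agglomerative clustering can be sketched as below, here with the complete-link criterion that the deck names for the classical algorithm. This is a naive illustration assuming a user-supplied distance function: it precomputes the O(n²) pairwise distances once, then repeatedly merges the two closest clusters, where the distance between two clusters is the largest distance between any of their members. The merge loop itself is slower than optimized library implementations, and the function name `complete_link` is hypothetical.

```python
def complete_link(items, dist):
    """Agglomerative clustering with the complete-link criterion.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    i.e. the dendrogram built from the leaves up to the root.
    """
    # All pairwise distances, computed once: O(n^2).
    d = [[dist(a, b) for b in items] for a in items]
    clusters = [frozenset([i]) for i in range(len(items))]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Complete link: the farthest pair of members defines the distance.
                cd = max(d[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or cd < best[0]:
                    best = (cd, p, q)
        cd, p, q = best
        merges.append((clusters[p], clusters[q], cd))
        merged = clusters[p] | clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return merges


points = [0.0, 1.0, 5.0, 6.0]
for a, b, cd in complete_link(points, lambda u, v: abs(u - v)):
    print(sorted(a), sorted(b), cd)
```

Each entry of the returned history corresponds to one internal node of the dendrogram, which is the hierarchy that can then serve as an ontology.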
Concept-based Clustering • Concept-based clustering • Items need to be represented by a set of attribute-value pairs. • Ex: mammal(body cover, heart chamber, body temperature) = (hair, four, regulated) • COBWEB • The classification tree is not height-balanced • Overall complexity is exponential in the number of attributes.
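COBWEB places items by maximizing category utility over attribute-value pairs such as the mammal example above. The sketch below computes the standard category-utility score of a candidate partition, $CU = \frac{1}{K}\sum_k P(C_k)\left[\sum_i\sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i\sum_j P(A_i = V_{ij})^2\right]$. It is only the scoring function, not COBWEB's full incremental tree-building procedure, and the function names are hypothetical.

```python
from collections import Counter


def expected_correct(items):
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `items`."""
    n = len(items)
    total = 0.0
    for attr in range(len(items[0])):
        counts = Counter(item[attr] for item in items)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(partition):
    """Category utility of a partition (a list of clusters of attribute tuples)."""
    all_items = [item for cluster in partition for item in cluster]
    baseline = expected_correct(all_items)
    n, k = len(all_items), len(partition)
    # Weighted gain in predictability of attribute values, averaged over clusters.
    return sum(len(c) / n * (expected_correct(c) - baseline) for c in partition) / k


# (body cover, heart chamber, body temperature)
mammal = ("hair", "four", "regulated")
fish = ("scales", "two", "unregulated")
print(category_utility([[mammal, mammal], [fish, fish]]))  # prints 0.75
print(category_utility([[mammal, fish], [mammal, fish]]))  # prints 0.0
```

Splitting mammals from fish makes every attribute value predictable within a cluster, so that partition scores higher than the mixed one.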
Learning Hierarchical Ontologies (1/5) • Users can be categorized into different communities. • One ontology for all users is not appropriate • Select a better ontology to use based on the user’s preferences.
Learning Hierarchical Ontologies (2/5) • Generate a whole set of ontologies Λ
Learning Hierarchical Ontologies (4/5) • Find-concept problem • Ins(y|x): what if the concepts representing the liked items are too distant from the disliked ones? • Algorithm 2 1. Select a subset of ontologies that perform best 2. Among the selected ontologies, select the one that minimizes the distance between liked and disliked concepts.
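The two steps of Algorithm 2 can be sketched as a selection over candidate ontologies. This is a minimal sketch assuming each candidate already has a prediction-accuracy score and that a function returns the distance between the user's liked and disliked concepts in that ontology; how those quantities are computed is not given in this transcript, so `select_ontology`, `accuracy`, `top_k`, and `liked_disliked_distance` are all hypothetical.

```python
from typing import Callable, Dict, List


def select_ontology(candidates: List[str],
                    accuracy: Dict[str, float],
                    liked_disliked_distance: Callable[[str], float],
                    top_k: int = 3) -> str:
    # Step 1: keep the top_k ontologies with the best prediction accuracy.
    best = sorted(candidates, key=lambda o: accuracy[o], reverse=True)[:top_k]
    # Step 2: among them, pick the one minimizing the liked/disliked distance.
    return min(best, key=liked_disliked_distance)


accuracy = {"o1": 0.90, "o2": 0.80, "o3": 0.70, "o4": 0.95}
distance = {"o1": 2.0, "o2": 1.0, "o3": 0.5, "o4": 3.0}
print(select_ontology(list(accuracy), accuracy, distance.get, top_k=2))  # prints o1
```

The size of the first-stage subset (`top_k` here) controls the trade-off: a larger subset gives the distance criterion more choice at the cost of admitting less accurate ontologies.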
Learning Multi-Hierarchical Ontologies • Some problems • Implicit features • Limits concept representation • Limits OF’s inference process • Ignores other possible suboptimal candidates • Improvement: slightly increase the search space
Classical agglomerative clustering with complete-link criterion function
Experiments • Two data sets: • MovieLens • Ratings from 943 real users, each of whom rated at least 20 movies • 1682 movies in total, 19 themes • Jester • Ratings of jokes collected over a period of 4 years • 24,983 users, 100 jokes
Evaluating Recommendation Algorithms • RS: recommendation set • Nok: #(relevant items) • Nr: #(relevant items in the database N) • Use the F1 metric
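With these quantities, precision is Nok/|RS|, recall is Nok/Nr, and F1 is their harmonic mean, $F_1 = \frac{2PR}{P + R}$. A minimal sketch, assuming Nok counts the relevant items inside RS; the function name `f1_score` is hypothetical.

```python
def f1_score(n_ok: int, rs_size: int, n_r: int) -> float:
    """F1 from Nok (relevant items in RS), |RS|, and Nr (all relevant items)."""
    precision = n_ok / rs_size if rs_size else 0.0
    recall = n_ok / n_r if n_r else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 5 relevant items among 10 recommendations, 20 relevant items overall.
print(f1_score(n_ok=5, rs_size=10, n_r=20))
```

Here precision is 0.5 and recall 0.25, so F1 is 1/3: the harmonic mean penalizes the weaker of the two measures.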
Hierarchical Clustering Analysis • Execution time in seconds required for the clustering algorithm to generate the ontology.
Multi-Hierarchical Clustering Analysis • Trade-off between prediction accuracy and ontology quality.
Conclusions • Introduced three algorithms that • Learn a set of ontologies based on historical data • Select which ontology to use based on the user’s preferences • Build a multi-hierarchical ontology based on a predefined window size • Experimental results on two well-known data sets showed that the algorithms can produce good ontologies and increase prediction accuracy. • The learnt ontologies can even outperform traditional item-based collaborative filtering.