
A clustering algorithm to find groups with homogeneous preferences

J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón. www.aic.uniovi.es. Workshop on Implicit Measures of User Interests and Preferences.


Presentation Transcript


  1. A clustering algorithm to find groups with homogeneous preferences. J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón. www.aic.uniovi.es. Workshop on Implicit Measures of User Interests and Preferences

  2. The framework to learn preferences. People tend to rate their preferences in a relative way. Which middle circle do you think is larger? [The slide shows an optical illusion: two identical middle circles that appear different in size because of the circles surrounding them.]

  3. The framework to learn people's preferences
  • Regression is not a good idea
  • We will use training sets of preference judgments: pairs of vectors (v, u) where someone ("Me") expresses that he or she prefers v to u
  SVM_linear({v_i > u_i : i ∈ I_Me}) → f_Me
  f_Me is a linear ranking function: f(v_i) > f(u_i) whenever v_i is preferable to u_i
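
A minimal sketch (not the authors' code) of this learning step in Python with scikit-learn: each judgment "v preferred to u" becomes the classification example v - u with label +1 plus the mirrored example u - v with label -1, and a linear SVM fitted without intercept yields the weight vector of the ranking function. The function name and the toy data are ours.

    import numpy as np
    from sklearn.svm import LinearSVC

    def learn_ranking_function(pairs):
        """Learn w such that w·v > w·u for each judgment (v, u), v preferred."""
        X, y = [], []
        for v, u in pairs:
            X.append(v - u); y.append(+1)     # v preferred to u
            X.append(u - v); y.append(-1)     # mirrored example
        clf = LinearSVC(fit_intercept=False)  # hyperplane through the origin
        clf.fit(np.array(X), np.array(y))
        return clf.coef_.ravel()

    # Toy usage: preferences driven by the first attribute.
    rng = np.random.default_rng(0)
    judgments = []
    for _ in range(50):
        v, u = rng.normal(size=5), rng.normal(size=5)
        if v[0] < u[0]:
            v, u = u, v                       # make v the preferred object
        judgments.append((v, u))
    print(learn_ranking_function(judgments))  # largest weight on attribute 0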

  4. The framework to learn people's preferences
  SVM_linear({v_i > u_i : i ∈ I_Me}) → f_Me
  • How useful is this ranking function f_Me?
  • Accuracy, generalization error
  • # Training examples
  • # Attributes
  • reliable, general, …

  5. The problem addressed
  To improve ranking functions, we present a new algorithm for clustering preference criteria: merge two judgment sets, e.g. {v_i2 > u_i2} and {v_i3 > u_i3}, if the joint function f_2∪3 learned from their union is better than f_2 and f_3 separately.
  [Slide figure: four people P1…P4 with judgment sets {v_i1 > u_i1} … {v_i4 > u_i4} and learned functions f_1 … f_4; the sets of P2 and P3 merge into f_2∪3.]

  6. Applications
  • Information retrieval: Optimizing Search Engines Using Clickthrough Data [Joachims, 2002]
  • Personalized recommenders: Adaptive Route Advisor [Fiechter, Rogers, 2000]
  • Analysis of sensory data: used to test the quality (or the acceptability) of market products; panels of experts and consumers

  7. {object1 rating1} {object2 rating2} {object3 rating3} {object4 rating4} P1 P2 P3 P4 Baseline approaches If ratingi ratingjthenmerge Pi with Pj Where  uses correlation or cosine

  8. Weaknesses of baseline approaches
  Correlation and cosine were devised for prediction purposes in collaborative filtering, and they are not easily extendable to clustering:
  • Not all people have seen the same objects
  • Two samples of the preferences of the same person would not be considered homogeneous
  • Rating on an absolute scale is not a good idea (people rate in a relative way, see slide 2)

  9. Our approach: a clustering algorithm
  Ranking functions are linear maps: f(x) = w·x. The weight vectors w therefore codify the rationale behind the preferences.
  So we try to merge data sets with similar (in the cosine sense) ranking functions (= weight vectors).
  A merge is accepted only if the joint ranking function improves on the quality of the individual functions.
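
The similarity in question is just the cosine between the weight vectors of two learned ranking functions; a two-line sketch:

    import numpy as np

    def weight_cosine(w1, w2):
        """Cosine between two ranking hyperplanes f(x) = w·x; a value near 1
        means the two criteria order objects in almost the same way."""
        return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))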

  10. Our approach: a clustering algorithm

  ClusterPreferencesCriteria(a list of preference judgments PJ_i, i = 1, …, N):
      Clusters = ∅
      for i = 1 to N:
          w_i = learn a ranking hyperplane from PJ_i
          Clusters = Clusters ∪ {(PJ_i, w_i)}
      repeat:
          let (PJ_1, w_1) and (PJ_2, w_2) be the clusters with the most similar w_1 and w_2
          w = learn a ranking hyperplane from PJ_1 ∪ PJ_2
          if quality(w) >= quality(w_1) + quality(w_2) then
              replace (PJ_1, w_1) and (PJ_2, w_2) by (PJ_1 ∪ PJ_2, w) in Clusters
      until no new merges can be tested
      return Clusters
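
A runnable Python version of this procedure, as a sketch under stated assumptions: learn is any callable mapping a list of judgments to a weight vector (e.g. learn_ranking_function above), quality is any estimator like those of the next slides, and the bookkeeping of already-refused pairs is our reading of "until no new merges can be tested".

    import numpy as np
    from itertools import combinations, count

    def cluster_preference_criteria(pj_list, learn, quality):
        """Greedy clustering of preference criteria (sketch of the pseudocode).
        pj_list: one list of preference judgments per person.
        learn:   judgments -> weight vector of a ranking hyperplane.
        quality: (w, judgments) -> estimated quality of the ranking function."""
        ids = count()
        clusters = {next(ids): (pjs, learn(pjs)) for pjs in pj_list}
        refused = set()                             # merges tested and rejected

        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

        while True:
            # untested pair of clusters with the most similar weight vectors
            candidates = [(cos(clusters[i][1], clusters[j][1]), i, j)
                          for i, j in combinations(clusters, 2)
                          if frozenset((i, j)) not in refused]
            if not candidates:
                return list(clusters.values())      # no new merges can be tested
            _, i, j = max(candidates)
            (pj1, w1), (pj2, w2) = clusters[i], clusters[j]
            merged = pj1 + pj2                      # PJ_1 ∪ PJ_2 (lists here)
            w = learn(merged)
            if quality(w, merged) >= quality(w1, pj1) + quality(w2, pj2):
                del clusters[i], clusters[j]
                clusters[next(ids)] = (merged, w)   # accept the merge
            else:
                refused.add(frozenset((i, j)))      # never retry this pair

Remembering refused pairs guarantees termination: every iteration either merges two clusters or rules out one candidate pair.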

  11. To estimate the quality of ranking functions
  The quality of a ranking function depends on:
  • Accuracy, generalization errors
  • Number of training examples
  • Number of attributes

  12. To estimate the quality of ranking functions
  If we have enough training data:
  • divide them into a training set (proper) and a verification set
  • compute the confidence interval [L, R] of the probability of error when we apply each ranking function to the corresponding verification set
  • quality is 1 − R: the estimated proportion of successful generalizations in the pessimistic case
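
A sketch of this estimate; the slide does not say which confidence interval the authors used, so a normal-approximation (Wald) interval is assumed here, and the function name is ours.

    import math

    def quality_from_verification(errors, n, z=1.96):
        """Pessimistic quality 1 - R, where [L, R] is an (assumed Wald)
        confidence interval for the error rate on n verification pairs."""
        p = errors / n                          # observed error rate
        half = z * math.sqrt(p * (1 - p) / n)   # half-width at ~95% confidence
        return 1.0 - min(1.0, p + half)         # quality = 1 - R

    # e.g. 12 misranked pairs out of 200 verification pairs:
    print(quality_from_verification(12, 200))   # ~0.907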

  13. To estimate the quality of ranking functions
  If we don't have enough training data:
  • Xi-alpha estimator [Joachims, 2000] (developed for text classification)
  • Cross-validation
  • Others
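
The Xi-alpha estimator is specific to SVMlight, so here is a sketch of the cross-validation option only, reusing the pairwise transform; drawing a random sign per judgment keeps mirrored examples from leaking between folds.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    def cv_quality(pairs, folds=5):
        """Quality of a judgment set as mean cross-validated accuracy on
        pairwise-transformed examples (sketch of the slide's second option)."""
        rng = np.random.default_rng(0)
        X, y = [], []
        for v, u in pairs:
            s = 1 if rng.random() < 0.5 else -1   # random sign per pair
            X.append(s * (v - u))
            y.append(s)
        scores = cross_val_score(LinearSVC(fit_intercept=False),
                                 np.array(X), np.array(y), cv=folds)
        return float(scores.mean())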

  14. Experimental results
  We used a collection of preference judgments built from EachMovie to simulate reasonable situations in the study of the preferences of groups of people.
  • People: the 100 spectators with the most ratings
  • Objects: 504 movies (60% train, 20% verification, 20% test), each described by the ratings given by a separate group of spectators (89 in one representation, 808 in another)
  • Training sets: preference judgments derived from each spectator's ratings
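
The slide does not spell out the data preparation; a plausible sketch (all names hypothetical) is that every pair of movies a spectator rated differently yields one judgment, the higher-rated movie preferred, with each movie represented by the describing spectators' ratings for it.

    from itertools import combinations

    def judgments_from_ratings(person_ratings, movie_vectors):
        """person_ratings: dict movie id -> this spectator's rating.
        movie_vectors:  dict movie id -> vector of the describing
        spectators' ratings for that movie. Returns (v, u) pairs."""
        pairs = []
        for m1, m2 in combinations(person_ratings, 2):
            if person_ratings[m1] == person_ratings[m2]:
                continue                        # tie: no preference expressed
            hi, lo = ((m1, m2) if person_ratings[m1] > person_ratings[m2]
                      else (m2, m1))
            pairs.append((movie_vectors[hi], movie_vectors[lo]))
        return pairs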

  15. Experimental results [Results chart for the two movie representations, with 808 and 89 describing spectators]

  16. A clustering algorithm to find groups with homogeneous preferences. J. Díez, J.J. del Coz, O. Luaces, A. Bahamonde. Centro de Inteligencia Artificial, Universidad de Oviedo at Gijón. www.aic.uniovi.es. Workshop on Implicit Measures of User Interests and Preferences
