The Stability of a Good Clustering
Marina Meila, University of Washington, mmp@stat.washington.edu
• Data (similarities) • Objective • Algorithm (K-means, spectral clustering) • Optimizing these criteria is NP-hard, but that is the worst case • ...but "spectral clustering and K-means work well when a good clustering exists": the interesting case • This talk: if a "good" clustering exists, it is "unique"; if a "good" clustering is found, it is provably good
Results summary • Given • objective = NCut or K-means distortion • data • clustering Y with K clusters • Spectral lower bound on the distortion • If distortion(Y) - lower bound is small • then d(Y, Y_opt) is small, where Y_opt = the best clustering with K clusters
A graphical view [figure: the distortion over the space of clusterings, with the spectral lower bound marked]
Overview • Introduction • Matrix representations for clusterings • Quadratic representation for clustering cost • The misclassification error distance • Results for NCut (easier) • Results for K-means distortion (harder) • Discussion
Clusterings as matrices • Clustering of {1, 2, ..., n} with K clusters (C1, C2, ..., CK) • Represented by an n x K matrix X • unnormalized: X_ik = 1 if i is in C_k, 0 otherwise • normalized: X_ik = 1/sqrt(|C_k|) if i is in C_k, 0 otherwise • All these matrices have orthogonal columns
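The matrix definitions can be made concrete with a short sketch. This is an illustrative reconstruction, not code from the talk; the function name `indicator_matrices` and the example labels are mine, and the normalization shown is the standard scaling of each column by 1/sqrt(|C_k|).

```python
import numpy as np

def indicator_matrices(labels, K):
    """Build the unnormalized and normalized cluster-indicator matrices.

    labels: length-n array with values in {0, ..., K-1}.
    """
    n = len(labels)
    X = np.zeros((n, K))
    X[np.arange(n), labels] = 1.0     # unnormalized: X[i, k] = 1 iff point i is in C_k
    sizes = X.sum(axis=0)             # cluster sizes |C_k|
    X_norm = X / np.sqrt(sizes)       # normalized: columns scaled to unit length
    return X, X_norm

# The normalized matrix has orthonormal columns, as the slide notes.
X, X_norm = indicator_matrices(np.array([0, 0, 1, 2, 2, 2]), K=3)
print(np.allclose(X_norm.T @ X_norm, np.eye(3)))   # True
```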
Distortion is quadratic in X • NCut: NCut(X) = K - trace(X^T A X), with A the normalized similarity matrix • K-means: distortion(X) = trace(G) - trace(X^T G X), with G the Gram matrix of the data
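As a sanity check on the quadratic form, here is a hedged sketch for the K-means case only, assuming the form distortion = trace(G) - trace(X^T G X) with X the normalized indicator; the helper names and the toy data are mine, and the NCut case is analogous with the normalized similarity matrix in place of G.

```python
import numpy as np

def kmeans_distortion_quadratic(Z, labels, K):
    """K-means distortion written as trace(G) - trace(X^T G X), G = Z Z^T."""
    n = len(labels)
    X = np.zeros((n, K))
    X[np.arange(n), labels] = 1.0
    X /= np.sqrt(X.sum(axis=0))     # normalized indicator
    G = Z @ Z.T                     # Gram matrix of the data
    return np.trace(G) - np.trace(X.T @ G @ X)

def kmeans_distortion_direct(Z, labels, K):
    """Usual sum of squared distances to the cluster means, for comparison."""
    return sum(np.sum((Z[labels == k] - Z[labels == k].mean(axis=0)) ** 2)
               for k in range(K))

rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 5))
labels = np.repeat(np.arange(3), [7, 7, 6])
print(np.isclose(kmeans_distortion_quadratic(Z, labels, 3),
                 kmeans_distortion_direct(Z, labels, 3)))    # True
```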
The Confusion Matrix • Two clusterings of the same n points • (C1, C2, ..., CK) with cluster sizes n1, ..., nK • (C'1, C'2, ..., C'K') with cluster sizes n'1, ..., n'K' • Confusion matrix M (K x K') with entries m_kk' = |C_k ∩ C'_k'|
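A minimal sketch of the confusion matrix as described: m_kk' counts the points that fall in C_k under the first clustering and in C'_k' under the second. The function name and arguments are mine.

```python
import numpy as np

def confusion_matrix(labels1, labels2, K, K2):
    """M[k, k'] = |C_k ∩ C'_k'| for two labelings of the same n points."""
    M = np.zeros((K, K2), dtype=int)
    for a, b in zip(labels1, labels2):
        M[a, b] += 1
    return M
```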
The Misclassification Error distance • d(C, C') = 1 - (1/n) max over matchings π of the clusters of sum_k m_k,π(k) • computed from the confusion matrix by the maximal bipartite matching algorithm between the clusters of C and C'
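A sketch of this distance computed via maximum-weight bipartite matching on the confusion matrix, here with scipy's assignment solver as one possible implementation of that matching; it assumes K = K', and the function name is mine.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def misclassification_error(labels1, labels2, K):
    """d(C, C') = 1 - (1/n) * (mass matched by the best cluster-to-cluster matching)."""
    n = len(labels1)
    M = np.zeros((K, K), dtype=int)
    for a, b in zip(labels1, labels2):
        M[a, b] += 1                          # confusion matrix
    rows, cols = linear_sum_assignment(-M)    # maximize the matched mass
    return 1.0 - M[rows, cols].sum() / n

# Same partition with permuted labels -> distance 0.
print(misclassification_error([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], K=3))   # 0.0
```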
Results for NCut • Given • data: similarity matrix A (n x n) • clustering X (n x K) • Lower bound for NCut (M02, YS03, BJ03): NCut(X) >= K - (λ1 + ... + λK), where λ1, ..., λK are the K largest eigenvalues of A • Upper bound for d(X, X_opt) (MSX'05), whenever NCut(X) - (K - λ1 - ... - λK) is small enough
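The lower bound can be evaluated directly; a hedged sketch, assuming the bound takes the form K minus the sum of the K largest eigenvalues of A, as cited from M02 / YS03 / BJ03 above:

```python
import numpy as np

def ncut_lower_bound(A, K):
    """Spectral lower bound: K minus the sum of the K largest eigenvalues of A."""
    eigvals = np.linalg.eigvalsh(A)     # eigenvalues of the symmetric matrix A, ascending
    return K - eigvals[-K:].sum()
```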
Relaxed minimization: minimize the distortion, i.e. maximize trace(X^T A X), s.t. X = n x K orthogonal matrix • Solution: X* = the K principal eigenvectors of A • NCut(X) - lower bound small w.r.t. the eigengap λK - λK+1 ⇒ X close to X* (convexity proof) • Two clusterings X, X' close to X* ⇒ trace(X^T X') large • trace(X^T X') large ⇒ d(X, X') small
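A small sketch of the relaxed problem's solution and of the eigengap referred to above; the function name is mine, and A is assumed symmetric.

```python
import numpy as np

def relaxed_solution_and_gap(A, K):
    """Relaxation: keep only orthonormality of X. The optimum is spanned by the
    K principal eigenvectors of A, and the eigengap lambda_K - lambda_{K+1}
    measures how well separated (hence how stable) that subspace is."""
    eigvals, eigvecs = np.linalg.eigh(A)        # ascending eigenvalues
    X_star = eigvecs[:, -K:]                    # K principal eigenvectors
    eigengap = eigvals[-K] - eigvals[-(K + 1)]
    return X_star, eigengap
```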
Why the eigengap matters • Example: A has 3 diagonal blocks, K = 2 • gap(C) = gap(C') = 0, but C and C' are not close (numerical check in the sketch below)
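The three-block example can be checked numerically; a sketch under my reading of the slide, with block size 5 chosen arbitrarily: the second and third eigenvalues coincide, so the eigengap relevant to K = 2 vanishes, even though each of the two merge-two-blocks clusterings matches the lower bound exactly.

```python
import numpy as np
from scipy.linalg import block_diag

B = np.ones((5, 5)) / 5           # one "perfect" block
A = block_diag(B, B, B)           # similarity matrix with 3 identical diagonal blocks
eigvals = np.linalg.eigvalsh(A)
print(eigvals[-2] - eigvals[-3])  # eigengap lambda_2 - lambda_3 ≈ 0 for K = 2
```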
Remarks on stability results • No explicit conditions on S • Different flavor from other stability results, e.g. Kannan et al. '00, Ng et al. '01, which assume S is "almost" block diagonal • But... the results apply only if a good clustering is found • There are S matrices for which no clustering satisfies the theorem • The bound depends on aggregate quantities like • K • cluster sizes (= probabilities) • Points are weighted by their volumes (degrees) • good in some applications • bounds for unweighted distances can be obtained
Is the bound ever informative? • An experiment: S = a "perfect" similarity matrix + additive noise (a sketch of this setup follows below)
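A hedged sketch of how such an experiment could be set up; the block sizes, the noise model, and the function name are my placeholders, not the talk's exact protocol.

```python
import numpy as np
from scipy.linalg import block_diag

def noisy_similarity(block_sizes, noise_level, rng):
    """'Perfect' block-diagonal similarity matrix plus symmetric additive noise."""
    S = block_diag(*[np.ones((m, m)) for m in block_sizes])
    N = rng.uniform(0, noise_level, size=S.shape)
    return S + (N + N.T) / 2          # keep the matrix symmetric

S = noisy_similarity([30, 30, 30, 30], noise_level=0.2, rng=np.random.default_rng(0))
```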
[Experiment plot: K = 4, dim = 30]
K-means distortion • We can do the same... • ...but the K-th principal subspace is typically not stable
New approach: use K-1 vectors • Non-redundant representation Y of the clustering • The distortion gets a new expression in Y • ...and a new (relaxed) optimization problem
Solution of the new problem • Relaxed optimization problem (as set up above) • Solution • U = the K-1 principal eigenvectors of A • W = a K x K orthogonal matrix with a prescribed first row
Solve the relaxed minimization • distortion(Y) - lower bound small ⇒ Y close to Y* • Clusterings Y, Y' close to Y* ⇒ ||Y^T Y'||_F large • ||Y^T Y'||_F large ⇒ d(Y, Y') small
Theorem: for any two clusterings Y, Y' whose distortions are close enough to the relaxed lower bound, d(Y, Y') is bounded above, whenever the resulting bound is positive • Corollary: a bound for d(Y, Y_opt)
Experiments: K = 4, dim = 30, 20 replicates [plot: p_min, bound, true error]
Conclusions • First (?) distribution-independent bounds on the clustering error • data dependent • hold when the data is well clustered (this is the case of interest) • Tight? – not yet... • In addition • Improved variational bound for the K-means cost • Showed local equivalence between the "misclassification error" distance and the "Frobenius norm" distance (also known as the χ² distance) • Related work • Bounds for mixtures of Gaussians (Dasgupta, Vempala) • Nearest K-flat to n points (Tseng) • Variational bounds for sparse PCA (Moghaddam)