Hao Cheng, Kien A Hua, and Khanh Vu School of Electrical Engineering and Computer Science University of Central Florida

##### Presentation Transcript

1. Constrained Locally Weighted Clustering • Hao Cheng, Kien A Hua, and Khanh Vu • School of Electrical Engineering and Computer Science, University of Central Florida

2. Contents • Introduction • Locally Weighted Clustering • Constrained Clustering • Experiments • Conclusions

3. Clustering • Clustering partitions a given dataset into a set of meaningful clusters, so that the data objects in a cluster share similar characteristics. • Data are generally complex and lie in high-dimensional spaces, so the clustering task is non-trivial.

4. Overview • Clusters reside in subspaces. • Locally Weighted Clustering: each cluster is associated with an independent weighting vector to capture its local correlation structure. • Pairwise instance-level constraints are often available in clustering practice. • Constrained Clustering: data points are arranged into small groups based on the given constraints, and these groups are then assigned to the closest feasible clusters.

5. Conventional Clustering • Partitional: [K-Means] • Hierarchical: [Single-link, Complete-Link, Ward’s, Bisection K-Means] • Euclidean distance is used to measure the (dis)similarity between two objects. All dimensions are equally important throughout the whole space.

6. Challenges • Data reside in high-dimensional spaces. • Curse of dimensionality: the space becomes sparse, and objects become (equally) far away from each other. • Clusters reside in subspaces. • Different subsets of data may exhibit different correlations, and in each subset, the correlation may vary along different dimensions.

7. Related Methods • Global projections: dimension reduction and manifold learning [PCA, LPP] • Adaptive dimension selection: [CLIQUE, ProClus] • Adaptive dimension weighting: [LAC]

8. Different Correlation Structures • [Figure: data along Dim 1, Dim 2, and Dim 3, with panels Dim 1 & 2, Dim 1 & 3, and Dim 2 & 3.]

9. K-Means • Iteratively refine the clustering objective. • Euclidean distance: $d(x, y) = \sqrt{\sum_j (x_j - y_j)^2}$ • Start with initial centroids. • S1: Assign points to closest centroids. • S2: Update centroids. • Iterate until convergence. • NMI: 0.4628, Rand: 0.7373
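
The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `sq_dist` and `kmeans` and the toy points are assumptions made for the example, and the initial centroids are supplied by the caller.

```python
def sq_dist(p, q):
    """Squared Euclidean distance; every dimension counts equally."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain K-Means starting from caller-supplied initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # S1: assign each point to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # S2: move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Because `sq_dist` treats all dimensions alike, the sketch inherits the limitation the following slides address: a cluster that is tight in only some dimensions gets no special treatment.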

10. PCA and LPP • Projection directions are defined in order to minimize data distortion. • [Figure panels: LPP 2-Dim projections; PCA 2-Dim projections.] • Scores shown: NMI: 0.5014, Rand: 0.7507; NMI: 0.5294, Rand: 0.7805

11. Heterogeneous Correlations • Data in a cluster can be strongly correlated in some dimensions while varying greatly in the remaining dimensions. The correlation structures differ from cluster to cluster. • A dimension is not equally important for all the clusters. • In a cluster, dimensions are not equally important.

12. Correlations and Weights • A weight vector is associated with a cluster. • [Figure: panels Dim 1 & 2, Dim 1 & 3, and Dim 2 & 3.]

13. Local Weights • A cluster is embedded in the subspace spanned by an adaptive combination of the dimensions. • In the neighborhood of a cluster, weighted Euclidean distance is adopted.

14. Locally Weighted Clustering • Minimize the sum of weighted distances • Get rid of zero weights by constraints: • Optimal centroids and weights:

15. Locally Weighted Clustering • Weights of a cluster only depend on data points that belong to this cluster. • Start with initial centroids, weights. • S1: Assign points to closest centroids. • S2: Update centroids, weights. • Iterate until convergence. • The weights follow the pairwise distances of points in cluster k: smaller pairwise distances mean greater correlations and larger weights; greater pairwise distances mean smaller correlations and smaller weights.
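
The formulas for the optimal centroids and weights did not survive in this transcript, so the following Python sketch only illustrates the behavior the slide describes. It assumes one common way to rule out zero weights, namely requiring the weights of a cluster to multiply to 1, which makes each weight inversely proportional to the cluster's scatter in that dimension. The names `weighted_sq_dist` and `update_cluster`, the `eps` guard, and the sample points are all hypothetical.

```python
import math


def weighted_sq_dist(p, centroid, weights):
    """Weighted squared Euclidean distance used near one cluster."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, p, centroid))


def update_cluster(members, eps=1e-9):
    """Return (centroid, weights) computed from this cluster's members only."""
    n = len(members)
    columns = list(zip(*members))
    centroid = [sum(col) / n for col in columns]
    # Per-dimension scatter around the centroid; eps avoids division by zero
    # when the cluster is perfectly flat in some dimension.
    scatter = [
        sum((v - c) ** 2 for v in col) + eps for col, c in zip(columns, centroid)
    ]
    # Assumed constraint: the weights multiply to 1, so each weight is the
    # geometric mean of the scatters divided by that dimension's scatter.
    log_gm = sum(math.log(s) for s in scatter) / len(scatter)
    weights = [math.exp(log_gm) / s for s in scatter]
    return centroid, weights


# Hypothetical cluster: tight along dimension 0, spread out along dimension 1.
members = [(0.0, 5.0), (0.1, -5.0), (-0.1, 0.0), (0.0, 1.0)]
centroid, weights = update_cluster(members)
```

In this example the tight dimension receives the larger weight, so a point that drifts along it is penalized heavily, while drift along the loose dimension costs little.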

16. LAC • Objective function: • Constraints: • LAC is sensitive to its tunable parameter.

17. LWC • [Figure: panels Dim 1 & 2, Dim 1 & 3, and Dim 2 & 3.] • NMI: 1, Rand: 1

18. Constrained Clustering • A pairwise instance-level constraint tells whether two points belong to the same cluster: • Must link • Cannot link • This form of partial knowledge is usually accessible and valuable in clustering practice. • Constrained clustering uses a given set of constraints to derive better data partitions.

19. Related Methods • Learn a suitable distance metric [RCA, DCA] • Guide the clustering process: • Enforce the constraints [Constrained K-Means] • Penalize constraint violations [CVQE] • Unified method: [MPCK-Means]

20. Chunklet • Chunklet: ‘a subset of points that are known to belong to the same although unknown class’. • Data objects that are inferred to be similar can be placed into the same chunklet. • A set of pairwise constraints can be represented in a chunklet graph.

23. Chunklet Graph • [Figure: chunklet graph with Must Link and Cannot Link edges; each node is labeled with the number of points in a chunklet.]

24. Graph Construction • Initially, each point is a chunklet node. • Merge two nodes if they are inferred to be similar. • An edge is added between two chunklet nodes if they are inferred to be dissimilar. • Cluster assignment is then done by chunklet. • [Figure: chunklet nodes labeled with their sizes.]
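
The construction steps can be sketched with a union-find structure. This is a minimal sketch under stated assumptions: points are identified by integer indices, constraints arrive as index pairs, and the function name `build_chunklet_graph` is made up for the example. Merging along must-links gives the chunklets, and lifting each cannot-link to the chunklets of its endpoints gives the edges.

```python
def build_chunklet_graph(n_points, must_link, cannot_link):
    """Return (chunklets, edges) for points 0..n_points-1.

    chunklets maps a representative point to the sorted list of its members;
    edges is a set of frozensets, one per pair of dissimilar chunklets.
    """
    parent = list(range(n_points))  # initially, each point is its own node

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge two nodes whenever a must-link says they are similar.
    for a, b in must_link:
        parent[find(a)] = find(b)

    chunklets = {}
    for i in range(n_points):
        chunklets.setdefault(find(i), []).append(i)

    # A cannot-link between two points becomes an edge between their chunklets.
    edges = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"constraints conflict for points {a} and {b}")
        edges.add(frozenset((ra, rb)))
    return chunklets, edges


# Hypothetical constraints on six points.
chunklets, edges = build_chunklet_graph(
    6, must_link=[(0, 1), (1, 2), (3, 4)], cannot_link=[(2, 3)]
)
```

Here points 0, 1, and 2 collapse into one chunklet of size 3, points 3 and 4 into one of size 2, and point 5 stays alone; the single cannot-link yields one edge between the two larger chunklets.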

25. Gaussian Data • Assume: data in each cluster follow a Gaussian distribution. • Two clusters: C1, C2 • Annotated notation: the chunklet, a point x that belongs to the chunklet, and the size of the chunklet.

26. Chunklet Assignment • A chunklet can be assigned to a cluster in bulk: • Two neighboring chunklets can be assigned to two different clusters:

27. Probability of Assignment • In case of two clusters: • One single chunklet is assigned correctly with the probability where • Two neighboring chunklets

28. K-Means • K-Means is unaware of constraints and assigns points independently. • The average number of points (in a chunklet) that are assigned correctly to their true cluster: • Each event (a single point): • Events are independent; the occurrences follow a Binomial distribution. • [Figure: points and clusters C1, C2.]

29. Constrained K-Means • Constrained K-Means enforces the constraints strictly. • The average number of correct assignments: • Assume the 3 points belong to cluster 1. • [Figure: points and clusters C1, C2.]

30. Chunklet Assignment • A chunklet is assigned in bulk. • The average number of correct assignments: • Similarly, we can analyze the case of two neighboring chunklets. • [Figure: points and clusters C1, C2.]

31. Chunklet versus One-by-one • It is better to assign points by chunklet. • The bigger the chunklet, the more correct assignments. • It is better to assign two neighboring chunklets together.
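
The slides' probability expressions are not preserved, so the comparison can only be illustrated under a simplified model. The sketch below assumes one dimension, two clusters with equal variance `sigma`, fixed centroids a distance `gap` apart, and a chunklet whose points all truly belong to the first cluster. Under those assumptions a single point lands on the correct side of the midpoint with probability Φ(gap / 2σ), while a chunklet assigned in bulk is decided by its mean, whose standard deviation shrinks by √size. The function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_correct_one_by_one(size, gap, sigma):
    """Points assigned independently: a Binomial(size, p) count with mean size * p."""
    p = phi(gap / (2.0 * sigma))
    return size * p


def expected_correct_in_bulk(size, gap, sigma):
    """Whole chunklet assigned at once, decided by the mean of its points."""
    p = phi(math.sqrt(size) * gap / (2.0 * sigma))
    return size * p  # all points are right or all are wrong


# Hypothetical setting: chunklet of 3 points, centroids 2 sigma apart.
one_by_one = expected_correct_one_by_one(3, gap=2.0, sigma=1.0)
in_bulk = expected_correct_in_bulk(3, gap=2.0, sigma=1.0)
```

For a chunklet of size 1 the two quantities coincide, and the advantage of bulk assignment grows with the chunklet size, which matches the first two bullets of the slide within this simplified model.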

32. CLWC • Combine local weighting scheme with chunklet assignment. • Build the chunklet graph. • Start with initial centroids, weights. • S1: Assign points to closest centroids (chunklet assignments). • S2: Update centroids, weights. • Iterate until convergence.

33. Chunklet Assignment • Try to do the most confident assignments first. • If a node has a neighbor, assign the two together. • Assign larger chunklets first. • Chunklets are placed in the closest feasible clusters. • [Figure: chunklet nodes labeled with their sizes and clusters C1, C2, C3.]
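
A simplified version of this ordering can be sketched as a greedy loop. The sketch is an assumption-laden illustration rather than the authors' procedure: it handles chunklets one at a time (it does not assign neighboring pairs jointly), uses unweighted squared distances, and falls back to the closest cluster overall when no feasible cluster remains. The names `assign_chunklets` and `bulk_cost` and the toy data are hypothetical.

```python
def bulk_cost(points, centroid):
    """Total squared distance from all points of a chunklet to one centroid."""
    return sum(sum((a - c) ** 2 for a, c in zip(p, centroid)) for p in points)


def assign_chunklets(chunklets, edges, centroids):
    """Greedy assignment. chunklets: {id: [points]}; edges: cannot-link id pairs."""
    neighbors = {cid: set() for cid in chunklets}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assignment = {}
    # Larger chunklets first: their bulk assignment is the most confident.
    for cid in sorted(chunklets, key=lambda c: -len(chunklets[c])):
        # Clusters already taken by a cannot-link neighbor are infeasible.
        blocked = {assignment[n] for n in neighbors[cid] if n in assignment}
        feasible = [k for k in range(len(centroids)) if k not in blocked]
        if not feasible:
            # Assumed fallback: accept a violation rather than fail.
            feasible = list(range(len(centroids)))
        assignment[cid] = min(
            feasible, key=lambda k: bulk_cost(chunklets[cid], centroids[k])
        )
    return assignment


# Hypothetical example: chunklet B is closest to cluster 0 but cannot join A.
chunklets = {
    "A": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "B": [(1.0, 1.0)],
    "C": [(9.0, 9.0)],
}
edges = {("A", "B")}
centroids = [(0.0, 0.0), (10.0, 10.0)]
assignment = assign_chunklets(chunklets, edges, centroids)
```

Chunklet A is placed first because it is the largest, which then forces B into the other cluster even though B lies nearer to cluster 0.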

34. Better Clustering • 4 classes of images (100 each) from COREL DB • [Figure panels: Ground truth; K-Means; 50 links; 300 links.]

35. Experimental Setup • Techniques: • Datasets: • Pairwise constraints • Evaluation metrics:

36. Performances • [Figure labels: K-Means, Hierarchical Clustering, Dimension Reduction, Manifold Learning, LAC, LWC.]

37. Performances • [Figure labels: Direct enforcement, Violation penalty, Metric learning, CLWC.]

38. Performances

39. Conclusions • An independent weighting vector is used to capture the local correlation structure around a cluster. The weights help define the embedding subspace of a cluster. • Data points are grouped into chunklets based on the input constraints. The points in a chunklet are treated as a whole in the assignment process. The most confident assignments (those least likely to be incorrect) are made first.

40. Thank you!

42. Efficiency • The cost of each iteration is • Local weighting generally lets the algorithm converge fast. • The more constraints there are, the faster the algorithm converges.

43. Constraint Violations • There is no guarantee that all constraints are satisfied. • [Figure: chunklet nodes labeled with their sizes and clusters C1, C2; no feasible assignment.]

44. Constraint Violations

45. Probability Constraints • Use a real value in the range [-1, 1] to denote the similarity between two points, that is, the confidence that the two points are in the same cluster. • Clique: points that are similar (with a high similarity value) to each other. • For each point, search for a clique that includes this point. • The degree of dissimilarity between two cliques can be computed. • Do the assignment by clique.

46. Two Neighboring Chunklets • Number of correct assignments:

47. [Figure: axes Dim 1, Dim 2, Dim 3.]