Hao Cheng, Kien A Hua, and Khanh Vu School of Electrical Engineering and Computer Science University of Central Florida



**Constrained Locally Weighted Clustering**

**Contents**

• Introduction
• Locally Weighted Clustering
• Constrained Clustering
• Experiments
• Conclusions

**Clustering**

• Clustering partitions a given dataset into a set of meaningful clusters, so that the data objects of a cluster share similar characteristics.
• Data are generally complicated and lie in high-dimensional spaces, so the clustering task is non-trivial.

**Overview**

• Clusters reside in subspaces.
• Locally Weighted Clustering (LWC): each cluster is associated with an independent weighting vector that captures its local correlation structure.
• Pairwise instance-level constraints are usually available in clustering practice.
• Constrained clustering: data points are arranged into small groups based on the given constraints, and these groups are then assigned to the closest feasible clusters.

**Conventional Clustering**

• Partitional: K-Means.
• Hierarchical: single-link, complete-link, Ward's, bisection K-Means.
• Euclidean distance measures the (dis)similarity between two objects, so all dimensions are equally important throughout the whole space.

**Challenges**

• Data reside in high-dimensional spaces.
• Curse of dimensionality: the space becomes sparse, and objects become (almost equally) far away from each other.
• Clusters reside in subspaces.
• Different subsets of the data may exhibit different correlations, and within each subset the correlation may vary along different dimensions.

**Related Methods**

• Global projections: dimension reduction and manifold learning (PCA, LPP).
• Adaptive dimension selection: CLIQUE, ProClus.
• Adaptive dimension weighting: LAC.

*(Figure: clusters with different correlation structures in dimensions 1 & 2, 1 & 3, and 2 & 3.)*

**K-Means**

• Iteratively refine the clustering objective under Euclidean distance.
• Start with initial centroids.
• S1: Assign points to the closest centroids.
• S2: Update the centroids.
• Iterate until convergence. (Example result: NMI: 0.4628, Rand: 0.7373.)

**PCA and LPP**

• Projection directions are chosen to minimize data distortion.

*(Figure: 2-dim LPP and PCA projections of the example data; NMI: 0.5014, Rand: 0.7507 and NMI: 0.5294, Rand: 0.7805.)*

**Heterogeneous Correlations**

• Data in a cluster can be strongly correlated in some dimensions while varying greatly in the rest; the correlation structure differs from cluster to cluster.
• A dimension is not equally important to all clusters.
• Within a cluster, the dimensions are not equally important.

**Correlations**

• A weight vector is associated with each cluster. *(Figure: per-cluster weights over dimensions 1 & 2, 1 & 3, and 2 & 3.)*

**Local Weights**

• A cluster is embedded in the subspace spanned by an adaptive combination of the dimensions.
• In the neighborhood of a cluster, a weighted Euclidean distance is adopted.

**Locally Weighted Clustering**

• Minimize the sum of weighted distances.
• Zero weights are ruled out by constraints.
• The optimal centroids and weights have closed-form updates.
• The weights of a cluster depend only on the data points that belong to that cluster: smaller pairwise distances within a cluster (greater correlation) yield larger weights, and greater pairwise distances (smaller correlation) yield smaller weights.
• Start with initial centroids and weights. S1: Assign points to the closest centroids. S2: Update the centroids and weights. Iterate until convergence.

**LAC**

• Objective function with constraints; LAC is sensitive to a tunable parameter.

**LWC Result**

*(Figure: LWC recovers the clusters in dimensions 1 & 2, 1 & 3, and 2 & 3; NMI: 1, Rand: 1.)*

**Constrained Clustering**

• A pairwise instance-level constraint tells whether two points belong to the same cluster:
  • Must-link
  • Cannot-link
• This form of partial knowledge is usually accessible and valuable in clustering practice.
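The S1/S2 loop above, extended with a per-cluster weight vector, can be sketched as follows. The paper's exact closed-form weight update is not reproduced in this text, so the sketch uses inverse within-cluster dispersion (normalized per cluster) as an illustrative choice: dimensions with smaller spread (stronger correlation) get larger weights. The function name `lwc` and the initialization from the first `k` points are assumptions made for the sketch.

```python
def lwc(points, k, iters=50, eps=1e-6):
    """Sketch of Locally Weighted Clustering: the K-Means assign/update loop,
    with one weight vector per cluster over the dimensions.

    Weight choice (illustrative, not the paper's exact formula): inverse
    within-cluster dispersion per dimension, normalized to sum to 1."""
    dims = len(points[0])
    # Deterministic initialization for the sketch: first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    weights = [[1.0 / dims] * dims for _ in range(k)]

    def wdist(p, j):
        # Weighted Euclidean distance (squared) in cluster j's neighborhood.
        return sum(w * (a - b) ** 2 for w, a, b in zip(weights[j], p, centroids[j]))

    for _ in range(iters):
        # S1: assign each point to the closest centroid under the weighted distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: wdist(p, j))].append(p)
        # S2: update each cluster's centroid and weights from its own points only.
        for j, cl in enumerate(clusters):
            if not cl:
                continue
            centroids[j] = [sum(x) / len(cl) for x in zip(*cl)]
            # Per-dimension dispersion; eps avoids division by zero (no zero weights).
            disp = [sum((p[d] - centroids[j][d]) ** 2 for p in cl) / len(cl) + eps
                    for d in range(dims)]
            inv = [1.0 / v for v in disp]
            s = sum(inv)
            weights[j] = [v / s for v in inv]
    return centroids, weights, clusters
```

With `weights` held uniform and fixed, the loop reduces to plain K-Means; the only addition is that S2 also refits each cluster's weight vector, so each cluster measures distance in its own subspace.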
• Constrained clustering utilizes a given set of constraints to derive better data partitions.

**Related Methods**

• Learn a suitable distance metric: RCA, DCA.
• Guide the clustering process:
  • Enforce the constraints: Constrained K-Means.
  • Penalize constraint violations: CVQE.
• Unified method: MPCK-Means.

**Chunklet**

• Chunklet: "a subset of points that are known to belong to the same although unknown class".
• Data objects that are inferred to be similar can be placed into the same chunklet.
• A set of pairwise constraints can be represented as a chunklet graph.

*(Figures: example chunklet graphs; must-links merge points into chunklet nodes, cannot-links become edges between them, and node labels give the number of points in each chunklet.)*

**Graph Construction**

• Initially, each point is a chunklet node.
• Merge two nodes if they are inferred to be similar.
• Add an edge between two chunklet nodes if they are inferred to be dissimilar.
• Cluster assignment is then done per chunklet.

**Gaussian Data**

• Assume the data in each cluster follow a Gaussian distribution, and consider two clusters C1 and C2; each point x belongs to a chunklet of a given size.

**Chunklet Assignment**

• A chunklet can be assigned to a cluster in bulk.
• Two neighboring chunklets can be assigned to two different clusters.

**Probability of Assignment**

• In the case of two clusters, a single chunklet is assigned correctly with a probability that grows with its size; the case of two neighboring chunklets is analyzed similarly.

**K-Means**

• K-Means is unaware of the constraints and assigns points independently.
• Each assignment of a single point is an independent event, so the number of correct assignments follows a Binomial distribution; the average number of points in a chunklet assigned to their true cluster is the chunklet size times the per-point success probability.

**Constrained K-Means**

• Constrained K-Means enforces the constraints strictly.
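The graph-construction steps above can be sketched with a union-find structure: must-links merge chunklet nodes, cannot-links add edges between them. This is a minimal sketch, not the paper's implementation; the class name `ChunkletGraph` is an assumption.

```python
class ChunkletGraph:
    """Chunklet graph sketch: must-link merges two points' chunklets into one
    node; cannot-link adds an edge between the two chunklet nodes."""

    def __init__(self, n):
        self.parent = list(range(n))   # union-find forest over the n points
        self.edges = set()             # cannot-link edges between chunklet roots

    def find(self, x):
        # Find the chunklet root of point x, with path halving.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def must_link(self, a, b):
        # "Merge two nodes if they are inferred to be similar."
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def cannot_link(self, a, b):
        # "Add an edge between two chunklet nodes if inferred dissimilar."
        # For this sketch, add all must-links before any cannot-links so the
        # recorded roots stay current.
        self.edges.add(frozenset((self.find(a), self.find(b))))

    def chunklets(self):
        # Group points by chunklet root; len(group) is the chunklet size.
        groups = {}
        for x in range(len(self.parent)):
            groups.setdefault(self.find(x), []).append(x)
        return groups
```

Cluster assignment then iterates over `chunklets()` rather than over individual points, honoring the cannot-link `edges` when choosing feasible clusters.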
• The average number of correct assignments can again be computed in closed form (e.g., assuming the 3 points of a chunklet belong to cluster 1).

**Chunklet Assignment**

• A chunklet is assigned in bulk, and the average number of correct assignments follows accordingly; the case of two neighboring chunklets can be analyzed similarly.

**Chunklet versus One-by-one**

• It is better to assign the points of a chunklet together.
• The bigger the chunklet, the more correct assignments.
• It is better to assign two neighboring chunklets together.

**CLWC**

• Combine the local weighting scheme with chunklet assignment.
• Build the chunklet graph. Start with initial centroids and weights. S1: Assign points to the closest centroids (chunklet assignments). S2: Update the centroids and weights. Iterate until convergence.

**Chunklet Assignment Order**

• Try to do the most confident assignments first.
• If a node has a neighbor, assign the two together.
• Assign larger chunklets first.
• Chunklets are placed in the closest feasible clusters.

**Experimental Setup**

• Techniques, datasets, evaluation metrics, and pairwise constraints. *(Figure: clustering of 4 classes of images, 100 each, from the COREL DB: K-Means versus better clusterings with 50 and 300 links, against the ground truth.)*

**Performances**

*(Figures: comparisons against K-Means, hierarchical clustering, dimension reduction, manifold learning, LAC, and LWC; and against direct enforcement, metric learning, and violation-penalty methods. CLWC performs best.)*

**Conclusions**

• An independent weighting vector captures the local correlation structure around each cluster; the weights define the embedding subspace of the cluster.
• Data points are grouped into chunklets based on the input constraints, and the points of a chunklet are treated as a whole in the assignment process. The most confident (least likely incorrect) assignments are made first.

**Efficiency**

• Local weighting keeps the cost of each iteration low and generally lets the algorithm converge fast.
• The more constraints, the faster the algorithm converges.

*(Figure: a chunklet graph with no feasible assignment to clusters C1 and C2.)*
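The chunklet-versus-one-by-one claim above can be checked with a small Monte Carlo experiment: two symmetric 1-D Gaussian clusters at +mu and -mu, where a point (or a chunklet, assigned in bulk via its mean) drawn from the +mu cluster is assigned correctly when its value is positive. All parameter values here are illustrative assumptions, not from the paper.

```python
import random

def simulate(m=5, mu=1.0, sigma=1.5, trials=20000, seed=1):
    """Compare expected numbers of correct assignments for a chunklet of m
    points from the +mu cluster: one-by-one (each point independently, as in
    plain K-Means) versus in bulk (the whole chunklet follows its mean).

    Returns (avg_correct_one_by_one, avg_correct_bulk) per chunklet."""
    rng = random.Random(seed)
    one_by_one = bulk = 0
    for _ in range(trials):
        pts = [rng.gauss(mu, sigma) for _ in range(m)]
        # One-by-one: each point is correct independently (Binomial count).
        one_by_one += sum(1 for x in pts if x > 0)
        # Bulk: all m points are correct iff the chunklet mean falls on the
        # correct side; the mean has standard deviation sigma / sqrt(m).
        bulk += m if sum(pts) / m > 0 else 0
    return one_by_one / trials, bulk / trials
```

Because the chunklet mean concentrates around +mu as m grows, the bulk assignment is correct with higher probability than any single point, so the expected number of correct assignments is larger; this matches the slide's conclusion that bigger chunklets yield more correct assignments.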
**Constraint Violations**

• There is no guarantee that all constraints are satisfied.

**Probability Constraints**

• Use a real value in the range [-1, 1] to denote the similarity between two points, i.e., the confidence that the two points are in the same cluster.
• Clique: a set of points that are similar (with a high similarity value) to each other.
• For each point, search for a clique that includes this point.
• The degree of dissimilarity between two cliques can be computed.
• Assignment is then done per clique.

**Two Neighboring Chunklets**

• The number of correct assignments for two neighboring chunklets can be derived analogously to the single-chunklet case.
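The Rand scores quoted throughout the deck measure pairwise agreement between a clustering and the ground truth. A minimal computation of the (unadjusted) Rand index is sketched below; the paper does not specify its implementation, and the function name is an assumption.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index between two labelings of the same points: the fraction of
    point pairs on which the labelings agree, i.e., both put the pair in the
    same cluster, or both put it in different clusters. Label values
    themselves do not matter, only the induced partitions."""
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        total += 1
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / total
```

A perfect clustering scores 1 even if its cluster IDs are permuted relative to the ground truth, which is why the LWC result slide can report Rand: 1.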