
Clustering-A neural network approach


Presentation Transcript


  1. Clustering-A neural network approach. K.-L. Du, Neural Networks, Vol. 23, No. 1, 2009, pp. 89-107. Presenter: Wei-Shen Tai, 2010/1/26

  2. Outline • Introduction • Competitive learning - SOM • Problems and solution • Under-utilization problem • Shared clusters - Rival-penalized competitive learning • Winner-take-most rule relaxes the WTA - Soft competitive learning • Outliers - Robust clustering • Cluster validity • Computer simulations • Summary • Comments

  3. Motivation • Competitive learning based clustering methods • Differences in the training process between these methods determine their final clustering results. • The paper discusses several effective solutions to the problems that occur in competitive learning.

  4. Objective • A comprehensive overview of competitive learning based clustering methods. • Associated topics such as the under-utilization problem, fuzzy clustering, and robust clustering are also covered.

  5. Vector quantization (VQ) • It is a classical method for approximating a continuous probability density function (PDF) using a finite number of prototypes: a set of feature vectors x is represented by a finite set of prototypes. • A simple training algorithm for vector quantization (sketched below): • Pick a sample point at random • Move the nearest quantization vector (prototype) towards this sample point, by a small fraction of the distance • Repeat
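A minimal sketch of this online VQ loop in Python (function and parameter names such as train_vq, num_prototypes, and learning_rate are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def train_vq(data, num_prototypes=4, learning_rate=0.05, num_steps=10000, seed=0):
    """Online vector quantization: repeatedly pick a random sample and move
    the nearest prototype a small fraction of the way towards it."""
    rng = np.random.default_rng(seed)
    # Initialize the prototypes from randomly chosen data points.
    prototypes = data[rng.choice(len(data), num_prototypes, replace=False)].astype(float)
    for _ in range(num_steps):
        x = data[rng.integers(len(data))]                      # pick a sample at random
        winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
        prototypes[winner] += learning_rate * (x - prototypes[winner])
    return prototypes
```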

  6. Competitive learning • Competitive learning • It can be implemented using a two-layer (J-K) neural network; the output layer is called the competition layer. • Objective function • It is usually derived by minimizing the mean squared error (MSE) functional, as written below.
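For reference, a common form of the MSE objective minimized by such a network (the notation below is assumed here rather than copied from the slides: x_n are the N input vectors, c_k the K prototypes, and mu_kn the winner-take-all membership):

```latex
E = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} \mu_{kn}\,\lVert \mathbf{x}_n - \mathbf{c}_k \rVert^{2},
\qquad
\mu_{kn} =
\begin{cases}
1, & k = \arg\min_{j} \lVert \mathbf{x}_n - \mathbf{c}_j \rVert,\\
0, & \text{otherwise.}
\end{cases}
```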

  7. Kohonen network • SOM • It is not based on the minimization of any known objective function. • Major problems • Forced termination, unguaranteed convergence, a non-optimized procedure, and the output often depending on the order in which the data are presented. • There are proofs of convergence for the one-dimensional SOM based on Markov chain analysis, but no general proof of convergence is available for the multi-dimensional SOM. • One training step is sketched below.
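A minimal sketch of one SOM training step (names such as som_step, grid_coords, and sigma are illustrative; in practice the learning rate and neighborhood width are decayed over time, which is why termination has to be forced):

```python
import numpy as np

def som_step(weights, grid_coords, x, learning_rate=0.1, sigma=1.0):
    """One SOM update: find the best-matching unit (BMU), then pull every
    neuron towards x, weighted by a Gaussian neighborhood on the map grid.
    weights: (num_neurons, dim); grid_coords: (num_neurons, 2) map positions."""
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    grid_dist_sq = np.sum((grid_coords - grid_coords[bmu]) ** 2, axis=1)
    h = np.exp(-grid_dist_sq / (2.0 * sigma ** 2))   # neighborhood function on the grid
    weights += learning_rate * h[:, None] * (x - weights)
    return weights
```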

  8. Learning vector quantization (LVQ) [figure: network with input, competitive, and linear (target) layers] • LVQ algorithms define near-optimal decision borders between classes. • Unsupervised LVQ • It is essentially simple competitive learning (SCL) based VQ. • Supervised LVQ • It is based on the known classification of the feature vectors, and can be treated as a supervised version of the SOM (one LVQ1 step is sketched below). • The algorithm tends to reduce the point density of the prototypes ci around the Bayesian decision surfaces.
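A minimal sketch of one supervised LVQ1 step (function and parameter names are assumptions; the unsupervised variant simply drops the label test and always moves the winner towards x):

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, y, learning_rate=0.05):
    """One LVQ1 update: move the nearest prototype towards x when its class
    label matches y, otherwise push it away from x."""
    winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
    sign = 1.0 if proto_labels[winner] == y else -1.0
    prototypes[winner] += sign * learning_rate * (x - prototypes[winner])
    return prototypes
```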

  9. Mountain clustering • A simple and effective method for estimating the number of clusters and the initial locations of the cluster centers. • The potential of each grid point is calculated from the density of the surrounding data points. • The grid point with the highest potential is selected as the first cluster center, and the potential values of all the other grid points are then reduced according to their distances to this center. • The next cluster center is located at the grid point with the highest remaining potential. • This process is repeated until the remaining potential values of all the grid points fall below a threshold.

  10. Subtractive clustering • A modified form of mountain clustering • It uses the data points themselves, rather than grid points, as candidate cluster centers, which reduces the number of candidate points to N. • The potential measure for each data point xi is defined as a function of its Euclidean distances to all the other input data points (a sketch of this computation follows). • It requires only one pass over the training data, and the number of clusters does not need to be pre-specified.
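A minimal sketch of the subtractive-clustering potential computation (assuming the usual Gaussian-like potential with alpha = 4 / radius^2, where the radius is a neighborhood parameter; names are illustrative):

```python
import numpy as np

def subtractive_potentials(data, radius=1.0):
    """Potential of every data point as a candidate cluster centre: the sum of
    Gaussian-like contributions from all other points (higher where the data
    are dense)."""
    alpha = 4.0 / radius ** 2
    sq_dists = np.sum((data[:, None, :] - data[None, :, :]) ** 2, axis=-1)
    return np.exp(-alpha * sq_dists).sum(axis=1)

# The point with the highest potential is taken as the first centre; potentials
# near that centre are then reduced and the selection repeats, as in mountain
# clustering, until the highest remaining potential falls below a threshold.
```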

  11. Neural gas • It is a topology-preserving network and can be treated as an extension of the C-means. It has a fixed number of processing units, K, with no lateral connections. • In step t, the prototype vectors are ranked according to their distances to the input xt. • Each prototype is then updated with a strength that decays exponentially with its rank (one step is sketched below). • A data-optimal topological ordering is achieved by using this neighborhood ranking within the input space at each training step. • Unlike the SOM, which uses predefined static neighborhood relations, the NG determines a dynamical neighborhood relation as learning proceeds. • The NG is an efficient and reliable clustering algorithm that is not sensitive to the neuron initialization.
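A minimal sketch of one neural-gas step (epsilon and the decay constant lam are illustrative parameter names):

```python
import numpy as np

def neural_gas_step(prototypes, x, epsilon=0.05, lam=2.0):
    """One NG update: rank all prototypes by their distance to x and update
    each one with a strength that decays exponentially with its rank."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    ranks = np.argsort(np.argsort(dists))        # rank 0 = closest prototype
    h = np.exp(-ranks / lam)                     # neighborhood-ranking weights
    prototypes += epsilon * h[:, None] * (x - prototypes)
    return prototypes
```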

  12. Adaptive resonance theory (ART) • The theory leads to a series of real-time unsupervised network models for clustering, pattern recognition, and associative memory. • At the training stage, the stored prototype of a category is adapted when an input pattern is sufficiently similar to the prototype. • When novelty is detected (the input is dissimilar to all existing prototypes), ART adaptively and autonomously creates a new category with the input pattern as its prototype. • ART has the ability to adapt without forgetting past training, and thus overcomes the so-called stability-plasticity dilemma. • Like the incremental C-means, the ART model family is sensitive to the order of presentation of the input patterns.
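A much-simplified, ART-like sketch of the vigilance test described above (this omits the real ART resonance/search dynamics and the F1/F2 field interactions; the names, the similarity measure, and the vigilance threshold are all assumptions):

```python
import numpy as np

def art_like_assign(prototypes, x, vigilance=0.8, learning_rate=0.5):
    """Accept the best-matching prototype only if its similarity to x passes
    the vigilance test; otherwise create a new category with x as prototype."""
    if not prototypes:                               # first input: first category
        return [x.astype(float)], 0
    sims = [1.0 / (1.0 + np.linalg.norm(p - x)) for p in prototypes]
    best = int(np.argmax(sims))
    if sims[best] >= vigilance:                      # resonance: adapt stored prototype
        prototypes[best] += learning_rate * (x - prototypes[best])
        return prototypes, best
    prototypes.append(x.astype(float))               # novelty: new category
    return prototypes, len(prototypes) - 1
```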

  13. Lateral inhibition of ART • It typically consists of a comparison field (F1) and a recognition field (F2) composed of neurons. • Each recognition field neuron outputs a negative signal to each of the other recognition field neurons and inhibits their output accordingly.

  14. Supervised clustering • When output patterns are used in clustering, this leads to supervised clustering. • The locations of the cluster centers are determined by both the input pattern spread and the output pattern deviations. • Examples of supervised clustering include the LVQ family, the ARTMAP family, the conditional FCM, the supervised C-means, and the C-means plus k-NN based clustering.

  15. Clustering using non-Euclidean distance measures • Mahalanobis distance • It can be used to look for hyper-ellipsoid shaped clusters (a sketch follows). • Specialized distances • They are applied in methods that detect circles and hyper-spherical shells, such as extensions of the C-means and FCM algorithms.
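A minimal sketch of the squared Mahalanobis distance (the function name and the per-cluster covariance argument are assumptions):

```python
import numpy as np

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance (x - c)^T Cov^{-1} (x - c); substituting it
    for the Euclidean distance makes a clustering algorithm favour
    hyper-ellipsoid shaped clusters."""
    diff = x - center
    return float(diff @ np.linalg.inv(cov) @ diff)
```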

  16. Hierarchical clustering • It consists of a sequence of partitions in a hierarchical structure, which can be represented as a clustering tree called a dendrogram. • Hierarchical clustering takes the form of either an agglomerative (bottom-up) or a divisive (top-down) technique. • Agglomerative • It starts from N clusters, each containing one data point; a series of nested mergings is performed until all the data points are grouped into one cluster (see the usage sketch below). • Density-based clustering • It groups the objects of a data set into clusters based on density conditions. • It can handle outliers and discover clusters of arbitrary shape.
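A small usage sketch of agglomerative clustering with SciPy (the toy data and the choice of Ward linkage are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Agglomerative (bottom-up) clustering: start with one cluster per point and
# repeatedly merge the closest pair; the merge history is the dendrogram.
data = np.random.rand(50, 2)                         # toy data, illustration only
tree = linkage(data, method="ward")                  # full merge history (dendrogram)
labels = fcluster(tree, t=3, criterion="maxclust")   # cut the tree into 3 clusters
```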

  17. Constructive clustering techniques • Conventional partition clustering algorithms • Selecting an appropriate value of K is difficult without prior knowledge of the input data. • A self-creating mechanism in the competitive learning process can adaptively determine the natural number of clusters. • The self-creating and organizing neural network (SCONN) • For a new input, the winning node is updated if it is active; otherwise a new node is created from the winning node. • Growing neural gas (GNG) • It is capable of generating and removing neurons and lateral connections dynamically. • It achieves robustness against noise and performs a perfect topology-preserving mapping.

  18. Miscellaneous clustering methods • Expectation maximization (EM) • It represents each cluster by a probability distribution, typically a Gaussian. • It is derived by maximizing the log-likelihood of the probability density function of the mixture model. • Because the resulting memberships are soft, it can be treated as a fuzzy clustering technique (a usage sketch follows).
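A small usage sketch of EM-based clustering via a Gaussian mixture in scikit-learn (the toy data and the choice of three components are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Each cluster is represented by one Gaussian component; the soft
# responsibilities returned by predict_proba act like fuzzy memberships.
data = np.random.rand(200, 2)                        # toy data, illustration only
gmm = GaussianMixture(n_components=3, random_state=0).fit(data)
memberships = gmm.predict_proba(data)                # N x K responsibility matrix
hard_labels = memberships.argmax(axis=1)
```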

  19. Under-utilization problem • An initialization problem: some prototypes, called dead units, may never win the competition. • Solutions • Conscience strategy • A frequent winner receives a bad conscience by adding a penalty term to its distance from the input signal. • Distortion measure • Multiplicatively biased competitive learning (MBCL) • The winner's bias is increased after each win, and neurons with a lower bias are favored when determining the winner (a frequency-sensitive winner selection is sketched below).
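A minimal sketch of a frequency-sensitive (conscience-style) winner selection, in the multiplicative-bias spirit of the MBCL idea above (the names, the bias factor, and the exact biasing form are assumptions, not the paper's formulas):

```python
import numpy as np

def conscience_winner(prototypes, x, win_counts, bias_factor=10.0):
    """Pick the winner using distances biased by how often each prototype has
    already won, so frequent winners are penalized and dead units get a chance."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    win_freq = win_counts / max(int(win_counts.sum()), 1)
    biased = dists * (1.0 + bias_factor * win_freq)   # penalize frequent winners
    winner = int(np.argmin(biased))
    win_counts[winner] += 1
    return winner
```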

  20. Shared clusters • This problem is addressed in the rival-penalized competitive learning (RPCL) algorithm. • A rival-penalizing force • For each input, the second-place winner, called the rival, is also updated, with a smaller learning rate and in the opposite direction (one step is sketched below). • Over-penalization or under-penalization problem • The STAR C-means has a mechanism similar to the RPCL, but penalizes the rivals in an implicit way.
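A minimal sketch of one RPCL step (this omits the frequency-weighted winner selection of the full algorithm; the de-learning rate alpha_rival is deliberately much smaller than alpha_winner):

```python
import numpy as np

def rpcl_step(prototypes, x, alpha_winner=0.05, alpha_rival=0.005):
    """One RPCL update: the winner moves towards x, while the second-place
    prototype (the rival) is pushed away with a much smaller learning rate."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    winner, rival = np.argsort(dists)[:2]
    prototypes[winner] += alpha_winner * (x - prototypes[winner])
    prototypes[rival] -= alpha_rival * (x - prototypes[rival])
    return prototypes
```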

  21. Winner-take-most rule • It relaxes the WTA rule by allowing more than one neuron to act as a winner, each to a certain degree. • Soft competitive learning • Examples include the SOM (Kohonen, 1989), the NG (Martinetz et al., 1993), and the GNG (Fritzke, 1995a). • Maximum-entropy clustering applies a soft, temperature-controlled assignment of each input to all prototypes (sketched below).
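A minimal sketch of a winner-take-most (maximum-entropy style) assignment, where a temperature parameter controls how soft the competition is (the names and the exact weighting are assumptions):

```python
import numpy as np

def soft_memberships(prototypes, x, temperature=1.0):
    """Boltzmann/softmax weighting of the negative squared distances: every
    prototype receives a share of the update; a small temperature approaches
    hard winner-take-all behaviour."""
    sq_dists = np.sum((prototypes - x) ** 2, axis=1)
    logits = -sq_dists / temperature
    logits -= logits.max()                            # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()
```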

  22. Outliers • Outliers in a data set affect the result of clustering. • Solutions • Noise clustering • All outliers are collected into a separate, amorphous noise cluster. If a noisy point is far away from all the K clusters, it is attracted to the noise cluster. • Possibilistic C-means (PCM) • PCM can find the natural clusters in the data set (its membership function is given below). • When K is smaller than the number of actual clusters, only K good clusters are found and the other data points are treated as outliers. • When K is larger than the number of actual clusters, all the actual clusters can be found but some clusters will coincide. • In noise clustering there is only one noise cluster, while in the PCM there are K noise clusters.
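For reference, the possibilistic membership of point x_n in cluster k typically has the form below (notation assumed here: d_kn is the distance to prototype c_k, eta_k a per-cluster scale, and m > 1 the fuzzifier); memberships do not sum to one across clusters, which is what lets outliers receive low membership everywhere:

```latex
u_{kn} \;=\; \frac{1}{1 + \left( d_{kn}^{2} / \eta_{k} \right)^{\frac{1}{m-1}}}
```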

  23. Cluster validity • Measures based on maximal compactness and maximal separation of clusters • Measures based on minimal hyper-volume and maximal density of clusters • A good partitioning of the data usually leads to a small total hypervolume and a large average density of the clusters.
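A minimal sketch of a validity score in the compactness/separation family (smaller is better; this is a simplified, crisp variant in the spirit of indices such as Xie-Beni, not a formula taken from the paper):

```python
import numpy as np

def compactness_separation(data, labels, centers):
    """Mean within-cluster squared distance divided by the smallest squared
    distance between any two cluster centres (labels assumed to be 0..K-1)."""
    compact = np.mean([np.sum((data[labels == k] - c) ** 2, axis=1).mean()
                       for k, c in enumerate(centers)])
    separation = min(np.sum((a - b) ** 2)
                     for i, a in enumerate(centers)
                     for b in centers[i + 1:])
    return compact / separation
```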

  24. Computer simulations

  25. Summary • A state-of-the-art survey and introduction to neural network based clustering were provided in this paper.

  26. Comments • Advantage • This paper reviewed state-of-the-art clustering methods based on competitive learning. • Several effective solutions that overcome the problems occurring in these clustering methods were presented as well. • Drawback • Only a few performance indices for cluster validity were introduced in this paper. • Application • Clustering methods.
