Dynamic Self-Organizing Maps with Controlled Growth for Knowledge Discovery


Presentation Transcript


  1. Dynamic Self-Organizing Maps with Controlled Growth for Knowledge Discovery • Authors: Alahakoon and Halgamuge • Advisor: Dr. Hsu • Graduate: Yu-Wei Su • Intelligent Database System Lab, IDSL

  2. Outline • Motivation • Objective • Introduction • Self-generating feature maps for data mining • GSOM algorithm • Advantages of GSOM over others • Knowledge discovery by hierarchical clustering of GSOM • Experiment • Conclusion • Opinion

  3. Motivation • Predetermining the size and number of nodes in a SOM significantly limits the final mapping • This limitation can leave the user unaware of how well the resulting map represents the structure of the data

  4. Objective • To determine the shape as well as the size of the network during the training of the network

  5. Introduction • In theory, the SOM in its original form does not provide complete topology preservation • Often it becomes clear only when the simulation is complete that a different-sized network would have been more appropriate for the application • Having to predetermine the size and number of nodes in a SOM is a significant limitation

  6. Introduction (cont.) • The GSOM determines the shape as well as the size of the network during training • This highlights the need for a measure to control the growth of the GSOM

  7. Self-generating feature maps for data mining • Growing Cell Structures (GCS) [Fritzke, 1991] • Neural Gas Algorithm [Martinetz and Schulten, 1991] • Incremental Grid Growing (IGG) [Blackmore, 1995]

  8. GSOM algorithm • The GSOM is initialized with four nodes and grows new nodes to represent the input data • The weight values of the nodes are self-organized by a method similar to that of the SOM

  9. GSOM algorithm (cont.) • Initialization phase • Initialize the weight vectors of the nodes with random numbers • Calculate the growth threshold (GT) according to the user requirements • Growing phase

  10. GSOM algorithm (cont.) • Increase the error value of the winner • When the total error TE of the winner i exceeds GT (TE > GT), grow nodes if i is a boundary node • If … then …; else the weight remains unchanged
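
The growth decision on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the map is stored as a dictionary from grid positions to weight vectors, that the error increment is the Euclidean distance between the input and the winner, and that a boundary node is one with at least one free position among its four grid neighbors. The names `growing_step` and `free_positions` are hypothetical.

```python
import math

# Four-connected grid neighborhood (assumption: rectangular lattice).
NEIGHBOR_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def free_positions(pos, weights):
    """Return the unoccupied grid positions next to `pos`."""
    r, c = pos
    return [(r + dr, c + dc) for dr, dc in NEIGHBOR_OFFSETS
            if (r + dr, c + dc) not in weights]

def growing_step(weights, errors, x, gt):
    """Accumulate the winner's error and decide what to do next.

    Returns (winner, action), where action is "none", "grow"
    (winner is a boundary node) or "distribute" (nonboundary node).
    """
    # Winner = node whose weight vector is closest to the input.
    winner = min(weights, key=lambda p: math.dist(x, weights[p]))
    # Assumed error increment: Euclidean distance to the input.
    errors[winner] += math.dist(x, weights[winner])
    if errors[winner] <= gt:
        return winner, "none"
    if free_positions(winner, weights):
        return winner, "grow"
    return winner, "distribute"
```

Returning an action label rather than growing in place keeps the threshold test separate from node generation, which the following slides treat as its own step.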

  11. GSOM algorithm (cont.) • New node generation • If a node is selected for growth, new nodes are grown in all of its free neighboring positions • Weight initialization of new nodes: if … then …; if … then …

  12. GSOM algorithm (cont.) • If … then …; if … then … • Wnew = m, where m = (r1 + r2)/2 and r1, r2 are the lower and upper values of the range of the weight vector distribution
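
As a rough sketch of new node generation, the snippet below grows a node into every free neighboring position and applies only the one initialization rule that survives in the transcript, Wnew = m with m = (r1 + r2)/2. The neighbor-dependent rules are not reproduced because their conditions are missing. The dictionary-based map and the name `grow_from` are assumptions made for illustration.

```python
def grow_from(pos, weights, r1, r2):
    """Add a node at every free 4-neighbor position of `pos`.

    Each new weight vector is set to the midpoint m = (r1 + r2) / 2
    of the weight range [r1, r2] in every dimension.
    """
    dim = len(weights[pos])
    m = (r1 + r2) / 2
    new_nodes = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        q = (pos[0] + dr, pos[1] + dc)
        if q not in weights:          # position is free
            weights[q] = [m] * dim    # midpoint initialization
            new_nodes.append(q)
    return new_nodes
```

Starting from the initial four-node square, growing from a corner node adds two nodes, since two of its four neighboring positions are already occupied.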

  13. GSOM algorithm (cont.) • Distribute weights to neighbors if i is a nonboundary node

  14. GSOM algorithm (cont.) • Distribute the error to the neighboring nodes; γ is the factor of distribution (FD), 0 < FD < 1 • Initialize the learning rate (LR) to its starting value • Repeat the steps until all inputs have been presented and node growth is reduced to a minimum level • Smoothing phase • Occurs after the node growing phase
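
The error distribution formula itself is not preserved in this transcript, so the following is only a sketch under a stated assumption: the winner's error is reset to GT/2 and each existing immediate neighbor's error is scaled by (1 + γ). That rule is one commonly described for GSOM and should be checked against the paper. The function name `distribute_error` is hypothetical.

```python
def distribute_error(pos, errors, gt, fd):
    """Spread error from a nonboundary node to its neighbors.

    Assumed rule (not taken from the slide): the node's own error
    drops to gt / 2 and each existing 4-neighbor's error is
    multiplied by (1 + fd), with 0 < fd < 1.
    """
    if not 0 < fd < 1:
        raise ValueError("fd must lie strictly between 0 and 1")
    errors[pos] = gt / 2
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        q = (pos[0] + dr, pos[1] + dc)
        if q in errors:
            errors[q] *= 1 + fd
```

Under this assumption a neighbor with zero accumulated error stays at zero, so the spreading only amplifies error that neighbors have already collected.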

  15. GSOM algorithm (cont.) • The purpose is to smooth out any existing quantization error • No new nodes are added during this phase • The LR in this phase is lower than in the growing phase, since the weight values should not fluctuate too much

  16. GSOM algorithm (cont.) • The neighborhood is restricted to the immediate neighbors only • No node growth
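
A smoothing pass with these properties might look like the sketch below. It assumes the standard SOM update w ← w + LR · (x − w), applied to the winner and its existing immediate grid neighbors, and it never adds nodes. The dictionary-based map and the name `smoothing_pass` are illustrative assumptions.

```python
import math

def smoothing_pass(weights, inputs, lr):
    """One pass over the inputs with a small, fixed learning rate.

    Only the winner and its existing 4-neighbors are updated;
    the set of nodes is left unchanged (no growth).
    """
    for x in inputs:
        winner = min(weights, key=lambda p: math.dist(x, weights[p]))
        targets = [winner]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            q = (winner[0] + dr, winner[1] + dc)
            if q in weights:
                targets.append(q)
        for q in targets:
            # Move each selected weight vector a fraction lr toward x.
            weights[q] = [w + lr * (xi - w) for w, xi in zip(weights[q], x)]
```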

  17. Advantages of GSOM over others • Learning rate adaptation • Because the GSOM has only a small number of nodes at the beginning, the usual learning rate reduction causes a problem • The problem would be reduced if the input data were ordered, but such ordering is not available in unsupervised learning • The solution: replace LR(t+1) = LR(t) x α, 0 < α < 1, with LR(t+1) = α x ψ(n) x LR(t), where ψ(n) = 1 - R/n(t) can be used
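
To illustrate the modified rule, the sketch below computes LR(t+1) = α x ψ(n) x LR(t) with ψ(n) = 1 - R/n(t). It assumes n(t) is the number of nodes in the map at time t and R is a constant; the default R = 3.8 is an illustrative choice that keeps ψ positive for a four-node map, not a value stated on the slide.

```python
def next_learning_rate(lr, alpha, n_nodes, r=3.8):
    """Return LR(t+1) = alpha * psi(n) * LR(t), psi(n) = 1 - r / n.

    n_nodes is assumed to be the current number of nodes; r is an
    assumed constant smaller than the initial node count.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if n_nodes <= r:
        raise ValueError("n_nodes must exceed r so that psi stays positive")
    psi = 1 - r / n_nodes
    return alpha * psi * lr
```

With few nodes ψ is small, so the learning rate drops quickly and a repeatedly selected winner cannot swing far; as the map grows, ψ approaches 1 and the rule approaches the plain LR(t+1) = α x LR(t).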

  18. Advantages of GSOM over others (cont.) • Localized neighborhood weight adaptation • During SOM training, the neighborhood starts large and shrinks linearly to one node • The GSOM does not require this, since the weights of new nodes are initialized to fit in with the existing neighborhood weights • The GSOM requires only a small neighborhood • Therefore, during the growing phase, the GSOM initializes the LR and Nk to their starting values at each new input

  19. Advantages of GSOM over others (cont.) • Error distribution of nonboundary nodes • The weight distribution has the effect of spreading the error outwards from the high-error node • This creates a ripple effect outwards and causes a boundary node to increase its error value

  20. Knowledge discovery by hierarchical clustering of GSOM • The spread-out factor (SF) • GT is the threshold that decides when to initiate new node growth • A large GT results in a map with fewer nodes, which gives an abstract picture of the data; a small GT results in more nodes and a more detailed picture

  21. Knowledge discovery by hierarchical clustering of GSOM (cont.) • But TE is sensitive to the dimension of the data and the number of hits, so it is hard to choose a GT directly • The SF can be used to control and calculate the GT: GT = D x f(SF), 0 < SF < 1
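
The slide leaves f unspecified. As a hedged example, the sketch below assumes f(SF) = -ln(SF), a form commonly given for GSOM, and takes D to be the dimensionality of the data; both are assumptions here rather than something this transcript states. The name `growth_threshold` is hypothetical.

```python
import math

def growth_threshold(dim, sf):
    """Compute GT = D * f(SF), assuming f(SF) = -ln(SF).

    dim is taken to be the data dimensionality D; sf must satisfy
    0 < sf < 1. A low sf gives a high GT (fewer nodes), a high sf
    gives a low GT (a more spread-out map).
    """
    if not 0 < sf < 1:
        raise ValueError("sf must lie strictly between 0 and 1")
    return -dim * math.log(sf)
```

Under this assumption the user picks a single number between 0 and 1 regardless of the data dimension, and the threshold scales with D automatically.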

  22. Knowledge discovery by hierarchical clustering of GSOM (cont.)

  23. Knowledge discovery by hierarchical clustering of GSOM (cont.)

  24. Experiment • The spread of the GSOM with increasing SF values • Zoo data set with 18 attributes and 99 tuples

  25. Experiment (cont.) • Figure label: insects

  26. Experiment (cont.) • Figure labels: fish, airborne bird, nonpredatory mammal, nondomestic mammal, non-airborne bird

  27. Experiment (cont.) • Hierarchical clustering of interesting clusters • Figure labels: small size, without tail

  28. Experiment (cont.) • The GSOM for a high-dimensional human genetic data set • 43 dimensions • Genetic information is derived from blood samples • Genetic distance between populations is measured with Fst • Fst uses a form of normalization to account for frequencies that are not normally distributed

  29. Experiment (cont.)

  30. Experiment (cont.)

  31. Conclusion • The shape of the GSOM represents the grouping in the data and draws attention to areas worth further investigation • The GSOM requires fewer nodes than the SOM, which results in faster processing • Hierarchical clustering allows a more detailed investigation of the data

  32. Opinion • Quantization error is a popular factor in SOM • This is a classic paper • Does error distribution at nonboundary nodes destroy topology preservation?
