
Dynamical analysis of LVQ type learning rules

Barbara Hammer (Clausthal University of Technology, Institute of Computing Science); Michael Biehl, Anarta Ghosh (Rijksuniversiteit Groningen, Mathematics and Computing Science); http://www.cs.rug.nl/~biehl, m.biehl@rug.nl



Presentation Transcript


  1. Dynamical analysis of LVQ type learning rules
  Barbara Hammer, Clausthal University of Technology, Institute of Computing Science
  Michael Biehl, Anarta Ghosh, Rijksuniversiteit Groningen, Mathematics and Computing Science
  http://www.cs.rug.nl/~biehl, m.biehl@rug.nl

  2. Learning Vector Quantization (LVQ)
  - identification of prototype vectors from labelled example data
  - parameterization of distance-based classification schemes
  Classification: assignment of a vector ξ to the class of the closest prototype w.
  Aim: generalization ability, i.e. classification of novel data after learning from examples.
  Often: heuristically motivated variations of competitive learning.
  Example, the basic LVQ scheme "LVQ 1" [Kohonen]:
  • initialize prototype vectors for the different classes
  • present a single example ξ
  • identify the closest prototype, i.e. the so-called winner
  • move the winner closer towards the data (same class) or away from the data (different class)

  3. LVQ algorithms ...
  • frequently applied in a variety of practical problems
  • plausible, intuitive, flexible
  • fast, easy to implement
  • but: limited theoretical understanding of
    - dynamics and convergence properties
    - achievable generalization ability
  Often based on heuristic arguments or cost functions with unclear relation to generalization.
  Here: analysis of LVQ algorithms w.r.t.
  - dynamics of the learning process
  - performance, i.e. generalization ability
  - typical properties in a model situation

  4. Model situation: two clusters of N-dimensional data
  Random vectors ξ ∈ ℝ^N are drawn according to a mixture of two Gaussians with prior weights p_+ and p_- of the classes, p_+ + p_- = 1; orthonormal center vectors B_+, B_- ∈ ℝ^N with (B_±)² = 1 and B_+ · B_- = 0; separation ∝ ℓ. Cluster σ = ±1 is centered at ℓ B_σ and has independent components with variance v_σ.
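The two-cluster density above can be sampled directly. A minimal sketch (function name and the concrete choice of B_± as the first two unit vectors are mine, not from the slides):

```python
import numpy as np

def sample_mixture(n_samples, N, ell=2.0, p_plus=0.8,
                   v_plus=4.0, v_minus=9.0, rng=None):
    """Draw labelled examples from the two-cluster model: with probability
    p_sigma an example comes from a Gaussian centred at ell*B_sigma whose
    components are i.i.d. with variance v_sigma.  B_+ and B_- are chosen
    here as the first two unit vectors (orthonormal, as in the model)."""
    rng = np.random.default_rng() if rng is None else rng
    B = np.zeros((2, N))
    B[0, 0] = 1.0   # B_+
    B[1, 1] = 1.0   # B_-
    labels = rng.choice([+1, -1], size=n_samples, p=[p_plus, 1.0 - p_plus])
    xi = np.empty((n_samples, N))
    for i, s in enumerate(labels):
        idx, var = (0, v_plus) if s == +1 else (1, v_minus)
        xi[i] = ell * B[idx] + np.sqrt(var) * rng.standard_normal(N)
    return xi, labels
```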

  5. Dynamics of on-line training
  A sequence of new, independent random examples (ξ^μ, σ^μ) is drawn according to the model density. Update of the two prototype vectors w_+, w_-:
  w_s^μ = w_s^{μ-1} + (η/N) f_s[...] (ξ^μ − w_s^{μ-1})
  with learning rate (step size) η; the modulation function f_s[...] encodes competition, the direction of the update (change of the prototype towards or away from the current data), etc.
  Example: LVQ1 in its original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm.
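The WTA update for LVQ1 can be sketched as follows (function and variable names are mine; squared Euclidean distance is assumed for the winner determination):

```python
import numpy as np

def lvq1_step(w, w_labels, xi, sigma, eta, N):
    """One LVQ1 (Winner-Takes-All) update: only the closest prototype moves,
    towards xi if its label matches sigma, away from xi otherwise."""
    d = np.sum((w - xi) ** 2, axis=1)       # squared distances to prototypes
    win = np.argmin(d)                      # index of the winner
    sign = 1.0 if w_labels[win] == sigma else -1.0
    w[win] += (eta / N) * sign * (xi - w[win])
    return w
```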

  6. Mathematical analysis of the learning dynamics
  1. Description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7): the projections of the prototypes into the (B_+, B_-)-plane, and the lengths and relative position of the prototypes. The recursions for these quantities follow from the update rule; the random vector ξ^μ enters only through its length and the projections w_s · ξ^μ and B_σ · ξ^μ.
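The seven characteristic quantities for two prototypes are the four projections onto the cluster centers and the three independent prototype overlaps. A sketch of their computation (names R, Q follow the notation on the later slides; the function itself is my own):

```python
import numpy as np

def order_parameters(w, B):
    """Characteristic quantities of the dynamics:
    R[s, sigma] = w_s . B_sigma  (position relative to the cluster centers),
    Q[s, t]     = w_s . w_t      (lengths and relative position).
    For two prototypes: 4 entries of R plus Q_++, Q_+-, Q_-- = 7 numbers."""
    R = w @ B.T
    Q = w @ w.T
    return R, Q
```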

  7. 2. Average over the current example: for a random vector ξ^μ drawn according to the model density, the projections w_s · ξ^μ and B_σ · ξ^μ are correlated Gaussian random quantities in the thermodynamic limit N → ∞, completely specified in terms of their first and second moments; together with the average length of ξ^μ, the averaged recursions close in the characteristic quantities.
  3. Self-averaging property: the characteristic quantities depend on the random sequence of example data, but their variance vanishes with N → ∞ (here: ∝ N^{-1}); the learning dynamics is completely described in terms of averages.

  8. 4. Continuous learning time: α = (number of examples) / N, i.e. the number of learning steps per degree of freedom. For N → ∞ the stochastic recursions become deterministic ODEs; integration yields the evolution of the projections.
  5. Learning curve: the generalization error ε_g(α), i.e. the probability for misclassification of a novel example after training with αN examples.
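For finite N, the generalization error ε_g, the misclassification probability of a novel example, can be estimated by Monte Carlo instead of from the ODEs. A sketch under the model's assumptions (function name and parameters are mine; nearest-prototype classification is assumed):

```python
import numpy as np

def generalization_error(w, w_labels, B, ell=1.0, p_plus=0.5,
                         v_plus=1.0, v_minus=1.0, n_test=20000, rng=None):
    """Monte-Carlo estimate of eps_g: fraction of fresh examples from the
    two-cluster mixture that the nearest-prototype rule misclassifies."""
    rng = np.random.default_rng(0) if rng is None else rng
    N = w.shape[1]
    sigma = rng.choice([+1, -1], size=n_test, p=[p_plus, 1.0 - p_plus])
    centre = np.where(sigma[:, None] == 1, ell * B[0], ell * B[1])
    std = np.where(sigma == 1, np.sqrt(v_plus), np.sqrt(v_minus))[:, None]
    xi = centre + std * rng.standard_normal((n_test, N))
    d = ((xi[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    pred = np.asarray(w_labels)[np.argmin(d, axis=1)]
    return float(np.mean(pred != sigma))
```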

  9. LVQ1: The winner takes it all
  Only the winner w_s is updated according to the class label.
  [Figure: evolution of the projections R_{Sσ} and Q_{++}, Q_{+-}, Q_{--} vs. α, and trajectories of w_+, w_- in the (B_+, B_-)-plane with markers at α = 20, 40, ..., 140; dotted: optimal decision boundary; solid: asymptotic position. Theory and simulation (N=100), p_+ = 0.8, v_+ = 4, v_- = 9, ℓ = 2.0, η = 1.0, averaged over 100 independent runs; initialization w_s(0) ≈ 0.]

  10. Learning curve
  [Figure: ε_g(α) for η = 2.0, 1.0, 0.2; p_+ = 0.2, ℓ = 1.0, v_+ = v_- = 1.0.]
  • suboptimal, non-monotonic behavior for small η
  • stationary state: ε_g(α → ∞) grows linearly with η
  • well-defined asymptotics for η → 0, α → ∞ with (ηα) → ∞
  Achievable generalization error:
  [Figure: asymptotic ε_g vs. p_+ for v_+ = v_- = 1.0 and for v_+ = 0.25, v_- = 0.81; dotted: best linear boundary; solid: LVQ1.]

  11. "LVQ 2.1" [Kohonen]
  Here: update the correct and the wrong winner.
  [Figure: trajectories of R_{S-}, R_{S+}; theory and simulation (N=100), p_+ = 0.8, ℓ = 1, v_+ = v_- = 1, η = 0.5, averaged over 100 independent runs.]
  Problem: instability of the algorithm due to repulsion of the wrong prototypes; trivial classification for α → ∞: ε_g = min { p_+, p_- }.
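With one prototype per class, the LVQ2.1-style "update correct and wrong winner" rule moves both prototypes on every example. A minimal sketch (function name is mine) that makes the repulsion term visible:

```python
import numpy as np

def lvq21_step(w, w_labels, xi, sigma, eta, N):
    """LVQ2.1-style update with one prototype per class: the prototype of
    the correct class moves towards xi, the wrong one is pushed away.
    The repulsive (-1) term is the source of the instability discussed
    on this slide."""
    for s in range(len(w)):
        sign = 1.0 if w_labels[s] == sigma else -1.0
        w[s] += (eta / N) * sign * (xi - w[s])
    return w
```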

  12. Suggested strategy: selection of data in a window close to the current decision boundary. This slows down the repulsion, but the system remains unstable.
  Early stopping: end the training process at minimal ε_g (idealized).
  [Figure: ε_g(α) for η = 2.0, 1.0, 0.5.]
  • pronounced minimum in ε_g(α)
  • depends on initialization and cluster geometry
  • lowest minimum assumed for η → 0
  [Figure: asymptotic ε_g vs. p_+ for v_+ = 0.25, v_- = 0.81; solid: LVQ1; dashed: early stopping.]

  13. "Learning From Mistakes (LFM)"
  LVQ2.1 update only if the current classification is wrong; the crisp limit of Soft Robust LVQ [Seo and Obermayer, 2003].
  Learning curves: [Figure: ε_g(α) for η = 2.0, 1.0, 0.5; p_+ = 0.8, ℓ = 3.0, v_+ = 4.0, v_- = 9.0]; η-independent asymptotic ε_g.
  [Figure: projected trajectory R_{S-}, R_{S+} in the (ℓB_+, ℓB_-)-plane; p_+ = 0.8, ℓ = 1.2, v_+ = v_- = 1.0.]
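The LFM rule described above can be sketched by gating the LVQ2.1-style update on a misclassification (function name is mine; one prototype per class and squared Euclidean distances are assumed):

```python
import numpy as np

def lfm_step(w, w_labels, xi, sigma, eta, N):
    """Learning From Mistakes: apply the update only when the current
    nearest-prototype classification of xi is wrong; otherwise do nothing."""
    d = np.sum((w - xi) ** 2, axis=1)
    if w_labels[np.argmin(d)] == sigma:
        return w                      # correctly classified: no update
    for s in range(len(w)):           # misclassified: LVQ2.1-style update
        sign = 1.0 if w_labels[s] == sigma else -1.0
        w[s] += (eta / N) * sign * (xi - w[s])
    return w
```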

  14. Comparison: achievable generalization ability
  [Figure: asymptotic ε_g vs. p_+ for equal cluster variances (v_+ = v_- = 1.0) and unequal variances (v_+ = 0.25, v_- = 0.81); dotted: best linear boundary; solid: LVQ1; dashed: LVQ2.1 (early stopping); dash-dotted: LFM.]

  15. Summary
  • prototype-based learning: Vector Quantization and Learning Vector Quantization
  • a model scenario: two clusters, two prototypes
  • dynamics of on-line training
  • comparison of algorithms:
    - LVQ 1: close to optimal asymptotic generalization
    - LVQ 2.1: instability, trivial (stationary) classification
    - LVQ 2.1 + early stopping: potentially very good performance
    - LFM: far from optimal generalization behavior
  • work in progress, outlook:
    - multi-class, multi-prototype problems
    - optimized procedures: learning rate schedules
    - variational approach / Bayes optimal on-line

  16. Perspectives
  • Self-Organizing Maps (SOM): (many) N-dim. prototypes form a (low) d-dimensional grid; representation of data in a topology-preserving map; neighborhood-preserving training; applications
  • Neural Gas (distance-based)
  • Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. an adaptive distance measure

  17. Outlook:

  18. 2. Average over the current example (details): for a random vector ξ^μ drawn according to the model density, with average length ⟨(ξ^μ)²⟩, the projections w_s · ξ^μ and B_σ · ξ^μ are correlated Gaussian random quantities in the thermodynamic limit N → ∞, completely specified in terms of their first and second moments (w/o indices μ); the averaged recursions close in the characteristic quantities.

  19. In the limit N → ∞:
  Investigation and comparison of given algorithms:
  • repulsive/attractive fixed points of the dynamics
  • asymptotic behavior for α → ∞
  • dependence on learning rate, separation, initialization, ...
  Optimization and development of new prescriptions:
  • time-dependent learning rate η(α)
  • variational optimization w.r.t. the modulation function f_s[...]: maximize the decrease of the generalization error per step
  • ...

  20. LVQ1: The winner takes it all (self-averaging property)
  Only the winner w_s is updated according to the class label.
  [Figure: means and variances of the characteristic quantities, e.g. R_{++}(α=10), Q_{++}, Q_{+-}, Q_{--}, R_{Sσ}, plotted vs. 1/N. Theory and simulation (N=100), p_+ = 0.8, v_+ = 4, v_- = 9, ℓ = 2.0, η = 1.0, averaged over 100 independent runs; initialization w_s(0) = 0.]

  21. High-dimensional data (formally: N → ∞)
  [Figure: ξ^μ ∈ ℝ^N with N = 200, ℓ = 1, p_+ = 0.4, v_+ = 0.44, v_- = 0.44 (● 240 examples, ○ 160 examples). Left: projections x_{1,2} = w_{1,2} · ξ^μ on two independent random directions w_{1,2}. Right: projections y_± = B_± · ξ^μ into the plane of the center vectors B_+, B_-.]
