
Section 2: On-line Learning

Based on slides from Michael Biehl's summer course. Mini-course on ANN and BN, The Multidisciplinary Brain Research Center, Bar-Ilan University, May 2004.


Presentation Transcript


  1. Section 2: On-line Learning
  Based on slides from Michael Biehl's summer course.
  Mini-course on ANN and BN, The Multidisciplinary Brain Research Center, Bar-Ilan University, May 2004.

  2. Section 2.1: The Perceptron

  3. The Perceptron
  Input: ξ. Adaptive weights: J. Output: S = sign(J·ξ) = ±1.
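To make the mapping concrete, a minimal sketch in Python (the tie-breaking convention sign(0) = +1 is my assumption; the slides do not specify it):

```python
import numpy as np

def perceptron(J, xi):
    """Perceptron output S = sign(J . xi), with S in {-1, +1}."""
    return 1 if J @ xi >= 0 else -1

rng = np.random.default_rng(0)
N = 10
J = rng.normal(size=N)    # adaptive weights
xi = rng.normal(size=N)   # input vector
print(perceptron(J, xi))  # prints 1 or -1
```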

  4. Perceptron: binary output
  Implements a linearly separable classification of inputs.
  Milestones:
  • Perceptron convergence theorem, Rosenblatt (1958)
  • Capacity, Winder (1963), Cover (1965)
  • Statistical physics of perceptron weights, Gardner (1988)
  How does this device learn?

  5. Learning a linearly separable rule from reliable examples
  • Unknown rule: S_T(ξ) = sign(B·ξ) = ±1 defines the correct classification. Parameterized through a teacher perceptron with weights B ∈ R^N (B·B = 1).
  • Only available information: example data D = {ξ^μ, S_T(ξ^μ) = sign(B·ξ^μ)} for μ = 1…P.

  6. Learning a linearly… (cont.)
  • Training: finding the student weights J.
  • J parameterizes a hypothesis S_S(ξ) = sign(J·ξ).
  • Supervised learning is based on the student's performance with respect to the training data D.
  • Binary error measure: ε_T^μ(J) = 0 if S_S(ξ^μ) = S_T(ξ^μ), and ε_T^μ(J) = 1 if S_S(ξ^μ) ≠ S_T(ξ^μ).
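A sketch of this teacher–student setup: a normalized random teacher B labels P random inputs, and the binary training error of an arbitrary (untrained) student J is the fraction of disagreements over D. The sizes N and P are illustration values:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 100, 500

B = rng.normal(size=N)
B /= np.linalg.norm(B)          # teacher weights with B . B = 1

xi = rng.normal(size=(P, N))    # example inputs xi^mu, mu = 1..P
S_T = np.sign(xi @ B)           # reliable teacher labels S_T = sign(B . xi)

J = rng.normal(size=N)          # some student hypothesis
S_S = np.sign(xi @ J)           # student labels S_S = sign(J . xi)
eps_T = np.mean(S_S != S_T)     # fraction of disagreements over D
print(f"training error of a random student: {eps_T:.3f}")   # close to 0.5
```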

  7. Off-line learning
  • Guided by the minimization of a cost function H(J), e.g. the training error H(J) = Σ_μ ε_T^μ(J).
  Equilibrium statistical mechanics treatment:
  • Energy H of N degrees of freedom
  • Ensemble of systems in thermal equilibrium at a formal temperature
  • Disorder average over random examples (replicas); assumes a distribution over the inputs
  • Macroscopic description, order parameters
  • Typical properties of large systems, P = αN

  8. On-line training
  • Single presentation of uncorrelated (new) examples {ξ^μ, S_T(ξ^μ)}
  • Update of student weights: J(μ) = J(μ−1) + ΔJ^μ
  • Learning dynamics in discrete time μ

  9. On-line training – statistical physics approach
  • Consider a sequence of independent, random inputs ξ^μ
  • Thermodynamic limit N → ∞
  • Disorder average over the latest example; self-averaging properties
  • Continuous time limit α = μ/N

  10. Generalization
  Performance of the student (after training) with respect to an arbitrary, new input ξ.
  • In practice: empirical mean of the error measure over a set of test inputs.
  • In the theoretical analysis: average over the (assumed) probability density of inputs.
  Generalization error: ε_g = ⟨ε_T(ξ)⟩_ξ.

  11. Generalization (cont.)
  The simplest model distribution: isotropic density P(ξ), with ξ uncorrelated with B and J.
  Consider vectors ξ of independent identically distributed (iid) components ξ_j with ⟨ξ_j⟩ = 0 and ⟨ξ_j²⟩ = 1.

  12. Geometric argument
  [Figure: teacher vector B, student vector J, and input ξ projected into the (B, J)-plane.]
  Projection of the data into the (B, J)-plane yields an isotropic density of inputs; S_T(ξ) = S_S(ξ) except in the wedge between the two decision boundaries. For |B| = 1: ε_g = θ/π, where θ is the angle between B and J.

  13. Overlap parameters
  Sufficient to quantify the success of learning:
  R = B·J, Q = J·J
  Random guessing: R = 0, ε_g = 1/2.
  Perfect generalization: R/√Q → 1, ε_g = 0.

  14. Derivation for large N
  Given B, J, and an uncorrelated random input with ⟨ξ_i⟩ = 0, ⟨ξ_i ξ_j⟩ = δ_ij, consider the student/teacher fields, which are sums of (many) independent random quantities:
  x = J·ξ = Σ_i J_i ξ_i
  y = B·ξ = Σ_i B_i ξ_i

  15. Central Limit Theorem
  The joint density of (x, y) is, for N → ∞, a two-dimensional Gaussian, fully specified by the first and second moments:
  ⟨x⟩ = Σ_i J_i ⟨ξ_i⟩ = 0,  ⟨y⟩ = Σ_i B_i ⟨ξ_i⟩ = 0
  ⟨x²⟩ = Σ_ij J_i J_j ⟨ξ_i ξ_j⟩ = Σ_i J_i² = Q
  ⟨y²⟩ = Σ_ij B_i B_j ⟨ξ_i ξ_j⟩ = Σ_i B_i² = 1
  ⟨xy⟩ = Σ_ij J_i B_j ⟨ξ_i ξ_j⟩ = Σ_i J_i B_i = R

  16. Central Limit Theorem (cont.)
  Details of the input distribution are irrelevant. Some possible examples: binary, ξ_i = ±1 with equal probability; uniform; Gaussian.
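This irrelevance of input details is easy to check numerically. The sketch below draws inputs from the three distributions just named, each scaled to zero mean and unit variance per component, and verifies that the field moments approach Q, 1 and R in all three cases:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 500, 10000                        # dimension, number of sampled inputs

B = rng.normal(size=N)
B /= np.linalg.norm(B)                   # |B| = 1, so <y^2> = 1
J = rng.normal(size=N)

samplers = {
    "binary":   lambda: rng.choice([-1.0, 1.0], size=(M, N)),
    "uniform":  lambda: rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(M, N)),
    "gaussian": lambda: rng.normal(size=(M, N)),
}
print(f"expected: <x^2> = Q = {J @ J:.3f},  <y^2> = 1,  <xy> = R = {J @ B:.3f}")
for name, draw in samplers.items():
    xi = draw()
    x, y = xi @ J, xi @ B                # student and teacher fields
    print(f"{name:9s}  <x^2> = {np.mean(x * x):.3f}  "
          f"<y^2> = {np.mean(y * y):.3f}  <xy> = {np.mean(x * y):.3f}")
```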

  17. Generalization error
  The isotropic distribution is also assumed to describe the statistics of the example data inputs ξ^μ.
  Exercise: derive the generalization error as a function of R and Q (use the mathematical notes).
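The standard outcome of this exercise is ε_g = (1/π) arccos(R/√Q), i.e. the angle formula of slide 12 expressed through the overlaps. Taking that formula as given, a quick Monte Carlo sanity check:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 200, 50000

B = rng.normal(size=N)
B /= np.linalg.norm(B)                   # teacher
J = rng.normal(size=N)                   # arbitrary student

R, Q = J @ B, J @ J
eps_theory = np.arccos(R / np.sqrt(Q)) / np.pi

xi = rng.normal(size=(M, N))             # isotropic test inputs
eps_mc = np.mean(np.sign(xi @ J) != np.sign(xi @ B))
print(f"theory: {eps_theory:.4f}   Monte Carlo: {eps_mc:.4f}")
```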

  18. Assumptions about the data
  • No spatial correlations
  • No distinguished directions in input space
  • No temporal correlations
  • No correlations with the rule
  • Single presentation without repetitions
  Consequences:
  • The average over the data can be performed step by step
  • The actual choice of B is irrelevant; it is not necessary to average over the teacher

  19. Hebbian learning (revisited) [Hebb 1949]
  • Off-line interpretation [Vallet 1989]: choice of student weights given D = {ξ^μ, S_T^μ}, μ = 1…P:
  J(P) = (1/N) Σ_μ ξ^μ S_T^μ
  • Equivalent on-line interpretation: dynamics upon single presentation of examples:
  J(μ) = J(μ−1) + (1/N) ξ^μ S_T^μ
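A direct simulation of the on-line interpretation (a sketch; N and the training horizon are illustration values, and ε_g is evaluated from the overlaps via the arccos formula of slide 17):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000

B = rng.normal(size=N)
B /= np.linalg.norm(B)                   # teacher
J = np.zeros(N)                          # tabula rasa student

for mu in range(1, 10 * N + 1):          # alpha = mu / N runs up to 10
    xi = rng.normal(size=N)              # new, uncorrelated example
    S_T = np.sign(B @ xi)
    J += S_T * xi / N                    # Hebb step: J(mu) = J(mu-1) + S_T xi / N
    if mu % (2 * N) == 0:
        R, Q = J @ B, J @ J
        print(f"alpha = {mu / N:4.1f}   eps_g = {np.arccos(R / np.sqrt(Q)) / np.pi:.3f}")
```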

  20. Hebb: on-line
  From microscopic to macroscopic: recursions for the overlaps R(μ) = B·J(μ) and Q(μ) = J(μ)·J(μ).
  Exercise: derive the update equations of R and Q.

  21. Hebb: on-line (cont.)
  Average over the latest example:
  • The random input ξ^μ enters only through the fields x = J(μ−1)·ξ^μ and y = B·ξ^μ.
  • The random input ξ^μ and J(μ−1), B are statistically independent.
  • The Central Limit Theorem applies and yields the joint density of (x, y).

  22. Hebb: on-line (cont.)
  Exercise: derive the update equations of R and Q as functions of α (use the mathematical notes).
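For orientation, a sketch of where the exercise leads; these are the standard mean flow equations for on-line Hebb learning in this model, obtained by projecting the update from slide 19 onto B and J and averaging the fields over the Gaussian density of slide 15 (my reconstruction, not copied from the slides):

```latex
% R(mu) = R(mu-1) + S_T y / N and
% Q(mu) = Q(mu-1) + 2 S_T x / N + (xi . xi) / N^2, with xi . xi ~ N.
% Averaging with S_T = sign(y), <y^2> = 1, <x|y> = R y gives:
\begin{align}
  \frac{dR}{d\alpha} &= \langle S_T\, y \rangle = \langle |y| \rangle
                      = \sqrt{2/\pi}, \\
  \frac{dQ}{d\alpha} &= 2\,\langle S_T\, x \rangle + 1
                      = 2\sqrt{2/\pi}\; R + 1 .
\end{align}
```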

  23. Hebb: on-line (cont.)
  Continuous time limit: N → ∞, α = μ/N, dα = 1/N.
  Initial conditions (tabula rasa): R(0) = Q(0) = 0.
  What are the mean values of R and Q after training with αN examples? [See matlab code]
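Integrating the flow equations above with R(0) = Q(0) = 0 gives the closed forms R(α) = α√(2/π) and Q(α) = α + 2α²/π. The slides refer to MATLAB code; the following Python sketch evaluates the same mean values after αN examples:

```python
import numpy as np

def hebb_theory(alpha):
    """Mean order parameters for on-line Hebb learning from tabula rasa."""
    R = alpha * np.sqrt(2.0 / np.pi)      # from dR/dalpha = sqrt(2/pi)
    Q = alpha + 2.0 * alpha**2 / np.pi    # from dQ/dalpha = 2 sqrt(2/pi) R + 1
    eps_g = np.arccos(R / np.sqrt(Q)) / np.pi
    return R, Q, eps_g

for alpha in (0.5, 1.0, 5.0, 20.0):
    R, Q, eps_g = hebb_theory(alpha)
    print(f"alpha = {alpha:5.1f}   R = {R:7.3f}   Q = {Q:8.3f}   eps_g = {eps_g:.3f}")
```

These theory curves should match the simulation sketch after slide 19 up to finite-N fluctuations.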

  24. Hebb: on-line mean values
  Self-averaging properties of an observable A(J):
  • The width of its distribution vanishes as N → ∞.
  • Observing a value of A different from its mean occurs with vanishing probability.
  The order parameters Q and R are self-averaging for infinite N.

  25. Learning curve: α-dependence of the order parameters
  The normalized overlap ρ = R/√Q between the two vectors B and J gives the angle θ between them: cos θ = R/√Q.

  26. Learning curve: α-dependence of the order parameters (cont.)
  [Figure: order parameters and learning curve as a function of α.]

  27. Asymptotic expansion [draw w. matlab]
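Expanding ε_g = arccos(ρ)/π around ρ = 1, with ρ(α) = R/√Q from the closed forms above, gives the leading-order decay ε_g ≈ 1/√(2πα); this is my expansion of those formulas, consistent with the known α^(−1/2) asymptotics of Hebbian learning. A numerical comparison, standing in for the referenced MATLAB plot:

```python
import numpy as np

alpha = np.logspace(0, 4, 5)                      # alpha = 1 ... 10^4
rho = np.sqrt(2 * alpha / np.pi) / np.sqrt(1 + 2 * alpha / np.pi)   # R / sqrt(Q)
eps_exact = np.arccos(rho) / np.pi
eps_asymp = 1.0 / np.sqrt(2 * np.pi * alpha)      # leading asymptotic term
for a, e, ea in zip(alpha, eps_exact, eps_asymp):
    print(f"alpha = {a:8.0f}   exact = {e:.5f}   asymptotic = {ea:.5f}")
```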

  28. Modified Hebbian learning
  The training algorithm is defined by a modulation function f:
  J(μ) = J(μ−1) + (1/N) f(…) ξ^μ S_T^μ
  Restriction: f may depend only on available quantities: f(J(μ−1), ξ^μ, S_T^μ).
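In this notation, training algorithms differ only in the choice of f. A sketch of two standard choices, plain Hebb (f = 1) and Rosenblatt's perceptron rule (update only on mistakes), anticipating the question on the next slide:

```python
import numpy as np

def f_hebb(J, xi, S_T):
    """Hebb: f = 1, every example contributes."""
    return 1.0

def f_perceptron(J, xi, S_T):
    """Rosenblatt's rule: f = 1 only if the student currently errs, else 0."""
    return 1.0 if np.sign(J @ xi) != S_T else 0.0

def online_step(J, xi, S_T, f):
    """Modulated Hebbian update: J(mu) = J(mu-1) + f(...) S_T xi / N."""
    return J + f(J, xi, S_T) * S_T * xi / len(J)
```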

  29. Questions:
  • Does the perceptron algorithm (Rosenblatt 1959), which learns only when a mistake is made, perform better than the Hebb algorithm?
  • Which training algorithm provides the best learning, i.e. the fastest asymptotic decrease of ε_g?
  • Is it possible to achieve such an asymptotic behavior on-line?
