
Presentation Transcript


  1. Ch. 4: Radial Basis Functions Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009. Based on slides from many Internet sources. Longin Jan Latecki, Temple University, latecki@temple.edu

  2. Perceptron

  3. In RBFN

  4. Architecture [Diagram: input layer x1 … xn, hidden layer h1 … hm with weights W1 … Wm, linear output f(x)]

  5. Architecture Three layers: • Input layer: source nodes that connect the network to its environment • Hidden layer: hidden units that provide a set of basis functions, mapping inputs into a (typically higher-dimensional) space • Output layer: a linear combination of the hidden functions

  6. Architecture Radial basis function: f(x) = Σ_{j=1}^{m} w_j h_j(x), with h_j(x) = exp(−||x − c_j||² / r_j²), where c_j is the center of a region and r_j is the width of the receptive field
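A minimal NumPy sketch of this forward pass; the names rbf_forward, centers, widths, and weights are illustrative, not from the slides:

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """f(x) = sum_j w_j * h_j(x), with Gaussian h_j(x) = exp(-||x - c_j||^2 / r_j^2)."""
    sq_dists = np.sum((centers - x) ** 2, axis=1)  # ||x - c_j||^2 for every center
    h = np.exp(-sq_dists / widths ** 2)            # hidden-layer activations h_j(x)
    return weights @ h                             # linear output layer

# Toy usage: three hidden units over 2-D inputs (all values illustrative)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # c_j
widths  = np.array([0.5, 0.5, 0.5])                        # r_j
weights = np.array([1.0, -1.0, 2.0])                       # w_j
print(rbf_forward(np.array([0.2, 0.1]), centers, widths, weights))
```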

  7. Function Approximation with Radial Basis Functions RBF Networks approximate functions using (radial) basis functions as the building blocks.

  8. Exact Interpolation • RBFs have their origins in techniques for performing exact function interpolation [Bishop, 1995]: find a function h(x) such that h(x_n) = t_n, n = 1, …, N • Radial Basis Function approach (Powell 1987): use a set of N basis functions of the form φ(||x − x_n||), one for each data point, where φ(·) is some non-linear function. • Output: h(x) = Σ_n w_n φ(||x − x_n||)

  9. Exact Interpolation • Goal (exact interpolation): find a function h(x) such that h(x_n) = t_n, n = 1, …, N • Radial Basis Function approach (Powell 1987): use a set of N basis functions of the form φ(||x − x_n||), one for each point, where φ(·) is some non-linear function. • Output: h(x) = Σ_n w_n φ(||x − x_n||) • Writing out the interpolation conditions gives one equation per training point:
w_1 φ(||x_1 − x_1||) + w_2 φ(||x_1 − x_2||) + … + w_N φ(||x_1 − x_N||) = t_1
w_1 φ(||x_2 − x_1||) + w_2 φ(||x_2 − x_2||) + … + w_N φ(||x_2 − x_N||) = t_2
…
w_1 φ(||x_N − x_1||) + w_2 φ(||x_N − x_2||) + … + w_N φ(||x_N − x_N||) = t_N
i.e. Φw = t, an N × N linear system for the weights.
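A sketch of this exact-interpolation recipe, assuming a Gaussian φ with a common width sigma (the slide leaves both choices open): build Φ and solve the N × N system directly.

```python
import numpy as np

# Exact interpolation: one basis function per training point, then solve Phi w = t.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(10, 1))      # training inputs x_n (toy data)
t = np.sin(2 * np.pi * X[:, 0])          # targets t_n
sigma = 0.2                              # assumed common width

# Phi[n, m] = phi(||x_n - x_m||), here an assumed Gaussian phi
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Phi = np.exp(-D ** 2 / (2 * sigma ** 2))

w = np.linalg.solve(Phi, t)              # weights that hit every target exactly
assert np.allclose(Phi @ w, t)           # h(x_n) = t_n for all n
```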

  10. Exact Interpolation

  11. Exact Interpolation

  12. Due to noise that may be present in the data, exact interpolation is rarely useful. • By introducing a number of modifications, we arrive at RBF networks: • Complexity, rather than the size of the data, is what matters • The number of basis functions need not equal N • Centers need not be constrained to the input data points • Each basis function can have its own adjustable width parameter σ • A bias parameter may be included in the linear sum.

  13. Illustrative Example - XOR Problem [Figure: the four XOR points mapped into the (φ1, φ2) basis-function space, where they become linearly separable]
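A small numeric check of the XOR example, assuming (as in standard textbook treatments) Gaussian basis functions centered at (0,0) and (1,1); the slide itself does not specify the centers:

```python
import numpy as np

# The four XOR inputs and labels (not linearly separable in the original space)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Two Gaussian basis functions, centered on (0,0) and (1,1)
c1, c2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
phi1 = np.exp(-np.sum((X - c1) ** 2, axis=1))
phi2 = np.exp(-np.sum((X - c2) ** 2, axis=1))

# In (phi1, phi2) space the classes separate: both class-1 points land at
# (e^-1, e^-1) ~ (0.37, 0.37); the class-0 points at (1, 0.14) and (0.14, 1).
pred = (phi1 + phi2 < 0.95).astype(int)   # a single line now separates them
print(np.array_equal(pred, y))            # True
```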

  14. Function Approximation via Basis Functions and RBF Networks • Using nonlinear functions, we can convert a nonlinearly separable problem into a linearly separable one. • From a function approximation perspective, this is equivalent to implementing a complex function (corresponding to the nonlinearly separable decision boundary) using simple functions (corresponding to the linearly separable decision boundary). • Implementing this procedure using a network architecture yields RBF networks, if the nonlinear mapping functions are radial basis functions. • Radial Basis Functions: • Radial: symmetric around its center • Basis Functions: also called kernels; a set of functions whose linear combination can generate an arbitrary function in a given function space.

  15. RBF Networks

  16. RBF Networks

  17. Network Parameters • What do these parameters represent? • φ: the radial basis function for the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d-dimensional input patterns to a (typically higher) H-dimensional space. The complex decision boundary is constructed from linear combinations (weighted sums) of these simple building blocks. • u_ji: the weights joining the input layer to the hidden layer. These weights constitute the center points of the radial basis functions; also called prototypes of the data. • σ: the spread constant(s). These values determine the spread (extent) of each radial basis function. • W_jk: the weights joining the hidden and output layers. These are the weights used in obtaining the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined to form the complex function. • ||x − u_j||: the Euclidean distance between the input x and the prototype vector u_j. The activation of a hidden unit is determined from this distance through φ.

  18. Training RBF Networks Approach 1: Exact RBF Approach 2: Fixed centers selected at random Approach 3: Centers are obtained from clustering Approach 4: Fully supervised training

  19. Training RBF Networks • Approach 1: Exact RBF • Guarantees correct classification of all training data instances. • Requires N hidden layer nodes, one for each training instance. • No iterative training is involved: the weights w are obtained by solving a set of linear equations • Non-smooth, poor generalization

  20. Exact Interpolation

  21. Exact Interpolation

  22. Too Many Receptive Fields? • In order to reduce the artificial complexity of the RBF, we need to use a smaller number of receptive fields. • Approach 2: Fixed centers selected at random. • Use M < N data points as the receptive field centers. • Fast, but may require an excessive number of centers • Approach 3: Centers are obtained from unsupervised learning (clustering). • Centers no longer have to coincide with data points • This is the most commonly used procedure, and it typically gives good results (see the sketch below).
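A sketch of Approach 3 under assumed details: scikit-learn's KMeans for the clustering step, Gaussian basis functions with a shared width, and a least-squares fit (with a bias column) for the output weights.

```python
import numpy as np
from sklearn.cluster import KMeans

# Approach 3: pick M << N centers by k-means, then fit output weights linearly.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
t = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(200)   # noisy toy targets

M, sigma = 10, 0.7                  # assumed number of centers and shared width
centers = KMeans(n_clusters=M, n_init=10, random_state=0).fit(X).cluster_centers_

# Design matrix: Gaussian activations plus a bias column
D = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
Phi = np.exp(-D ** 2 / (2 * sigma ** 2))
Phi = np.hstack([Phi, np.ones((len(X), 1))])

w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # output weights and bias
print(np.mean((Phi @ w - t) ** 2))            # training mean squared error
```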

  23. [Figure: example fits comparing Approach 2, Approach 3, and Approach 3.b]

  24. Determining the Output Weights through learning (LMS)
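A minimal sketch of LMS (delta-rule) training of the output weights alone, with centers and widths held fixed; lms_train, the learning rate eta, and the epoch count are illustrative choices, not from the slides:

```python
import numpy as np

def lms_train(Phi, t, eta=0.05, epochs=100):
    """Delta rule on the output weights: w <- w + eta * (t_n - w . phi_n) * phi_n."""
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi_n, t_n in zip(Phi, t):   # one pattern at a time
            error = t_n - w @ phi_n      # output error for this pattern
            w += eta * error * phi_n     # gradient step on the squared error
    return w

# Usage with the design matrix Phi and targets t from the earlier sketches:
# w = lms_train(Phi, t)
```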

  25. RBFs for Classification

  26. Homework • Problem 4.1, p. 117 • Problem 4.2, p. 117
