
RPROP Resilient Propagation


Presentation Transcript


  1. RPROP Resilient Propagation Students: Novica Zarvic, Roxana Grigoras Term: Winter semester 2003/2004 Lecture: Machine Learning / Neural Networks Course: Information Engineering Date: 2004-01-06

  2. Content Part I: General remarks; Foundations (MLP, Supervised Learning, Backpropagation and its problems). Part II: Description of the RProp Algorithm; Example cases. Part III: Visualization with SNNS; Discussion.

  3. I. General remarks Basis for this talk: Rprop – Description and Implementation Details (Technical report by Martin Riedmiller, January 1994) URL: http://lrb.cs.uni-dortmund.de/~riedmill/publications/rprop.details.ps.Z

  4. I. MLP Multi-Layer Perceptron [Figure: feed-forward network with an input layer, two hidden layers and an output layer; the connections carry weights w11, w12, ..., w32] Topology of a typical feed-forward network with two hidden layers. The external input is presented to the input layer, propagated forward through the hidden layers and yields an output activation vector in the output layer.
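Since the topology is only shown as a picture on the slide, here is a minimal forward-pass sketch of such a feed-forward network with two hidden layers. The layer sizes, the tanh activation, the random initialisation and all names are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch of a forward pass through an MLP with two hidden layers,
# as in the topology described above. Layer sizes, tanh activation and the
# random initialisation are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]                               # input, hidden 1, hidden 2, output
weights = [rng.standard_normal((n_out, n_in)) * 0.1
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Propagate an input activation vector forward through all layers."""
    a = np.asarray(x, dtype=float)
    for W in weights:
        a = np.tanh(W @ a)                         # activation of the next layer
    return a                                       # output activation vector

print(forward([0.5, -1.0, 0.2]))
```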

  5. I. Supervised Learning Objective: Tune the weights in the network such that the network performs a desired mapping of input to output activations.

  6. I. Principle of supervised learning (like BP or one of its derivatives) Presentation of the input pattern through activation of the input units; the pattern set consists of an input activation vector xp and a target vector tp. Feedforward computation to get the resulting output vector sp. Compare sp with tp: the distance between the vectors is measured by the error function E := ½ ∙ ∑p ∑n (tpn – spn)²  (n = number of units in the output layer, p = a pattern pair of the pattern set P). Backpropagation of the errors from the output to the input determines how the weights of the connections have to change so that E is minimized. Finally, the weights of all neurons are changed by the previously calculated values.
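As a small illustration of the error measure E defined above, the sketch below sums it over a toy pattern set. It reuses the forward() sketch from the MLP slide, and the pattern data are made up for the example.

```python
# Hedged sketch of the error measure E := 1/2 * sum_p sum_n (t_pn - s_pn)^2
# over a pattern set P. Relies on the forward() sketch from the MLP example;
# the pattern data below are invented for illustration.
import numpy as np

def sum_squared_error(patterns):
    """Half the summed squared distance between target and output vectors."""
    E = 0.0
    for x_p, t_p in patterns:
        s_p = forward(x_p)                         # feedforward computation
        E += 0.5 * np.sum((np.asarray(t_p) - s_p) ** 2)
    return E

patterns = [([0.5, -1.0, 0.2], [1.0, 0.0]),        # (input vector x_p, target vector t_p)
            ([0.1,  0.3, 0.7], [0.0, 1.0])]
print(sum_squared_error(patterns))
```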

  7. I. Problems of Backpropagation → No information about the complete error function. It is difficult to choose a 'good' learning rate: a. Local minima of E, b. Plateaus, c. Oscillation, d. Leaving good minima. → It uses only weight-specific information (the partial derivative) to adapt weight-specific parameters.

  8. II. RPROP Resilient Propagation What is the traditional Backpropagation algorithm doing? → It modifies the weights in proportion to the partial derivatives (∂E/∂wij). → Problem: The size of this derivative does not really reflect how large the necessary weight change is. → Solution: RProp does not rely on the magnitude of the partial derivative; it considers only the sign of the derivative to indicate the direction of the weight update.

  9. II. RPROP -Description- Effective learning scheme: it performs a direct adaptation of the weight step based on local gradient information. The basic principle of RProp is to eliminate the harmful influence of the size of the partial derivative on the weight step; it considers only the sign of the derivative to indicate the direction of the weight update.

  10. II. RPROP Resilient Propagation

  11. II. RPROP What is ∆ij? ∆ij is an 'update value'. The size of the weight change is exclusively determined by this weight-specific 'update value'. ∆ij evolves during the learning process based on its local view of the error function E, according to the following learning rule:
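The learning rule itself appears only as a formula image on the slide; in the notation of the Riedmiller report cited on slide 3 it reads:

  ∆ij(t) = η+ ∙ ∆ij(t-1)   if ∂E/∂wij(t-1) ∙ ∂E/∂wij(t) > 0
  ∆ij(t) = η− ∙ ∆ij(t-1)   if ∂E/∂wij(t-1) ∙ ∂E/∂wij(t) < 0
  ∆ij(t) = ∆ij(t-1)        otherwise,   with 0 < η− < 1 < η+

So the update value grows whenever the derivative keeps its sign (larger steps in the same direction are safe) and shrinks whenever the sign flips (the last step overshot a minimum).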

  12. II. RPROP The 'weight update' ∆wij follows a simple rule: If the derivative is positive (increasing error), the weight is decreased by its 'update value'. If the derivative is negative, the update value is added.
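Written out, again in the notation of the cited report, the weight update is:

  ∆wij(t) = −∆ij(t)   if ∂E/∂wij(t) > 0
  ∆wij(t) = +∆ij(t)   if ∂E/∂wij(t) < 0
  ∆wij(t) = 0         otherwise

  wij(t+1) = wij(t) + ∆wij(t)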

  13. II. RPROP One exception ("Take bad steps back!") If the partial derivative changes sign, i.e. the previous step was too large and the minimum was missed, the previous 'weight update' is reverted.
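In the notation of the cited report, this backtracking step is:

  ∆wij(t) = −∆wij(t-1)   if ∂E/∂wij(t-1) ∙ ∂E/∂wij(t) < 0

and the stored derivative is then set to zero (∂E/∂wij(t) := 0), so that the update value is not decreased a second time in the following step.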

  14. II. RPROP -The pseudo code-
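The pseudo code itself is only shown as an image on the slide. As a substitute, here is a minimal Python sketch of the update described on slides 11-13, using the default settings from slide 15; all function and variable names are my own, and the gradient computation (backpropagation) is assumed to happen elsewhere.

```python
# Minimal sketch of the RPROP update for one weight array, following the
# rules on slides 11-13 and the default settings of slide 15. Names are
# illustrative; computing the gradient itself is assumed to be done elsewhere.
import numpy as np

ETA_PLUS, ETA_MINUS = 1.2, 0.5      # increase / decrease factors
DELTA_MAX, DELTA_MIN = 50.0, 1e-6   # upper / lower limits on the update values
DELTA_ZERO = 0.1                    # initial update value

def rprop_step(w, grad, prev_grad, delta, prev_dw):
    """One RPROP step for a weight array `w`, given the current gradient."""
    sign_change = grad * prev_grad

    # Adapt the weight-specific update values Delta_ij (slide 11).
    delta = np.where(sign_change > 0, np.minimum(delta * ETA_PLUS, DELTA_MAX), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * ETA_MINUS, DELTA_MIN), delta)

    # Regular update: step against the sign of the derivative (slide 12).
    dw = -np.sign(grad) * delta

    # Exception ("take bad steps back", slide 13): on a sign change, revert
    # the previous step and zero the stored gradient for the next step.
    dw = np.where(sign_change < 0, -prev_dw, dw)
    grad = np.where(sign_change < 0, 0.0, grad)

    return w + dw, grad, delta, dw

# Toy usage on a single 2x2 weight matrix with a made-up gradient:
w = np.zeros((2, 2))
delta = np.full_like(w, DELTA_ZERO)
prev_grad = np.zeros_like(w)
prev_dw = np.zeros_like(w)
for _ in range(3):
    grad = 2 * (w - 1.0)            # gradient of a toy error (w - 1)^2
    w, prev_grad, delta, prev_dw = rprop_step(w, grad, prev_grad, delta, prev_dw)
print(w)
```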

  15. II. RPROP -Settings- Increasing and decreasing factors: η− = 0.5 (decrease factor), η+ = 1.2 (increase factor). Limits: ∆max = 50.0 (upper limit), ∆min = 1e-6 (lower limit). Initial value: ∆0 = 0.1 (default setting).

  16. III. RPROP Backprop vs. RProp

  17. III. RPROP -Discussion- In contrast to most other algorithms, only the sign of the derivative is used to perform learning and adaptation. With plain Backpropagation, the size of the derivative decreases exponentially with the distance between the weight and the output layer; using RProp, the size of the weight step depends only on the sequence of signs, so learning is spread equally over the entire network.

  18. III. RPROP -Further material- Advanced Supervised Learning in Multi-layer Perceptrons – From Backpropagation to Adaptive Learning Algorithms (Martin Riedmiller). A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm (Martin Riedmiller). Rprop – Description and Implementation Details (Martin Riedmiller).

  19. III. RPROP Resilient Propagation Thank you for listening!
