Back-propagation

Back-propagation Chih-yun Lin 6/6/2014

Agenda • Perceptron vs. back-propagation network • Network structure • Learning rule • Why a hidden layer? • An example: Jets or Sharks • Conclusions

Network Structure – Perceptron • $O$: output unit • $W_j$: weight on input unit $j$ • $I_j$: input units

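Network Structure – Back-propagation Network • $O_i$: output units • $W_{j,i}$: weights from hidden unit $j$ to output unit $i$ • $a_j$: hidden units • $W_{k,j}$: weights from input unit $k$ to hidden unit $j$ • $I_k$: input units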
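A minimal NumPy sketch of the forward pass through this structure, assuming a sigmoid activation $g$ and no bias terms (neither is specified on the slide); the function and variable names are illustrative only.

```python
import numpy as np

def sigmoid(x):
    # Assumed activation g(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

def forward(I, W_kj, W_ji):
    """Forward pass through one hidden layer (hypothetical helper).

    I    : input activations I_k, shape (n_in,)
    W_kj : input-to-hidden weights, shape (n_in, n_hidden)
    W_ji : hidden-to-output weights, shape (n_hidden, n_out)
    """
    in_j = I @ W_kj        # in_j = sum_k W_kj * I_k
    a = sigmoid(in_j)      # a_j = g(in_j)
    in_i = a @ W_ji        # in_i = sum_j W_ji * a_j
    O = sigmoid(in_i)      # O_i = g(in_i)
    return a, O

I = np.array([1.0, 0.0])
W_kj = np.zeros((2, 3))    # 2 inputs, 3 hidden units
W_ji = np.zeros((3, 1))    # 3 hidden units, 1 output
a, O = forward(I, W_kj, W_ji)
print(a, O)  # with all-zero weights every activation is g(0) = 0.5
```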

Learning Rule • Measure error • Reduce that error • By appropriately adjusting each of the weights in the network

Learning Rule – Perceptron • $Err = T - O$ • $O$ is the predicted output • $T$ is the correct output • $W_j \leftarrow W_j + \alpha \cdot I_j \cdot Err$ • $I_j$ is the activation of unit $j$ in the input layer • $\alpha$ is a constant called the learning rate
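A minimal Python sketch of the perceptron rule, trained on the AND function. Assumptions not on the slide: a step activation, a bias weight `W[0]` fed by a constant input of 1, zero initial weights, and the hypothetical helper names `step`, `predict` and `train_perceptron`.

```python
def step(x):
    # Threshold activation: 1 when the weighted input sum is positive, else 0
    return 1 if x > 0 else 0

def predict(W, inputs):
    # I_0 = 1 is a constant bias input (an assumption, not on the slide)
    I = [1.0] + list(inputs)
    return step(sum(w * x for w, x in zip(W, I))), I

def train_perceptron(samples, n_inputs, alpha=0.1, epochs=20):
    W = [0.0] * (n_inputs + 1)               # W[0] is the bias weight
    for _ in range(epochs):
        for inputs, T in samples:
            O, I = predict(W, inputs)
            err = T - O                      # Err = T - O
            for j in range(len(W)):
                W[j] += alpha * I[j] * err   # W_j <- W_j + alpha * I_j * Err
    return W

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
W = train_perceptron(AND, n_inputs=2)
outputs = [predict(W, x)[0] for x, _ in AND]
print(outputs)  # [0, 0, 0, 1]
```

Weights change only when $Err \neq 0$, so once every example is classified correctly the weights stop moving.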

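Learning Rule – Back-propagation Network • $Err_i = T_i - O_i$ • $W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot \Delta_i$ • $\Delta_i = Err_i \cdot g'(in_i)$ • $g'$ is the derivative of the activation function $g$; $in_i$ and $in_j$ are the weighted input sums to output unit $i$ and hidden unit $j$ • $a_j$ is the activation of hidden unit $j$ • $W_{k,j} \leftarrow W_{k,j} + \alpha \cdot I_k \cdot \Delta_j$ • $\Delta_j = g'(in_j) \cdot \sum_i W_{j,i} \cdot \Delta_i$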
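A minimal NumPy sketch of these update rules for one training example, assuming a sigmoid $g$ (so $g'(x) = g(x)(1 - g(x))$), no bias terms, random initial weights and $\alpha = 0.5$; none of these choices come from the slides, and `backprop_step` is a hypothetical name.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))        # assumed sigmoid activation

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)                   # derivative of the sigmoid

def backprop_step(I, T, W_kj, W_ji, alpha=0.5):
    """One update of both weight layers; returns new weights and E before the update."""
    # Forward pass
    in_j = I @ W_kj
    a = g(in_j)
    in_i = a @ W_ji
    O = g(in_i)
    # Output layer: Delta_i = Err_i * g'(in_i)
    err = T - O
    delta_i = err * g_prime(in_i)
    # Hidden layer: Delta_j = g'(in_j) * sum_i W_ji * Delta_i (pre-update W_ji)
    delta_j = g_prime(in_j) * (W_ji @ delta_i)
    # W_ji <- W_ji + alpha * a_j * Delta_i ; W_kj <- W_kj + alpha * I_k * Delta_j
    W_ji = W_ji + alpha * np.outer(a, delta_i)
    W_kj = W_kj + alpha * np.outer(I, delta_j)
    return W_kj, W_ji, 0.5 * np.sum(err ** 2)

rng = np.random.default_rng(0)
W_kj = rng.normal(scale=0.5, size=(2, 3))  # 2 inputs, 3 hidden units
W_ji = rng.normal(scale=0.5, size=(3, 1))  # 3 hidden units, 1 output
I, T = np.array([1.0, 0.0]), np.array([1.0])

errors = []
for _ in range(50):
    W_kj, W_ji, E = backprop_step(I, T, W_kj, W_ji)
    errors.append(E)
print(errors[0], errors[-1])  # E shrinks as the weights are adjusted
```

Note that $\Delta_j$ is computed from $W_{j,i}$ before the output weights are changed, so both layers are updated from the same forward pass.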

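Learning Rule – Back-propagation Network • $E = \frac{1}{2}\sum_i (T_i - O_i)^2$ • $\frac{\partial E}{\partial W_{k,j}} = -I_k \cdot \Delta_j$ • So the update $W_{k,j} \leftarrow W_{k,j} + \alpha \cdot I_k \cdot \Delta_j$ moves each weight against the gradient, i.e. it performs gradient descent on $E$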
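One way to confirm $\frac{\partial E}{\partial W_{k,j}} = -I_k \cdot \Delta_j$ is a finite-difference gradient check. The sketch below assumes a sigmoid $g$, no biases, and arbitrary random weights and inputs; the helper names are hypothetical.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))                # assumed sigmoid activation

def error(I, T, W_kj, W_ji):
    """E = 1/2 * sum_i (T_i - O_i)^2 for a single example."""
    O = g(g(I @ W_kj) @ W_ji)
    return 0.5 * np.sum((T - O) ** 2)

def analytic_grad_kj(I, T, W_kj, W_ji):
    """dE/dW_kj = -I_k * Delta_j, using the back-propagated Deltas."""
    a = g(I @ W_kj)
    O = g(a @ W_ji)
    delta_i = (T - O) * O * (1 - O)                # g'(in_i) = O_i (1 - O_i)
    delta_j = a * (1 - a) * (W_ji @ delta_i)       # g'(in_j) = a_j (1 - a_j)
    return -np.outer(I, delta_j)

def numeric_grad_kj(I, T, W_kj, W_ji, h=1e-6):
    """Central finite differences on each input-to-hidden weight."""
    grad = np.zeros_like(W_kj)
    for k in range(W_kj.shape[0]):
        for j in range(W_kj.shape[1]):
            Wp, Wm = W_kj.copy(), W_kj.copy()
            Wp[k, j] += h
            Wm[k, j] -= h
            grad[k, j] = (error(I, T, Wp, W_ji) - error(I, T, Wm, W_ji)) / (2 * h)
    return grad

rng = np.random.default_rng(1)
W_kj, W_ji = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
I, T = np.array([0.5, -1.0, 2.0]), np.array([1.0, 0.0])
ga = analytic_grad_kj(I, T, W_kj, W_ji)
gn = numeric_grad_kj(I, T, W_kj, W_ji)
print(np.max(np.abs(ga - gn)))  # tiny: the two gradients agree
```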

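Why a hidden layer? • A single threshold unit with weights $w_1, w_2$ and threshold $\theta$ would need all of the following to compute XOR: • $(1 \cdot w_1) + (1 \cdot w_2) < \theta \iff w_1 + w_2 < \theta$ • $(1 \cdot w_1) + (0 \cdot w_2) > \theta \iff w_1 > \theta$ • $(0 \cdot w_1) + (1 \cdot w_2) > \theta \iff w_2 > \theta$ • $(0 \cdot w_1) + (0 \cdot w_2) < \theta \iff 0 < \theta$ • The last three give $w_1 + w_2 > 2\theta > \theta$, contradicting the first, so no such unit exists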
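The contradiction can be seen in practice by running the perceptron rule on XOR and counting mistakes per epoch. Assumptions: a bias weight with constant input 1 (playing the role of $-\theta$), zero initial weights, $\alpha = 0.1$, 100 epochs, and hypothetical helper names. An epoch with zero mistakes would leave the weights unchanged while classifying all four cases correctly, which the constraints rule out.

```python
def step(x):
    # Threshold activation: 1 when the weighted sum is positive
    return 1 if x > 0 else 0

def perceptron_epoch_errors(samples, alpha=0.1, epochs=100):
    """Apply the perceptron rule and record the number of mistakes per epoch."""
    W = [0.0, 0.0, 0.0]                 # bias weight, w1, w2
    history = []
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), T in samples:
            I = (1.0, x1, x2)           # constant bias input first
            O = step(sum(w * i for w, i in zip(W, I)))
            err = T - O
            if err != 0:
                mistakes += 1
                W = [w + alpha * i * err for w, i in zip(W, I)]
        history.append(mistakes)
    return history

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
history = perceptron_epoch_errors(XOR)
print(min(history))  # never 0: no single-layer weight setting fits XOR
```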

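Why a hidden layer? (cont.) • Add a third input that is 1 only when both original inputs are 1 (their AND, which a hidden unit can compute); with threshold $\theta$: • $(1 \cdot w_1) + (1 \cdot w_2) + (1 \cdot w_3) < \theta \iff w_1 + w_2 + w_3 < \theta$ • $(1 \cdot w_1) + (0 \cdot w_2) + (0 \cdot w_3) > \theta \iff w_1 > \theta$ • $(0 \cdot w_1) + (1 \cdot w_2) + (0 \cdot w_3) > \theta \iff w_2 > \theta$ • $(0 \cdot w_1) + (0 \cdot w_2) + (0 \cdot w_3) < \theta \iff 0 < \theta$ • These constraints are satisfiable, e.g. $w_1 = w_2 = 1$, $w_3 = -2$, $\theta = 0.5$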
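A small hand-wired network makes the point concrete: one hidden threshold unit computes the third input (AND of the two inputs), and the output unit uses $w_1 = w_2 = 1$, $w_3 = -2$, $\theta = 0.5$. The hidden unit's weights (1, 1) and threshold 1.5 are an illustrative choice, not taken from the slides, and the function names are hypothetical.

```python
def threshold_unit(inputs, weights, theta):
    # Outputs 1 when the weighted sum exceeds the threshold theta, else 0
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > theta else 0

def xor_net(x1, x2):
    # Hidden unit: AND of the inputs (assumed weights 1, 1 and threshold 1.5),
    # i.e. the third input, which is 1 only for the (1, 1) pattern
    x3 = threshold_unit((x1, x2), (1, 1), 1.5)
    # Output unit: w1 = w2 = 1, w3 = -2, theta = 0.5 satisfy all four constraints
    return threshold_unit((x1, x2, x3), (1, 1, -2), 0.5)

results = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(results)  # [0, 1, 1, 0]
```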

An example: Jets or Sharks

Conclusions • Expressiveness: • Well-suited for continuous inputs, unlike most decision-tree systems • Computational efficiency: • Time to error convergence is highly variable • Generalization: • Networks have had reasonable success on a number of real-world problems

Conclusions (cont.) • Sensitivity to noise: • Very tolerant of noise in the input data • Transparency: • Neural networks are essentially black boxes • Prior knowledge: • Hard to use one’s knowledge to “prime” a network to learn better
