
Presentation Transcript


  1. Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay Advisor: Dr. John Sum Student: Allen Liang

  2. Outline • Introduction • Learning Algorithms • Experiments • Conclusion

  3. Introduction

  4. Background • A neural network (NN) is a system composed of interconnected neurons. • Learning aims to make a NN achieve good generalization (small prediction error).

  5. Fault tolerance is an unavoidable issue that must be considered in hardware implementation. • Multiplicative weight noise or additive weight noise. • Weights could break down at random. • Hidden nodes could fail (stuck-at-zero & stuck-at-one). • The goal is to keep the network workable, with graceful degradation, in the presence of noise/faults.
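The two weight noise models named on this slide are not written out in the transcript. A minimal sketch of the standard formulations (an assumption on my part; the symbols β and σ_b are illustrative, not recovered from the slide):

```latex
% Multiplicative weight noise: the perturbation scales with the weight itself
\tilde{w}_j = w_j\,(1 + \beta_j), \qquad \beta_j \sim \mathcal{N}(0, \sigma_b^2)

% Additive weight noise: the perturbation is independent of the weight magnitude
\tilde{w}_j = w_j + \beta_j, \qquad \beta_j \sim \mathcal{N}(0, \sigma_b^2)
```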

  6. Weight noise injection during training • Murray & Edwards (1993): Modify BPA by injecting weight noise during training for MLP • By simulation: convergence, fault tolerance • By theoretical analysis: effect of weight noise on the prediction error of a MLP • A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol.4(4), 722-725, 1993. • A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol.5(5), 792-802, 1994.

  7. Weight noise injection during training (Cont.) • Jim, Giles, Horne (1996): Modify RTRL by injecting weight noise during training for RNN • By simulation: convergence and generalization • By theoretical analysis: effect of weight noise on the prediction error of a RNN • Jim K.C., C.L. Giles and B.G. Horne, An analysis of noise in recurrent neural networks: Convergence and generalization, IEEE Transactions on Neural Networks, Vol.7, 1424-1438, 1996.

  8. Regularization • Bernier and co-workers (2000): add an explicit regularizer to the training MSE to form the objective function to be minimized. • An online learning algorithm is derived by gradient descent • No noise is injected during training • J. L. Bernier, J. Ortega, I. Rojas, and A. Prieto, “Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations,” Neurocomputing, vol.31, pp.87-103, Jan 2000 • J. L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, “Obtaining fault tolerance multilayer perceptrons using an explicit regularization,” Neural Process. Lett., vol. 12, no. 2, pp. 107-113, Oct 2000

  9. Regularization (Cont.) • Ho, Leung, & Sum (2009): add a regularizer term to the training MSE to form the objective function • Similar to the approach of Bernier et al., but the weighting factor for the regularizer can be determined from the noise variance • An online learning algorithm is derived by gradient descent • No noise is injected during training • J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks, Vol.20(1), Jan 2009.

  10. Misconception • Ho, Leung, & Sum (2009-): Convergence? • They show that the work by G. An (1996) is incomplete. • Essentially, his work is identical to that of Murray & Edwards (1993, 1994) and Bernier et al. (2000): only the effect of weight noise on the prediction error of an MLP has been derived. • By theoretical analysis, injecting weight noise while training an RBF network is of no use. • By simulation, the MSE converges but the weights might not converge. • Injecting weight noise together with weight decay during training can improve convergence • K. Ho, C.S. Leung and J. Sum, Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks, IEEE Transactions on Neural Networks, in press. • K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training, M. Koeppen, N. Kasabov and G. Coghill (eds.). Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919–926, 2009. • J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok Thailand, 2009.

  11. Objective • Investigate the fault tolerance and convergence of a NN trained by • combining weight noise injection with weight decay during BPA training. • Compare the results with a NN trained by • BPA training • weight noise injection during BPA training • weight decay during BPA training • Focus on the multilayer perceptron (MLP) network • Multiplicative and additive weight noise injections

  12. Learning Algorithms • BPA for linear output MLP (BPA1) • BPA1 with weight decay • BPA for sigmoid output MLP (BPA2) • BPA2 with weight decay • Weight noise injection training algorithms

  13. BPA 1 • Data set: • Hidden node output: • MLP output:
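The equations for this slide were images in the original deck and did not survive the transcript. A minimal sketch of a standard linear-output MLP with n hidden nodes (an assumption; the symbols x_t, y_t, w_j, b_j, θ_j are illustrative, not recovered from the slide):

```latex
% Training data set of N input-output pairs
\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{N}

% Output of the j-th hidden node (e.g. a tanh unit)
h_j(x) = \tanh\!\big(w_j^{\top} x + b_j\big)

% Linear-output MLP
f(x) = \sum_{j=1}^{n} \theta_j\, h_j(x)
```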

  14. BPA 1 (Cont.) • Objective function: • Update equation: • For j = 1, ... , n
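The objective and update equations were also figures. A sketch of the usual online back-propagation step for the linear-output MLP above, assuming a squared-error objective and learning rate μ (my notation, not the slide's):

```latex
% Training objective: mean squared error over the data set
E = \frac{1}{N}\sum_{t=1}^{N}\big(y_t - f(x_t)\big)^2

% Online update for the t-th sample, for j = 1, \dots, n
\theta_j(t+1) = \theta_j(t) + \mu\,\big(y_t - f(x_t)\big)\, h_j(x_t)
w_j(t+1)     = w_j(t)     + \mu\,\big(y_t - f(x_t)\big)\,\theta_j(t)\,\big(1 - h_j^2(x_t)\big)\, x_t
```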

  15. BPA 1 with weight decay • Objective function: • Update equation: • For j = 1, ... , n
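A sketch of how weight decay typically changes the step above, assuming a decay constant λ (again my notation, not recovered from the slide):

```latex
% Objective: training MSE plus a weight-decay regularizer
E_{\lambda} = \frac{1}{N}\sum_{t=1}^{N}\big(y_t - f(x_t)\big)^2
            + \lambda\Big(\|\theta\|^2 + \textstyle\sum_{j=1}^{n}\|w_j\|^2\Big)

% Online update: each gradient step gains a shrinkage term, for j = 1, \dots, n
\theta_j(t+1) = \theta_j(t) + \mu\Big[\big(y_t - f(x_t)\big)\, h_j(x_t) - \lambda\,\theta_j(t)\Big]
w_j(t+1)     = w_j(t)     + \mu\Big[\big(y_t - f(x_t)\big)\,\theta_j(t)\,\big(1 - h_j^2(x_t)\big)\, x_t - \lambda\, w_j(t)\Big]
```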

  16. BPA 2 • Data set: • Hidden node output: • MLP output:
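For the sigmoid-output MLP (BPA2) the only structural change from BPA1 is the output nonlinearity. A sketch under the same assumed notation:

```latex
% Same data set and hidden nodes as BPA1; the output now passes through a sigmoid
f(x) = s\!\left(\sum_{j=1}^{n} \theta_j\, h_j(x)\right),
\qquad \text{where}\quad s(z) = \frac{1}{1 + e^{-z}}
```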

  17. BPA 2 (Cont.) • Objective function: • Update equation: • For j = 1, ... , n
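The corresponding online update differs from BPA1 only by the sigmoid derivative s'(z) = s(z)(1 - s(z)). A sketch (same assumed notation):

```latex
% For j = 1, \dots, n; the extra factor f(1-f) is the sigmoid derivative
\theta_j(t+1) = \theta_j(t) + \mu\,\big(y_t - f(x_t)\big)\, f(x_t)\big(1 - f(x_t)\big)\, h_j(x_t)
w_j(t+1)     = w_j(t)     + \mu\,\big(y_t - f(x_t)\big)\, f(x_t)\big(1 - f(x_t)\big)\,\theta_j(t)\,\big(1 - h_j^2(x_t)\big)\, x_t
```

The weight-decay version on the next slide presumably adds the same -λθ_j(t) and -λw_j(t) shrinkage terms as in BPA1 with weight decay.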

  18. BPA 2 with weight decay • Objective function: • Update equation: • For j = 1, ... , n

  19. Weight noise injection training algorithms • Update equation: • For multiplicative weight noise injection • For additive weight noise injection
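The noise-injection update equations were also figures. A sketch of the usual scheme (an assumption consistent with the setup of the cited papers, in my notation): at each online step a noise vector is sampled, the gradient is evaluated at the perturbed weights, and the update is applied to the clean weights.

```latex
% Let w collect all weights and \tilde{w}(t) be the noisy copy used at step t:
%   multiplicative injection:  \tilde{w}(t) = w(t)\,(1 + \beta(t))   (componentwise)
%   additive injection:        \tilde{w}(t) = w(t) + \beta(t)
% with \beta(t) \sim \mathcal{N}(0, \sigma_b^2 I).

% Online update applied to the clean weights
w(t+1) = w(t) + \mu\,\big(y_t - f(x_t, \tilde{w}(t))\big)\,\nabla_{w} f\big(x_t, \tilde{w}(t)\big)

% Combined with weight decay (the SNIWD idea), an extra shrinkage term is added
w(t+1) = w(t) + \mu\Big[\big(y_t - f(x_t, \tilde{w}(t))\big)\,\nabla_{w} f\big(x_t, \tilde{w}(t)\big) - \lambda\, w(t)\Big]
```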

  20. Experiments • Data sets • Methodology • Results

  21. Data sets

  22. 2D mapping

  23. Mackey-Glass

  24. NAR

  25. Astrophysical Data

  26. XOR

  27. Character Recognition

  28. Methodology • Training • BPA • BPA with weight noise injection • BPA with weight decay • BPA with weight noise injection and weight decay • Fault tolerance • MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP • AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP • Convergence of the weight vectors
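A minimal Python sketch of the fault-tolerance part of this methodology, assuming a trained weight vector and a held-out test set; the function names (inject_noise, fault_tolerance_mse) and the Gaussian noise model are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def inject_noise(weights, sigma, mode="multiplicative", rng=None):
    """Return a noisy copy of a trained weight vector.

    mode="multiplicative": w_tilde = w * (1 + beta), beta ~ N(0, sigma^2)
    mode="additive":       w_tilde = w + beta,       beta ~ N(0, sigma^2)
    """
    rng = rng or np.random.default_rng()
    beta = rng.normal(0.0, sigma, size=weights.shape)
    if mode == "multiplicative":
        return weights * (1.0 + beta)
    return weights + beta

def fault_tolerance_mse(predict, weights, x_test, y_test,
                        sigma, mode="multiplicative", trials=100):
    """Average test MSE of an MLP whose weights are corrupted by weight noise.

    `predict(weights, x)` is any function mapping a weight vector and inputs
    to network outputs (the trained MLP under study).
    """
    rng = np.random.default_rng(0)
    mses = []
    for _ in range(trials):
        noisy_w = inject_noise(weights, sigma, mode, rng)
        err = y_test - predict(noisy_w, x_test)
        mses.append(np.mean(err ** 2))
    return float(np.mean(mses))
```

Sweeping sigma and recording the returned MSE gives prediction error versus noise level, which is presumably what the result slides below compare across the four training algorithms.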

  29. Methodology

  30. 2D mapping (MWN)

  31. 2D mapping (MWN)

  32. 2D mapping (AWN)

  33. 2D mapping (AWN)

  34. Mackey-Glass (MWN)

  35. Mackey-Glass (MWN)

  36. Mackey-Glass (AWN)

  37. Mackey-Glass (AWN)

  38. NAR (MWN)

  39. NAR (MWN)

  40. NAR (AWN)

  41. NAR (AWN)

  42. Astrophysical (MWN)

  43. Astrophysical (MWN)

  44. Astrophysical (AWN)

  45. Astrophysical (AWN)

  46. XOR (MWN)

  47. XOR (MWN)

  48. XOR (AWN)

  49. XOR (AWN)

  50. Character recognition (MWN)
