Quadratic Perceptron Learning with Applications

Presentation Transcript


  1. Quadratic Perceptron Learning with Applications Tonghua Su National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences Beijing, PR China Dec 2, 2010

  2. Outline • Introduction • Motivations • Quadratic Perceptron Algorithm • Previous works • Theory perspective • Practical perspective • Open issues • Conclusions

  3. 1 Introduction Notation, binary classification, multi-class classification, large scale learning vs large category learning

  4. Introduction • Domain Set • Label Set • Training Data • Binary Classification • e.g. linear model
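
A minimal Python sketch of the linear model for binary classification mentioned above (the names w, b, x are illustrative placeholders, not the slide's notation):

```python
import numpy as np

def predict_binary(w, b, x):
    """Linear binary classifier: returns +1 or -1 for a feature vector x."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Toy 2-D instance with illustrative parameter values
w = np.array([0.5, -1.0])   # weight vector
b = 0.2                     # bias term
x = np.array([1.0, 0.3])
print(predict_binary(w, b, x))
```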

  5. Introduction • Multi-class Classification • Learning strategy • One vs one • One vs all • Single machine • e.g. Linear model • Large-category classification • Chinese character recognition (3,755 classes) • More confusion among classes
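
A small sketch of the one-vs-all strategy listed above, assuming one linear score per class and prediction by argmax (all names are illustrative):

```python
import numpy as np

def predict_one_vs_all(W, b, x):
    """One-vs-all multi-class prediction: one linear score per class,
    return the index of the highest-scoring class."""
    scores = W @ x + b          # shape: (num_classes,)
    return int(np.argmax(scores))

# Toy example with 4 classes and 3-dimensional features
num_classes, dim = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(num_classes, dim))   # one weight row per class
b = np.zeros(num_classes)
x = rng.normal(size=dim)
print(predict_one_vs_all(W, b, x))
```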

  6. Introduction • Large-Scale Learning • large numbers of data points • high dimensions • challenges in computational resources • Large Category vs Large Scale • Almost certainly: large category → large scale • Trade-offs: efficiency vs accuracy

  7. 2 Motivations MQDF HMMs

  8. Modified Quadratic Discriminant Function (MQDF) • QDF • MQDF [Kimura et al ‘1987] • Using SVD, • Truncate small eigenvalues
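
A sketch of the MQDF discriminant as it is usually written following [Kimura ‘87]: keep the k leading eigenpairs of each class covariance and replace the remaining eigenvalues by a constant δ. The choice of δ below (mean of the discarded eigenvalues) is a common convention, not necessarily the one used in the talk:

```python
import numpy as np

def mqdf_score(x, mean, eigvals, eigvecs, k, delta):
    """Modified quadratic discriminant function (smaller is better).
    Only the k leading eigenpairs of the class covariance are kept;
    the remaining eigenvalues are replaced by the constant delta."""
    d = x.shape[0]
    diff = x - mean
    proj = eigvecs[:, :k].T @ diff          # projections onto leading eigenvectors
    maha_major = np.sum(proj ** 2 / eigvals[:k])
    residual = np.dot(diff, diff) - np.sum(proj ** 2)
    return (maha_major
            + residual / delta
            + np.sum(np.log(eigvals[:k]))
            + (d - k) * np.log(delta))

# Toy usage: estimate per-class statistics, classify by the smallest score
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))               # samples of one class
mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(mqdf_score(rng.normal(size=8), mean, eigvals, eigvecs,
                 k=4, delta=eigvals[4:].mean()))
```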

  9. Modified Quadratic Discriminant Function (MQDF) • MQDF+MCE+Synthetic samples [Chen et al ‘2010] • Building block: discriminative learning of MQDF

  10. Hidden Markov Models (HMMs) [Figure: state-transition diagram over states S, F, L, E with transition probabilities 0.05 and 0.95] • Markovian transition + state-specific generator • Continuous-density HMMs: each state emits a GMM • e.g. usable in handwritten Chinese text recognition [Su ‘2007]

  11. Hidden Markov Models (HMMs) • Perceptron training of HMMs [Cheng et al ’2009] • Joint distribution • Discriminant function log p(s,x) • Perceptron training • Nonnegative-definite constraint • Lack of theoretical foundation
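
For reference, the discriminant log p(s, x) of a continuous-density HMM factorizes in the standard way (the symbols π, a, b, c below are the usual HMM notation, not necessarily the slide's):

```latex
\log p(s, x) \;=\; \log \pi_{s_1}
\;+\; \sum_{t=2}^{T} \log a_{s_{t-1} s_t}
\;+\; \sum_{t=1}^{T} \log b_{s_t}(x_t),
\qquad
b_{s}(x) \;=\; \sum_{m=1}^{M} c_{sm}\,\mathcal{N}\!\left(x \mid \mu_{sm}, \Sigma_{sm}\right).
```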

  12. 3 Quadratic Perceptron Algorithm Related works Theoretical considerations Practical considerations Open issues

  13. Previous Works • Rosenblatt’s Perceptron [Rosenblatt ’58] • Updating rule: on a misclassified example (y·wᵀx ≤ 0), set w ← w + y x
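
A minimal sketch of this classical training loop (learning rate and stopping rule are illustrative):

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=1.0):
    """Rosenblatt's perceptron: on each misclassified example (y * w.x <= 0),
    move the weight vector towards y * x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:      # misclassified (or on the boundary)
                w += lr * yi * xi            # updating rule: w <- w + eta * y * x
                mistakes += 1
        if mistakes == 0:                    # converged on separable data
            break
    return w
```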

  14. Previous Works • Rosenblatt’s Perceptron [Figure: weight vectors w0–w4 updated by misclassified examples x_i y_i; hyperplanes wᵀx_i y_i = 0 with their +/− half-spaces; the solution region is the intersection of the positive half-spaces]

  15. Previous Works • Rosenblatt’s Perceptron [Rosenblatt ’58] • View from a batch loss (written out below) • Minimized using stochastic gradient descent (SGD)
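
The formulas on this slide did not survive extraction; the standard perceptron criterion and its SGD step, which is presumably what is meant, read:

```latex
L(\mathbf{w}) \;=\; \sum_{i=1}^{n} \max\!\bigl(0,\, -y_i\,\mathbf{w}^{\top}\mathbf{x}_i\bigr),
\qquad
\mathbf{w} \leftarrow \mathbf{w} + \eta\, y_i\,\mathbf{x}_i
\quad\text{whenever } y_i\,\mathbf{w}^{\top}\mathbf{x}_i \le 0 .
```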

  16. Previous Works • Convergence Theorem [Block ’62, Novikoff ’62] • Linearly separable data with margin γ and radius R • Stops after at most (R/γ)² mistakes (stated below)
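
Stated in the usual form: if some unit vector w* separates the data with margin γ and every example satisfies ||x_i|| ≤ R, then the number of updates is bounded:

```latex
\exists\,\mathbf{w}^{*},\ \|\mathbf{w}^{*}\|=1:\quad
y_i\,\mathbf{w}^{*\top}\mathbf{x}_i \ge \gamma,\quad \|\mathbf{x}_i\| \le R \ \ \forall i
\;\;\Longrightarrow\;\;
\#\{\text{mistakes}\} \;\le\; \left(\frac{R}{\gamma}\right)^{2}.
```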

  17. Previous Works • Voted Perceptron [Freund ’99] • Training algorithm and prediction rule (sketched below)
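
A sketch of the voted perceptron as described in [Freund ‘99]: every intermediate weight vector is kept together with the number of rounds it survived, and prediction is a weighted vote:

```python
import numpy as np

def voted_perceptron_train(X, y, epochs=10):
    """Voted perceptron [Freund '99]: keep each intermediate weight vector w_k
    together with the number of rounds c_k it survived unchanged."""
    w, c = np.zeros(X.shape[1]), 0
    weights = []                              # list of (w_k, c_k)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:       # mistake: store old vector, then update
                weights.append((w.copy(), c))
                w, c = w + yi * xi, 1
            else:
                c += 1
    weights.append((w.copy(), c))
    return weights

def voted_perceptron_predict(weights, x):
    """Prediction: sign of the vote, each w_k weighted by its survival count c_k."""
    vote = sum(c * np.sign(np.dot(w, x)) for w, c in weights)
    return 1 if vote >= 0 else -1
```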

  18. Previous Works • Voted Perceptron • Generalization bound

  19. Previous Works • Perceptron with Margin [Krauth ’87, Li ’2002]
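
A sketch of the margin variant: the only change from the basic rule is that an update is also made when an example is classified correctly but with functional margin below a threshold (the threshold value here is illustrative):

```python
import numpy as np

def margin_perceptron_train(X, y, margin=1.0, epochs=10):
    """Perceptron with margin [Krauth '87]: update not only on mistakes but also
    whenever the functional margin y * w.x falls below a fixed threshold."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= margin:   # margin violation (includes mistakes)
                w += yi * xi
    return w
```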

  20. Previous Works • Ballseptron [Shalev-Shwartz ’2005]

  21. Previous Works • Perceptron with Unlearning [Panagiotakopoulos ’2010] [Figure: separate “learning” and “unlearning” update steps]

  22. Theoretical Perspective • Prediction rule • Learning

  23. Theoretical Perspective • Algorithm (online version), illustrated below
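
The slide's formulas are lost; purely as an illustration, one plausible reading is a quadratic discriminant f(x) = xᵀA x + bᵀx + c trained with perceptron-style additive updates on mistakes. The names A, b, c are mine, and the nonnegative-definite projection of slide 28 is omitted here:

```python
import numpy as np

def quadratic_perceptron_train(X, y, epochs=10):
    """Illustrative online quadratic perceptron: discriminant
    f(x) = x^T A x + b^T x + c, with additive updates on each mistake.
    (Parameter names A, b, c are illustrative; the PSD projection is omitted.)"""
    d = X.shape[1]
    A, b, c = np.zeros((d, d)), np.zeros(d), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            f = xi @ A @ xi + b @ xi + c
            if yi * f <= 0:                     # mistake: push f(x_i) towards sign y_i
                A += yi * np.outer(xi, xi)
                b += yi * xi
                c += yi
    return A, b, c
```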

  24. Theoretical Perspective • Convergence Theorem of the Quadratic Perceptron (quadratically separable data)

  25. Theoretical Perspective • Convergence Theorem of the Quadratic Perceptron with Margin (quadratically separable data)

  26. Theoretical Perspective • Bounds for the quadratically inseparable case (see below)
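
The slide's bound is lost. For the linear perceptron the standard inseparable-case result [Freund ‘99] is stated below in terms of the total margin deviation D; presumably the quadratic case takes the same form over the quadratic feature map:

```latex
d_i \;=\; \max\!\bigl(0,\ \gamma - y_i\,\mathbf{w}^{*\top}\mathbf{x}_i\bigr),
\qquad
D \;=\; \sqrt{\sum_i d_i^{2}},
\qquad
\#\{\text{mistakes}\} \;\le\; \left(\frac{R + D}{\gamma}\right)^{2}.
```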

  27. Theoretical Perspective • Generalization Bound

  28. Theoretical Perspective • Nonnegative-definite constraints • Projection to the valid space • Restriction on updating • Convergence holds
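
One standard way to realize the “projection to the valid space” is a Frobenius-norm projection of the (symmetrized) quadratic parameter matrix onto the nonnegative-definite cone, i.e. clipping negative eigenvalues; a sketch, not necessarily the talk's exact procedure:

```python
import numpy as np

def project_psd(A):
    """Project a symmetric matrix onto the cone of nonnegative-definite matrices
    by zeroing out its negative eigenvalues (Frobenius-norm projection)."""
    A = 0.5 * (A + A.T)                       # symmetrize first
    eigvals, eigvecs = np.linalg.eigh(A)
    eigvals = np.clip(eigvals, 0.0, None)     # clamp negative eigenvalues to zero
    return (eigvecs * eigvals) @ eigvecs.T

# Example: a symmetric but indefinite matrix becomes PSD after projection
A = np.array([[1.0, 2.0], [2.0, -3.0]])
print(np.linalg.eigvalsh(project_psd(A)))     # all eigenvalues >= 0
```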

  29. Theoretical Perspective • Toy problem: Lithuanian Dataset • 4000 training instances • 2000 test instances

  30. Theoretical Perspective • Perceptron learning (toy problem)

  31. Theoretical Perspective • Extension to Multi-class QDF

  32. Theoretical Perspective • Extension to Multi-class QDF • Theoretical properties hold as in the binary QDF case • Proof can be completed using Kesler’s construction (recalled below)
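
For reference, Kesler's construction in its usual form (φ stands for the feature map, which in this talk would be the quadratic one; the notation below is mine):

```latex
\mathbf{W} = (\mathbf{w}_1,\dots,\mathbf{w}_K) \in \mathbb{R}^{Kd},
\qquad
\mathbf{z}_{k}(\mathbf{x},y) =
\bigl(0,\dots,\underbrace{\boldsymbol{\phi}(\mathbf{x})}_{\text{block } y},\dots,
\underbrace{-\boldsymbol{\phi}(\mathbf{x})}_{\text{block } k},\dots,0\bigr),
\qquad
g_y(\mathbf{x}) > g_k(\mathbf{x})\ \forall k \ne y
\;\Longleftrightarrow\;
\mathbf{W}^{\top}\mathbf{z}_{k}(\mathbf{x},y) > 0\ \forall k \ne y .
```

Each multi-class example thus yields K−1 binary constraints in the stacked Kd-dimensional space, so the binary mistake-bound analysis carries over.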

  33. Practical Perspective • Perceptron batch loss (as above) • SGD

  34. Practical Perspective • Constant margin • Dynamic margin

  35. Practical Perspective • Experiments • Benchmark on digit databases

  36. Practical Perspective • Experiments • Benchmark on digit databases grg on MNIST

  37. Practical Perspective • Experiments • Benchmark on digit databases grg on USPS

  38. Practical Perspective • Experiments • Effects of training size (grg on MNIST)

  39. Practical Perspective • Experiments • Benchmark on CASIA-HWDB1.1

  40. Practical Perspective • Experiments • Benchmark on CASIA-HWDB1.1

  41. Open Issues • Convergence on GMM/MQDF? • Error reduction on CASIA-HWDB1.1 is small • How about adding more data? • Can label permutation help? • Speed up the training process • Evaluate on more datasets

  42. 4 Conclusions

  43. Conclusions • Theoretical foundation for QDF • Convergence Theorem • Generalization Bound • Perceptron learning of MQDF • A margin is needed for good generalization • More data may help

  44. Thank you!

  45. References • [Chen et al ‘2010] Xia Chen, Tong-Hua Su, Tian-Wen Zhang. Discriminative Training of MQDF Classifier on Synthetic Chinese String Samples, CCPR, 2010. • [Cheng et al ‘2009] C. Cheng, F. Sha, L. Saul. Matrix updates for perceptron training of continuous density hidden Markov models, ICML, 2009. • [Kimura ‘87] F. Kimura, K. Takashina, S. Tsuruoka, Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition, IEEE TPAMI, 9(1): 149-153, 1987. • [Panagiotakopoulos ‘2010] C. Panagiotakopoulos, P. Tsampouka. The Margin Perceptron with Unlearning, ICML, 2010. • [Krauth ‘87] W. Krauth and M. Mezard. Learning algorithms with optimal stability in neural networks, Journal of Physics A, 20: 745-752, 1987. • [Li ‘2002] Yaoyong Li, Hugo Zaragoza, Ralf Herbrich, John Shawe-Taylor, Jaz Kandola. The Perceptron Algorithm with Uneven Margins, ICML, 2002.

  46. References • [Freund ‘99] Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm, Machine Learning, 37(3): 277-296, 1999. • [Shalev-Shwartz ’2005] Shai Shalev-Shwartz, Yoram Singer. A New Perspective on an Old Perceptron Algorithm, COLT, 2005. • [Novikoff ‘62] A. B. J. Novikoff. On convergence proofs on perceptrons, Proc. Symp. Math. Theory Automata, Vol. 12, pp. 615-622, 1962. • [Rosenblatt ‘58] F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65(6): 386-408, 1958. • [Block ‘62] H. D. Block. The perceptron: A model for brain functioning, Reviews of Modern Physics, 34: 123-135, 1962.
