
Statistical Language Modelling Part I – Observable Models


Presentation Transcript


  1. Statistical Language Modelling Part I – Observable Models Simon Lucas

  2. Summary
  • Applications
  • The fundamentals
  • Observable v. hidden (latent) models
  • N-gram and scanning n-tuple models
  • Incremental classifiers and LOO optimisation
  • Evaluation methods
  • Results
  • Conclusions and further work

  3. Statistical Language Models
  • Compute p(x|M) – the probability of a sequence x given the model M
  • Java interface:

    public interface LanguageModel {
        public void train(SequenceDataset sd);
        public double p(int[] seq);
    }

  4. Sequence Dataset

    public interface SequenceDataset {
        public int nSymbols();
        public int nSequences();
        public int[] getSequence(int i);
    }
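
  A minimal in-memory implementation of this interface, as a sketch; the class name ArraySequenceDataset and the array-backed design are illustrative assumptions, not from the slides:

    // Sketch: wraps an array of integer sequences in memory.
    // Class name and design are illustrative assumptions.
    public class ArraySequenceDataset implements SequenceDataset {
        private final int[][] sequences;
        private final int nSymbols;

        public ArraySequenceDataset(int[][] sequences, int nSymbols) {
            this.sequences = sequences;
            this.nSymbols = nSymbols;
        }

        public int nSymbols() { return nSymbols; }
        public int nSequences() { return sequences.length; }
        public int[] getSequence(int i) { return sequences[i]; }
    }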

  5. Evaluating Language Models
  • Standard: test-set perplexity (a sketch follows below)
  • Preferred (by me!): recognition accuracy; dictionary extrapolation
  • Perplexity assumes all models are playing by the same rules
  • The other evaluation methods make no such assumptions
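
  A minimal sketch of test-set perplexity using the LanguageModel interface above, assuming p(seq) returns the probability of a whole sequence and that per-symbol perplexity is wanted; the helper name and the natural-log formulation are my assumptions:

    // Sketch: per-symbol test-set perplexity, exp(-mean log-probability).
    // Assumes p(seq) > 0 for every test sequence.
    public static double perplexity(LanguageModel model, SequenceDataset testSet) {
        double logProbSum = 0;
        long symbolCount = 0;
        for (int i = 0; i < testSet.nSequences(); i++) {
            int[] seq = testSet.getSequence(i);
            logProbSum += Math.log(model.p(seq));
            symbolCount += seq.length;
        }
        return Math.exp(-logProbSum / symbolCount);
    }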

  6. Distributed Mode Evaluation
  • Use the Algoval evaluation server
  • Currently: http://ace.essex.ac.uk
  • Download the developer pack
  • Configure a model – or write your own
  • Specify test parameters
  • Run tests
  • View results immediately on the web site!

  7. Sequence Recognition
  • Given a statistical language model, we can easily deploy it for sequence recognition
  • Build a model for each class
  • Assign each pattern to the class with the highest posterior
  • Better still – return the vector of posteriors for soft recognition
  • Interesting to try these models against simple LD and WLD nearest-neighbour classifiers
  (a classification sketch follows below)
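
  A minimal sketch of this recipe, assuming one trained LanguageModel per class and equal class priors (with equal priors the posterior is proportional to the class-conditional likelihood); the function names are illustrative:

    // Sketch: soft recognition via normalised per-class likelihoods
    // (equal priors assumed), plus hard classification by the maximum.
    public static double[] posteriors(LanguageModel[] classModels, int[] seq) {
        double[] post = new double[classModels.length];
        double sum = 0;
        for (int c = 0; c < classModels.length; c++) {
            post[c] = classModels[c].p(seq); // likelihood under class c
            sum += post[c];
        }
        for (int c = 0; c < post.length; c++) {
            post[c] /= sum; // normalise to a posterior vector
        }
        return post;
    }

    public static int classify(LanguageModel[] classModels, int[] seq) {
        double[] post = posteriors(classModels, seq);
        int best = 0;
        for (int c = 1; c < post.length; c++) {
            if (post[c] > post[best]) best = c;
        }
        return best;
    }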

  8. App1: Recognising OCR Chain Codes

  9. Results (OLD!)

  10. SN-Tuple Method: Current Status for OCR
  • Actively being researched at IBM T. J. Watson
  • See Ratzlaff, Proc. ICDAR 2001, pages 18–22 (on djvu.com – note: NOT dejavu.com!)
  • Concludes: “the sn-tuple is a viable method for on-line handwriting recognition”

  11. App2: Contextual OCR
  1. ACHROIA
  2. ACHROEA
  3. ASEMIA
  4. ASEMEIA
  5. ACHAEA
  6. ACODIA
  7. ACHORIA
  8. ACHYRA
  9. ACRAEA
  10. ACHIRIA

  12. Dictionary Extrapolation
  • The previous slide showed how well we can do with noisy images, with the aid of dictionary context
  • BUT: suppose the dictionary only has 50% coverage
  • We need a trainable model that can extrapolate from the given data
  • How do we evaluate such a model?

  13. Left Out Rank Estimate
  • For each word in the dictionary:
    • Create a new dictionary with that word left out
    • Create a set of neighbouring words to the left-out word
    • Get the model to evaluate the likelihood of each neighbouring word and of the left-out word
    • Return a rank-based score between 1.0 and 0.0 (from top to bottom of the list)
  (a sketch of the per-word score follows below)
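
  A minimal sketch of the per-word score, assuming the model has already been trained on the reduced dictionary; mapping the rank linearly onto [0, 1] is one plausible reading of the rank-based score above, and is my assumption:

    // Sketch: rank the left-out word's likelihood against its neighbours'.
    // Rank 0 (top of the list) maps to 1.0, last place maps to 0.0.
    public static double leftOutRankScore(LanguageModel model,
                                          int[] leftOutWord, int[][] neighbours) {
        double target = model.p(leftOutWord);
        int beaten = 0; // neighbours the model prefers to the left-out word
        for (int[] w : neighbours) {
            if (model.p(w) > target) beaten++;
        }
        return 1.0 - (double) beaten / neighbours.length;
    }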

  14. App3: Human Chromosome Recognition (Banded Images)

  15. Example Data: 22 Human Chromosomes

  Chromosome 10:
    / 1802 3 10 55 19 / A=A=a===B==a====D==d====D==e======B==b====B==b====A=a=a
    / 3843 84 10 55 18 / A=B===a==A==a==D==d=====D==d======C==b===A===c====A=a=a
    / 7231 158 10 55 20 / A===B==a==C==a==A==c===D===d======C===b==B===d===A=a==a
    / 787 15 10 55 18 / A==B==a=A===a===B===b===D===e====A===a==A==a=Aa=A==a==a
    / 2459 60 10 54 19 / A=B=aB==a=A==a==C===c==C====d=====C==b===A===c===A=a=a
    / 3290 21 10 54 19 / A==B==a==A==a====B==b==D====c=====B==b==A==c====A=a==a
    / 5591 122 10 54 17 / A=A=a==A==a====A==a====E===d====B====b==A===b==A===a=a

  Chromosome 15:
    / 1447 5 15 43 10 / AA=a======D==b=======C==d=====A==a==A==b==a
    / 2120 32 15 43 11 / B=a=====E===c==A=a===C==d====Aa=A=a==A=b==a
    / 2759 16 15 43 9 / A=A=====D====aA=a==A===c=======A=a=A===c==a

  16. N-gram Recognizers
  • Bigram (a sketch follows below)
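
  A minimal bigram model implementing the LanguageModel interface above, scoring a sequence as the product of conditional symbol probabilities p(x_t | x_{t-1}); add-one smoothing is an assumed choice, and the probability of the initial symbol is omitted for simplicity:

    // Sketch: bigram model with add-one smoothing (assumed choices).
    public class BigramModel implements LanguageModel {
        private int[][] counts;       // counts[a][b]: times b follows a
        private int[] contextTotals;  // contextTotals[a]: sum of counts[a][*]
        private int nSymbols;

        public void train(SequenceDataset sd) {
            nSymbols = sd.nSymbols();
            counts = new int[nSymbols][nSymbols];
            contextTotals = new int[nSymbols];
            for (int i = 0; i < sd.nSequences(); i++) {
                int[] seq = sd.getSequence(i);
                for (int t = 1; t < seq.length; t++) {
                    counts[seq[t - 1]][seq[t]]++;
                    contextTotals[seq[t - 1]]++;
                }
            }
        }

        public double p(int[] seq) {
            double prob = 1.0;
            for (int t = 1; t < seq.length; t++) {
                // add-one smoothed conditional p(seq[t] | seq[t-1])
                prob *= (counts[seq[t - 1]][seq[t]] + 1.0)
                      / (contextTotals[seq[t - 1]] + nSymbols);
            }
            return prob;
        }
    }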

  17. Leave One Out Error
  • Generally a good estimate of test-set error
  • Especially fast to compute for incremental classifiers: O(n)
  • As opposed to O(n²) for non-incremental classifiers

  18. Incremental Classifiers
  • Can learn new patterns on demand without access to the rest of the training set
  • Can also ‘forget’ (unlearn) patterns on demand
  • Incremental: n-gram, n-tuple, nearest neighbour (memory or counting methods)
  • Non-incremental: MLP, HMM, (SVM?) (latent-variable re-estimation methods)
  (a sketch of O(n) LOO via unlearn/relearn follows below)
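
  A minimal sketch of why LOO is O(n) for incremental classifiers: each pattern is unlearned, classified, and relearned in turn. The IncrementalClassifier interface here is a hypothetical stand-in for any of the incremental methods listed above, not an interface from the slides:

    // Hypothetical interface, illustrative only.
    public interface IncrementalClassifier {
        void learn(int[] pattern, int label);
        void unlearn(int[] pattern, int label);
        int classify(int[] pattern);
    }

    // Sketch: LOO error in a single pass over the training set. For
    // counting-based models each unlearn/classify/relearn step is cheap,
    // giving O(n) overall rather than O(n^2) retraining from scratch.
    public static double looError(IncrementalClassifier c,
                                  int[][] patterns, int[] labels) {
        int errors = 0;
        for (int i = 0; i < patterns.length; i++) {
            c.unlearn(patterns[i], labels[i]);
            if (c.classify(patterns[i]) != labels[i]) errors++;
            c.learn(patterns[i], labels[i]); // restore the model
        }
        return (double) errors / patterns.length;
    }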

  19. Statistical Model Servers
  • Server model of statistical models
  • Each server supports a range of models
  • Each model can have many instances
  • Each instance can be invoked for training or estimation
  • Now we can independently evaluate the service, not just the model!
  (one possible interface is sketched below)
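
  One possible shape for such a server, sketched as a hypothetical Java interface; all names here are illustrative assumptions, not taken from the slides:

    // Hypothetical sketch: models identified by name, instances by handle.
    public interface ModelServer {
        String[] availableModels();            // model types this server offers
        int createInstance(String modelName);  // returns an instance handle
        void train(int instance, SequenceDataset sd);
        double p(int instance, int[] seq);     // likelihood estimation
        void dispose(int instance);            // release the instance
    }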

  20. Results
  • Bioinformatics
  • Dictionary modelling

  21. Statistical Language Modelling Part II
  • Ensembles of observable models
  • Latent variable models
    • HMM
    • SCFG
    • Category n-gram
  • Other applications: Robot Sensors?
