
Word Prediction in Hebrew: Preliminary and Surprising Results


Presentation Transcript


  1. Word Prediction in Hebrew: Preliminary and Surprising Results. Yael Netzer, Meni Adler, Michael Elhadad. Department of Computer Science, Ben Gurion University, Israel

  2. Outline • Objectives and example • Methods of Word Prediction • Hebrew Morphology • Experiments and Results • Conclusions?

  3. Word Prediction - Objectives • Ease word insertion in textual software • by guessing the next word • by giving a list of possible options for the next word • by completing a word given a prefix • General idea: guess the next word given the previous ones: [Input w1 w2] → [guess w3]

  4. (Example) I s_____

  5. (Example) I s_____ → verb, adverb?

  6. (Example) I s_____ → verb → sang? maybe. singularized? hopefully

  7. (Example) I saw a _____

  8. (Example) I saw a _____ → noun / adjective

  9. (Example) I saw a b____

  10. (Example) I saw a b____ → brown? big? bear? barometer?

  11. (Example) I saw a bird in the _____

  12. (Example) I saw a bird in the _____ → [semantics would help here]

  13. (Example) I saw a bird in the z____

  14. (Example) I saw a bird in the z____ → obvious (?)

  15. Statistical Methods • Statistical information • Unigrams: probability of isolated words • Independent of context, offer the most likely words as candidates • More complex language models (Markov models) • Given w1..wn, determine the most likely candidate for wn+1 • The most common method in applications is the unigram (see references in [Garay-Vitoria and Abascal, 2004]); a minimal sketch follows below
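To make the statistical approach concrete, here is a minimal sketch, not from the talk, of unigram and bigram candidate ranking over a toy corpus; the corpus and all names are illustrative.

```python
# A minimal sketch of unigram vs. bigram prediction over a toy corpus.
from collections import Counter, defaultdict

corpus = "I saw a bird in the sky . I saw a big bear .".split()

unigrams = Counter(corpus)
bigrams = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    bigrams[prev][cur] += 1

def predict(prev_word, k=5):
    """Offer the k most likely next words; back off to unigrams."""
    context = bigrams.get(prev_word)
    if context:
        return [w for w, _ in context.most_common(k)]
    return [w for w, _ in unigrams.most_common(k)]

print(predict("a"))    # -> ['bird', 'big'] (bigram context)
print(predict("xyz"))  # unseen context: unigram fallback
```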

  16. Syntactic Methods • Syntactic knowledge • Consider sequences of part-of-speech tags: [Article] [Noun] → predict [Verb] • Phrase structure: [Noun Phrase] → predict [Verb] • Syntactic knowledge can be statistical or based on hand-coded rules (sketch below)
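A hedged sketch of the rule-based variant named above: hypothetical POS rules such as [Article][Noun] → [Verb] prune the statistical candidates. The tagset, rules, and mini-lexicon are invented for illustration.

```python
# POS-sequence filtering: hand-coded rules prune statistical candidates.
NEXT_POS = {("ART", "NOUN"): {"VERB"}}   # [Article][Noun] -> [Verb]
LEXICON = {"the": "ART", "bird": "NOUN", "flew": "VERB", "green": "ADJ"}

def filter_by_syntax(prev_tags, candidates):
    allowed = NEXT_POS.get(tuple(prev_tags[-2:]))
    if allowed is None:            # no rule for this context: keep all
        return candidates
    return [w for w in candidates if LEXICON.get(w) in allowed]

print(filter_by_syntax(["ART", "NOUN"], ["flew", "green", "bird"]))
# -> ['flew']
```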

  17. Semantic Methods • Semantic knowledge • Assign semantic categories to words • Find a set of rules that constrain the possible candidates for the next word • [eat verb] → predict [word of category food] • Not widely used in word prediction, mostly because it requires complex hand coding and is too inefficient for real-time operation

  18. Word Prediction Knowledge Sources • Corpora: texts and frequencies • Vocabularies (can be domain-specific) • Lexicons with syntactic and/or semantic knowledge • User’s history • Morphological analyzers • Unknown-word models

  19. Evaluation of Word Prediction • Keystroke savings: 1 − (# of actual keystrokes / # of expected keystrokes) • Time savings • Overall satisfaction • Cognitive overload (length of choice list vs. accuracy) • A predictor is considered adequate if its hit ratio stays high as the required number of selections decreases (sketch below)
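The keystroke-savings formula on this slide is straightforward to operationalize; a minimal sketch, where "expected" means typing every character by hand and "actual" counts typed characters plus one keystroke per menu selection:

```python
# Keystroke savings = 1 - (actual keystrokes / expected keystrokes).
def keystroke_savings(actual: int, expected: int) -> float:
    return 1.0 - actual / expected

# e.g. a 100-character message entered with 46 keystrokes:
print(f"{keystroke_savings(46, 100):.0%}")  # -> 54%
```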

  20. Work in non-English Languages • Languages with rich morphology: n-gram-based methods offer quite reasonable prediction [Trost et al. 2005] but can be improved with more sophisticated syntactic/semantic tools • Suggestions for inflected languages (e.g. Basque): • Use two lexicons: stems and suffixes (sketch below) • Add syntactic information to the dictionaries and grammatical rules to the system; offer stems and suffixes • Combine these two approaches: offer inflected nouns
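A toy sketch of the two-lexicon idea: offer stems first, then the suffixes licensed by the chosen stem. The Basque-flavored entries are illustrative only, not taken from the cited work.

```python
# Two-lexicon prediction for inflected languages: stems, then suffixes.
STEMS = {"etxe": "house"}                      # toy stem lexicon
SUFFIXES = {"etxe": ["a", "ak", "an", "tik"]}  # suffixes valid per stem

def offer(stem_prefix):
    """Return matching stems, each with its attachable suffixes."""
    return {s: SUFFIXES.get(s, [])
            for s in STEMS if s.startswith(stem_prefix)}

print(offer("etx"))  # -> {'etxe': ['a', 'ak', 'an', 'tik']}
```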

  21. Motivation for Hebrew • We need word prediction for Hebrew • No previously published research on Hebrew word prediction is known • We wanted to test our morphological analyzer in a useful application

  22. Initial Hypothesis Word prediction in Hebrew will be complicated; morphological and syntactic knowledge will be needed.

  23. Hebrew Ambiguity • Unvocalized writing: most vowels are “dropped”: inherent → inhrnt • Affixation: prepositions and possessives are attached to nouns: in her note → inhrnt; in her net → inhrnt • Rich morphology: ‘inhrnt’ could be inflected into different forms according to sing/pl, masc/fem properties → inhrnti, inhrntit, inhrntiot • Other morphological properties may leave ‘inherent’ unmodified (construct/absolute forms for noun compounding)

  24. Ambiguity Level • These variations create a high level of ambiguity: • English lexicon: inherent → inherent.adj • With Hebrew word-formation rules: inhrnt → in.prep+her.pro.fem.poss+note.noun / in.prep+her.pro.fem+net.noun / inherent.adj.masc.absolute / inherent.adj.masc.construct • Part-of-speech tagset: • Hebrew: theoretically ~300K tags, in practice ~3.6K distinct forms • English: 45-195 tags • Number of possible morphological analyses per word: • English: 1.4 (average # words/sentence: 12) • Hebrew: 2.7 (average # words/sentence: 18)

  25. (Real Hebrew) Morphological Ambiguity • בצלם bzlm • בְּצֶלֶם bzelem (name of an association) • בְּצַלֵּם b-zalem (while taking a picture) • בְּצָלָם bzalam (their onion) • בְּצִלָּם b-zila-m (under their shades) • בְּצַלָּם b-zalam (in a photographer) • בַּצַּלָּם ba-zalam (in the photographer) • בְּצֶלֶם b-zelem (in an idol) • בַּצֶּלֶם ba-zelem (in the idol)

  26. Morphological Analysis Given a written form, recover the following information: • Lexical category (part of speech): noun, verb, adjective, adverb, preposition… • Inflectional properties: gender, number, person, tense, status… • Affixes • Prefixes: מ ש ה ו כ ל ב (prepositions, conjunctions, definiteness) • Pronoun suffixes: accusative, possessive, nominative

  27. Morphological Analysis Example: given the form בצלם, propose the following analyses: • בְּצֶלֶם → בצלם proper-noun • בְּצַלֵּם → בצלם verb, infinitive • בְּצָלָם → בצל-ם noun, singular, masculine • בְּצִלָּם → ב-צל-ם noun, singular, masculine • בְּצַלָּם / בְּצֶלֶם → ב-צלם noun, singular, masculine, absolute / ב-צלם noun, singular, masculine, construct • בַּצַּלָּם / בַּצֶּלֶם → ב-צלם noun, definite, singular, masculine (data sketch below)
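One possible data representation for these analyses, an assumption for illustration rather than the authors' actual schema:

```python
# Representing the analyses of the written form בצלם as data.
from dataclasses import dataclass

@dataclass
class Analysis:
    segmentation: str  # prefix/stem/suffix split
    pos: str
    features: str

ANALYSES = [
    Analysis("בצלם", "proper-noun", ""),
    Analysis("בצלם", "verb", "infinitive"),
    Analysis("בצל-ם", "noun", "singular, masculine"),
    Analysis("ב-צל-ם", "noun", "singular, masculine"),
    Analysis("ב-צלם", "noun", "singular, masculine, absolute"),
    Analysis("ב-צלם", "noun", "singular, masculine, construct"),
    Analysis("ב-צלם", "noun", "definite, singular, masculine"),
]
print(len(ANALYSES), "analyses for one written form")  # -> 7
```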

  28. Morphological Disambiguation A difficult task in Hebrew: given a written form, select in context the correct morphological analysis out of all the possible analyses. We have developed a successful* system to perform morphological disambiguation in Hebrew [Adler et al., ACL06, ACL07, ACL08]. (*93% accuracy for POS tagging and 90% for full morphological analysis; this system was used in this test.)

  29. Word Prediction in Hebrew • We looked at word prediction as a sample task to show off the quality of our morphological disambiguator • But first… we checked a simple baseline

  30. Baseline: n-gram methods • Check n-gram methods (unigram, bigram, trigram) • Four sizes of selection menus: 1, 5, 7 and 9 • Various training sets of 1M, 10M and 27M words to learn the probabilities of the n-grams • Various genres (evaluation sketch below)
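A hedged sketch of how such an evaluation loop might count keystrokes with a k-word selection menu; the `predict` callback stands in for the trained unigram/bigram/trigram model:

```python
# Count keystrokes per target word when a k-word menu is offered
# after every typed character (one extra keystroke selects a hit).
def keystrokes_for_word(word, predict, k):
    typed = ""
    for i, ch in enumerate(word):
        if word in predict(typed, k):  # target shown in the menu
            return i + 1               # i chars typed + 1 selection
        typed += ch
    return len(word)                   # never predicted: type it all

def stub(prefix, k):                   # toy stand-in for the model
    return ["bird"] if prefix.startswith("b") else []

print(keystrokes_for_word("bird", stub, 5))  # -> 2 ('b' + selection)
```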

  31. Prediction results using n-grams only [Chart: keystrokes needed to enter a message, in % (smaller is better)] For the trigram model trained on the 27M-word corpus: very good results!

  32. Adding Syntactic Information P(w_n | w_1, …, w_{n-1}) = λ1·P(w_{n-i}, …, w_n | LM) + λ2·P(w_1, …, w_n | μ) • μ is the morpho-syntactic HMM (the morphological disambiguator)* • Combine P(w_1, …, w_n | μ) with the probabilistic language model LM in order to rank each candidate word given the previously typed words • If the user typed “I saw” and the next-word candidates are {him, hammer}, we use the HMM model to calculate p(I saw him | μ) and p(I saw hammer | μ), in order to tune the probability given by the n-gram (sketch below) • *Trained on a 1M-word corpus
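A minimal sketch of this interpolation, with stub probability functions standing in for the language model and the HMM; the λ weights are illustrative, not the trained values:

```python
# Rank candidates w by lam1*P(context+w | LM) + lam2*P(context+w | mu).
def rank(candidates, context, p_lm, p_hmm, lam1=0.7, lam2=0.3):
    def score(w):
        seq = context + [w]
        return lam1 * p_lm(seq) + lam2 * p_hmm(seq)
    return sorted(candidates, key=score, reverse=True)

# e.g. after "I saw": the HMM stub prefers the pronoun reading
ranked = rank(["hammer", "him"], ["I", "saw"],
              p_lm=lambda seq: 0.01,  # constant LM stub
              p_hmm=lambda seq: 0.2 if seq[-1] == "him" else 0.05)
print(ranked)  # -> ['him', 'hammer']
```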

  33. Results with morpho-syntactic knowledge [Chart: modeling sequences of parts of speech with morphological features, compared against the results without syntactic knowledge]

  34. Some Notes on Results • n-grams perform very well (a high level of keystroke saving) • High rates for all genres • And, as expected: • Better prediction when trained on more data • Better prediction with trigrams • Better prediction with a larger selection window • Morpho-syntactic information did not improve results (in fact, it hurt!)

  35. Conclusion • Statistical data on a language with rich morphology yields good results: keystrokes needed drop to • 29% with nine word proposals • 34% with seven proposals • 54% with a single proposal • Syntactic information did not improve the prediction • Explanation: the morphology did not help because p(w_1, …, w_n | μ) is computed over an unfinished sentence

  36. תודה Thank you

  37. Technical Information • n-grams: CMU toolkit • Storage: Berkeley DB to store the word-prediction knowledge (mapping of n-grams; sketch below) • More questions on technology: meni.adler@gmail.com
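The slides name Berkeley DB; the sketch below uses Python's standard-library dbm module as a stand-in with the same key-value access pattern, and the keys/values are toy examples rather than the system's real layout.

```python
# Persisting n-gram knowledge in a key-value store (dbm as stand-in).
import dbm

with dbm.open("ngrams.db", "c") as db:       # create if missing
    db["saw a".encode()] = b"bird:12,big:7"  # n-gram -> toy counts
    print(db[b"saw a"].decode())             # -> bird:12,big:7
```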
