Search and Decoding in Speech Recognition


Presentation Transcript


  1. Search and Decoding in Speech Recognition Words and Transducers

  2. Outline Veton Këpuska

  3. Outline

  4. Outline

  5. Introduction

  6. Introduction • From Ch. 1 (regular expressions): we saw how easy it is to search for the plural of woodchuck (woodchucks). • However, searching for the plural of fox, fish, peccary, or wild goose is not as trivial as just tacking on an s. • Main Entry: fox. Pronunciation: 'fäks. Function: noun. Inflected Form(s): plural fox·es also fox. Usage: often attributive. Etymology: Middle English, from Old English; akin to Old High German fuhs fox and perhaps to Sanskrit puccha tail. • Main Entry: fish. Pronunciation: 'fish. Function: noun. Inflected Form(s): plural fish or fish·es. Usage: often attributive. Etymology: Middle English, from Old English fisc; akin to Old High German fisc fish, Latin piscis. • Main Entry: pec·ca·ry. Pronunciation: 'pe-k&-rE. Function: noun. Inflected Form(s): plural -ries. Etymology: of Cariban origin; akin to Suriname Carib paki:ra peccary. Definition: any of several largely nocturnal gregarious American mammals resembling the related pigs, as (a) a grizzled animal (Tayassu tajacu) with an indistinct white collar, (b) a blackish animal (Tayassu pecari) with a whitish mouth region. • Main Entry: goose. Pronunciation: 'güs. Function: noun. Inflected Form(s): plural geese /'gEs/. Etymology: Middle English gos, from Old English gOs; akin to Old High German gans goose, Latin anser, Greek chEn.

  7. Introduction • Knowledge required to correctly search for singulars and plurals in English: • Orthographic rules: words ending in a consonant plus –y are pluralized by changing the –y to –i and adding –es. • Morphological rules: tell us that fish has a null plural and that the plural of goose is formed by changing the vowel. • Morphological parsing: recognizing that a word (like foxes) breaks down into component morphemes (fox and –es) and building a structured representation of it. • Parsing means taking an input and producing some sort of linguistic structure for it. • Parsing can be thought of, in broad terms, as producing structures based on:
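The three kinds of knowledge listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-lexicon, the function name `parse_noun`, and the `SG`/`PL` labels are hypothetical and are not taken from the slides.

```python
# Hypothetical mini-lexicon built from the nouns mentioned on the slides.
KNOWN_STEMS = {"fox", "fish", "peccary", "goose", "woodchuck"}
NULL_PLURALS = {"fish"}                  # morphological rule: plural equals singular
IRREGULAR_PLURALS = {"geese": "goose"}   # morphological rule: vowel change


def parse_noun(word):
    """Return every (stem, number) analysis this toy parser can find."""
    analyses = []
    if word in KNOWN_STEMS:
        analyses.append((word, "SG"))
    if word in NULL_PLURALS:
        analyses.append((word, "PL"))
    if word in IRREGULAR_PLURALS:
        analyses.append((IRREGULAR_PLURALS[word], "PL"))
    # Orthographic rule: undo the -y -> -i change before -es.
    if word.endswith("ies") and word[:-3] + "y" in KNOWN_STEMS:
        analyses.append((word[:-3] + "y", "PL"))
    # Regular plural spelled -es or -s.
    if word.endswith("es") and word[:-2] in KNOWN_STEMS:
        analyses.append((word[:-2], "PL"))
    if word.endswith("s") and word[:-1] in KNOWN_STEMS:
        analyses.append((word[:-1], "PL"))
    return analyses


for noun in ["woodchucks", "foxes", "peccaries", "geese", "fish"]:
    print(noun, parse_noun(noun))
```

Note that fish comes back with two analyses (singular and plural), and that the sketch overgenerates: it would also accept a form such as gooses, which a real lexicon would have to block.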

  8. Introduction • Morphological parsing (or stemming) applies to many affixes other than plurals. • Example: parsing any English verb ending in –ing (e.g., going, talking, congratulating) into its verbal stem plus the –ing morpheme. • going ⇨ VERB-go + GERUND-ing • Morphological parsing is important for speech and language processing: • Part-of-speech tagging • Dictionaries (spell-checking) • Machine translation
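A minimal sketch of the –ing example, assuming a small hypothetical verb lexicon (`VERB_LEXICON`) and a hypothetical helper `parse_ing`; the output string copies the slide's VERB-go + GERUND-ing notation. The parser strips –ing and then tries to undo two common spelling changes (a deleted silent e, a doubled consonant).

```python
# Hypothetical verb lexicon; a real system would use a full dictionary.
VERB_LEXICON = {"go", "talk", "congratulate", "beg", "fax"}


def parse_ing(word):
    """Parse an -ing form into 'VERB-stem + GERUND-ing', or return None."""
    if not word.endswith("ing"):
        return None
    base = word[:-3]
    # Candidate stems: as-is, with a restored silent e, with doubling undone.
    candidates = [base, base + "e"]
    if len(base) >= 2 and base[-1] == base[-2]:
        candidates.append(base[:-1])
    for stem in candidates:
        if stem in VERB_LEXICON:
            return f"VERB-{stem} + GERUND-ing"
    return None


for form in ["going", "talking", "congratulating", "begging", "king"]:
    print(form, "=>", parse_ing(form))
```

The lexicon check is what keeps a noun such as king from being analyzed as a verb stem k plus –ing.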

  9. Introduction • To solve the morphological parsing problem, one could simply store all the plural forms of English nouns and all the –ing forms of English verbs in a dictionary, as is done, for example, in English speech recognition tasks. • For many natural language processing applications this is not possible because –ing is a productive suffix: it applies to every verb, so one needs to know the rules for adding it. • Similarly, –s applies to almost every noun. • Productive suffixes apply to new words: • Example: fax and faxing. • New words (e.g., acronyms and proper nouns) are created constantly, and the plural morpheme –s must be added to each. • The plural form of a new noun depends on the spelling/pronunciation of the singular form (e.g., if a noun ends in –z, the plural is formed with –es rather than –s). • In other languages (e.g., Turkish) one cannot list all the morphological variants of every word: • Turkish verbs have 40,000 possible forms, not counting derivational suffixes.

  10. Noun • Most of us learned the classic definition of noun back in elementary school, where we were told simply that “a noun is the name of a person, place, or thing.” • That's not a bad beginning; it even clues us in to the origin of the word, since noun is derived ultimately from the Latin word nōmen, which means ‘name’.

  11. noun • any member of a class of words that can function as the main or only elements of subjects of verbs (A dog just barked), or of objects of verbs or prepositions (to send money from home), and that in English can take plural forms and possessive endings (Three of his buddies want to borrow John's laptop). Nouns are often described as referring to persons, places, things, states, or qualities, and the word noun is itself often used as an attributive modifier, as in noun compound; noun group.

  12. Verb • The key word in most sentences, the word that reveals what is happening, is the verb. It can: • declare something: You ran. • ask a question: Did you run? • convey a command: Run faster! • express a wish: May this good weather last! • express a possibility: If you had run well, you might have won; if you run better tomorrow, you may win.

  13. Verb • You cannot have a complete English sentence without at least one verb. • verb: any member of a class of words that function as the main elements of predicates, that typically express action, state, or a relation between two things, and that may be inflected for tense, aspect, voice, mood, and to show agreement with their subject or object.

  14. The definitions of noun and verb were taken from dictionary.com.

  15. Outline

  16. Outline • Survey of morphological knowledge for English. • Introduction of finite-state transducers as the key algorithm for morphological parsing. • Finite-state transducers are key algorithms for speech and language processing. • Related algorithms: • Stemming: mapping from a word to its root or stem. Important for Information Retrieval tasks. • We need to know whether two words have a similar root despite their surface differences. • Example: sang and sung. The word sing is called the common lemma of these words, and mapping from all of them to sing is called lemmatization.

  17. Outline • Tokenization or Word Segmentation: an algorithm related to morphological parsing, defined as the task of separating out (tokenizing) words from running text. • English text separates words by white space, but: • “New York” and “rock ‘n’ roll” are considered single words. • I’m is considered two words, “I” and “am”. • … etc. • For many applications we need to know how similar two words are orthographically. • Morphological parsing is one method for computing similarity. • Another is comparison of strings of letters via the minimum edit distance algorithm.
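The minimum edit distance mentioned on this slide can be sketched with the standard dynamic-programming table. The function name and the `sub_cost` parameter are assumptions for illustration: insertions and deletions cost 1, and substitution costs 1 by default (plain Levenshtein distance), although some formulations charge 2 for a substitution.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum number of weighted edits turning source into target."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                      # delete every source character
    for j in range(1, m + 1):
        dist[0][j] = j                      # insert every target character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost)  # substitution
            )
    return dist[n][m]


print(min_edit_distance("sang", "sung"))    # one substitution
print(min_edit_distance("fox", "foxes"))    # two insertions
```

A small distance between sang and sung or between fox and foxes reflects orthographic similarity only; it says nothing about whether two words share a lemma.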

  18. Morphological Parsing • Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. It must be able to distinguish between orthographic rules and morphological rules. For example, the word 'foxes' can be decomposed into 'fox' (the stem) and 'es' (a suffix indicating plurality). • The generally accepted approach to morphological parsing is to use a finite-state transducer (FST), which takes words as input and outputs their stems and modifiers. The FST is initially created through algorithmic parsing of some word source, such as a dictionary, complete with modifier markups.
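To make the idea of a finite-state transducer concrete, here is a toy sketch, not the construction used in the lecture. The class `FST`, the builder `build_noun_fst`, and the output tags `+N+SG` and `+N+PL` are all assumptions chosen for illustration. Each arc reads one surface letter and emits zero or more output symbols; the machine is simulated nondeterministically.

```python
class FST:
    """A tiny nondeterministic finite-state transducer."""

    def __init__(self):
        self.arcs = {}      # (state, input symbol) -> list of (output, next state)
        self.finals = {}    # final state -> output emitted on acceptance
        self.n_states = 1   # state 0 is the start state

    def new_state(self):
        self.n_states += 1
        return self.n_states - 1

    def add_arc(self, src, inp, out, dst):
        self.arcs.setdefault((src, inp), []).append((out, dst))

    def transduce(self, word):
        # Track every reachable (state, output so far) configuration.
        configs = [(0, "")]
        for ch in word:
            configs = [(dst, emitted + out)
                       for state, emitted in configs
                       for out, dst in self.arcs.get((state, ch), [])]
        return sorted(emitted + self.finals[state]
                      for state, emitted in configs if state in self.finals)


def build_noun_fst(stems):
    """Build one path per stem, plus a regular-plural ending."""
    fst = FST()
    for stem in stems:
        state = 0
        for ch in stem:                       # copy stem letters unchanged
            nxt = fst.new_state()
            fst.add_arc(state, ch, ch, nxt)
            state = nxt
        fst.finals[state] = "+N+SG"           # the bare stem is singular
        plural_end = fst.new_state()
        fst.finals[plural_end] = "+N+PL"
        if stem.endswith(("s", "z", "x", "sh", "ch")):
            e_state = fst.new_state()         # surface "es", e.g. foxes
            fst.add_arc(state, "e", "", e_state)
            fst.add_arc(e_state, "s", "", plural_end)
        else:
            fst.add_arc(state, "s", "", plural_end)
    return fst


fst = build_noun_fst(["fox", "cat"])
print(fst.transduce("foxes"))   # ['fox+N+PL']
print(fst.transduce("cat"))     # ['cat+N+SG']
```

For simplicity this sketch folds the spelling rule for –es into the lexicon paths; a fuller design would keep the lexicon and the orthographic rules as separate transducers.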

  19. Outline

  20. Survey of English Morphology

  21. Survey of English Morphology • Morphology is the study of the way words are built up from smaller meaning-bearing units called morphemes. • A morpheme is often defined as the minimal meaning-bearing unit in a language. • Main Entry: mor·pheme. Pronunciation: 'mor-"fEm. Function: noun. Etymology: French morphème, from Greek morphE form. Definition: a distinctive collocation of phonemes (as the free form pin or the bound form -s of pins) having no smaller meaningful parts.

  22. Survey of English Morphology • Example: • fox consists of a single morpheme: fox. • cats consists of two morphemes: cat and –s. • Two broad classes of morphemes: • Stems: the main morpheme of a word, and • Affixes: add additional meaning to the word. • Prefixes precede the stem: unbuckle. • Suffixes follow the stem: eats. • Infixes are inserted in the stem: humingi (Philippine language Tagalog: the infix –um– inserted in the stem hingi, “borrow”). • Circumfixes precede and follow the stem: gesagt (German past participle of sagen).

  23. Survey of English Morphology • A word can have more than one affix: • rewrites: • Prefix: re– • Stem: write • Suffix: –s • unbelievably: • Prefix: un– • Stem: believe • Suffixes: –able, –ly • English does not tend to stack more than four or five affixes. • Turkish can have words with nine or ten affixes; languages like Turkish are called agglutinative languages.

  24. ag·glu·ti·na·tive. Pronunciation: \ə-ˈglü-tən-ˌā-tiv, -ə-tiv\. Function: adjective. Date: 1634. 1: adhesive 2: characterized by linguistic agglutination

  25. Survey of English Morphology • There are many ways to combine morphemes to create a word. Four methods are common and play an important role in speech and language processing: • Inflection • Combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem, and usually filling some syntactic function like agreement. • Example: • –s: plural of nouns • –ed: past tense of verbs.

  26. Survey of English Morphology • Derivation • Combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning hard to predict. • Example: • computerize (verb) • computerization (noun). • Compounding • Combination of multiple word stems. • Example: • doghouse: dog + house. • Cliticization • Combination of a word stem with a clitic. A clitic is a morpheme that acts syntactically like a word but is reduced in form and attached (phonologically and sometimes orthographically) to another word. • Example: • I’ve = I + ’ve = I + have

  27. Outline

  28. Inflectional Morphology

  29. Inflectional Morphology • English has a relatively simple inflectional system; only the following are inflected: • Nouns • Verbs • Adjectives (sometimes) • The number of possible inflectional affixes is quite small.

  30. Inflectional Morphology: Nouns • English nouns have only two kinds of inflection: • Plural • Possessive • Many (but not all) nouns can either • appear in the bare stem or singular form, or • take a plural suffix.

  31. Inflectional Morphology: Nouns • The regular plural is spelled: • –s • –es after words ending in • –s (ibis/ibises) • –z (waltz/waltzes) • –sh (thrush/thrushes) • –ch (finch/finches) • and sometimes –x (box/boxes). • Nouns ending in –y preceded by a consonant change the –y to –i (butterfly/butterflies). • The possessive suffix is realized by apostrophe + –s for • regular singular nouns (llama’s), and • plural nouns not ending in –s (children’s), and often by a • lone apostrophe after • regular plural nouns (llamas’), and some • names ending in –s or –z (Euripides’ comedies).
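The regular spelling rules on this slide translate almost directly into code. The sketch below is illustrative only: the function names `pluralize` and `possessive` are hypothetical, irregular nouns (goose, fish, ox) are not handled, and the optional lone apostrophe after names ending in –s or –z is left out.

```python
import re


def pluralize(noun):
    """Spell the regular plural of an English noun."""
    if re.search(r"[^aeiou]y$", noun):        # consonant + y: butterfly -> butterflies
        return noun[:-1] + "ies"
    if re.search(r"(s|z|sh|ch|x)$", noun):    # ibis, waltz, thrush, finch, box
        return noun + "es"
    return noun + "s"


def possessive(noun, is_plural=False):
    """Spell the possessive; a regular plural takes a lone apostrophe."""
    if is_plural and noun.endswith("s"):
        return noun + "'"                     # llamas -> llamas'
    return noun + "'s"                        # llama -> llama's, children -> children's


for noun in ["ibis", "waltz", "thrush", "finch", "box", "butterfly", "llama"]:
    print(noun, "->", pluralize(noun))
print(possessive("llama"), possessive("llamas", is_plural=True),
      possessive("children", is_plural=True))
```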

  32. Inflectional Morphology: Verbs • English verbal inflection is more complicated than nominal inflection (e.g., regular and irregular verbs). • English has three kinds of verbs: • Main verbs (eat, sleep, impeach) • Modal verbs (can, will, should) • Primary verbs (be, have, do) • We are concerned with main and primary verbs because these have inflectional endings. • Of these verbs, a large class is regular (all verbs in this class have the same endings marking the same functions).

  33. Inflectional Morphology: Regular & Irregular Verbs

  34. Regular Verbs • Regular verbs have four morphological forms: the stem, the –s form, the –ing participle, and the past form or –ed participle. • Knowing the stem of a regular verb, we can predict the other forms by adding one of three predictable endings and making (some) regular spelling changes.

  35. Regular Verbs • Regular verbs are significant in the morphology of English because • they cover the majority of verbs and forms, and • the regular class is productive. • A productive class is one that automatically includes any new words that enter the language.

  36. Irregular Verbs • Irregular verbs are those that have some more or less idiosyncratic forms of inflection. • English irregular verbs • often have five different forms, but can have • as many as eight (e.g., the verb be), or • as few as three (e.g., cut or hit). • They constitute a smaller class, estimated at about 250 verbs.

  37. Usage of Morphological Forms for Irregular Verbs • The –s form: • Used in the “habitual present” to distinguish the third-person singular ending (“She jogs every Tuesday”) from the other choices of person and number (“I/you/we/they jog every Tuesday”). • The stem form: • Used in the infinitive, and also after certain other verbs (“I’d rather walk home”, “I want to walk home”). • The –ing participle is used in the progressive construction to mark a present or ongoing activity (“It is raining”), or when the verb is treated as a noun; this particular kind of nominal use of a verb is called gerund use (“Fishing is fine if you live near water”). • The –ed participle is used in the perfect construction (“He’s eaten lunch already”) or the passive construction (“The verdict was overturned yesterday”).

  38. Spelling Changes • A number of regular spelling changes occur at morpheme boundaries. • Example: • A single consonant letter is doubled before adding the –ing and –ed suffixes: beg/begging/begged. • If the final letter is “c”, the doubling is spelled “ck”: picnic/picnicking/picnicked. • If the base ends in a silent –e, it is deleted before adding –ing and –ed: merge/merging/merged. • Just as for nouns, the –s ending is spelled –es after verb stems ending in • –s (toss/tosses) • –z (waltz/waltzes) • –sh (wash/washes) • –ch (catch/catches) • and sometimes –x (tax/taxes). • Also like nouns, verbs ending in –y preceded by a consonant change the –y to –i (try/tries).

  39. Outline

  40. Derivational Morphology
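The spelling changes above can be sketched as a small generator of regular verb forms. Everything here is an assumption made for illustration: the function names are hypothetical, consonant doubling is approximated by a one-vowel consonant-vowel-consonant test (so stress-dependent cases such as refer/referring are missed), and a final e is treated as silent only when it follows a consonant.

```python
VOWELS = "aeiou"


def _is_short_cvc(stem):
    """Rough test for one-syllable stems like 'beg' that double the last letter."""
    return (len(stem) >= 3
            and stem[-1] not in VOWELS + "wxy"
            and stem[-2] in VOWELS
            and stem[-3] not in VOWELS
            and sum(ch in VOWELS for ch in stem) == 1)


def _attach(stem, suffix):
    """Attach -ing or -ed, applying the regular spelling changes."""
    if len(stem) > 2 and stem.endswith("e") and stem[-2] not in VOWELS:
        return stem[:-1] + suffix             # silent e deleted: merge -> merging
    if len(stem) >= 2 and stem[-1] == "c" and stem[-2] in VOWELS:
        return stem + "k" + suffix            # doubling spelled ck: picnic -> picnicking
    if _is_short_cvc(stem):
        return stem + stem[-1] + suffix       # beg -> begging
    return stem + suffix


def _ends_in_consonant_y(stem):
    return len(stem) >= 2 and stem[-1] == "y" and stem[-2] not in VOWELS


def ing_form(stem):
    return _attach(stem, "ing")


def ed_form(stem):
    if _ends_in_consonant_y(stem):
        return stem[:-1] + "ied"              # try -> tried
    return _attach(stem, "ed")


def s_form(stem):
    if _ends_in_consonant_y(stem):
        return stem[:-1] + "ies"              # try -> tries
    if stem.endswith(("s", "z", "sh", "ch", "x")):
        return stem + "es"                    # toss -> tosses
    return stem + "s"


for verb in ["walk", "beg", "picnic", "merge", "try", "toss"]:
    print(verb, s_form(verb), ing_form(verb), ed_form(verb))
```

Because the regular class is productive, the same rules apply to a new verb such as fax without any lexicon entry for its inflected forms.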

  41. Derivational Morphology • Derivation is the combination of a word stem with a grammatical morpheme, • usually resulting in a word of a different class, • often with a meaning that is hard to predict exactly. • English inflection is relatively simple compared to other languages. • Derivation in English is quite complex.

  42. Derivational Morphology • A common kind of derivation in English is the formation of new nouns, often from • verbs, or • adjectives. • This process is called nominalization. • Example: • The suffix –ation produces nouns from verbs, often ones ending in the suffix –ize (computerize → computerization).

  43. Derivational Morphology • Adjectives can also be derived from nouns and verbs.

  44. Complexity of Derivation in English Language • There are a number of reasons for the complexity of derivation in English: • It is generally less productive than inflection: • Even a nominalizing suffix like –ation, which can be added to almost any verb ending in –ize, cannot be added to absolutely every verb. • Example: we can’t say *eatation or *spellation (* marks forms that are not valid English words). • There are subtle and complex meaning differences among nominalizing suffixes. • Example: sincerity vs. sincereness

  45. Outline

  46. Cliticization

  47. Cliticization • clitic, noun (linguistics): a morpheme that functions like a word but does not appear as an independent word; it is always attached to a following or preceding word. In English, the possessive -'s is an example. • cliticization, noun: the process or an instance of a word becoming a clitic

  48. Cliticization • A clitic is a unit whose status lies between that of an affix and a word. • Phonological behavior (like affixes): • Short • Unaccented • Syntactic behavior (like words), acting as: • Pronouns • Articles • Conjunctions • Verbs

  49. Cliticization • Proclitics: clitics preceding a word • Enclitics: clitics following a word • Ambiguity • She’s → she is or she has
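The ambiguity of she’s can be made explicit with a small lookup sketch. The table `CLITIC_EXPANSIONS` and the function `expand_clitic` are hypothetical names, and the table lists only a few English enclitics; note also that -'s can mark the possessive, which this sketch does not try to detect.

```python
# Hypothetical table of English enclitics and the full words they may stand for.
CLITIC_EXPANSIONS = {
    "'s": ["is", "has"],
    "'ve": ["have"],
    "'m": ["am"],
}


def expand_clitic(token):
    """Return every possible expansion of a host + enclitic token."""
    token = token.replace("\u2019", "'")      # normalize a curly apostrophe
    host, sep, rest = token.partition("'")
    if not sep:
        return [token]                        # no clitic present
    expansions = CLITIC_EXPANSIONS.get("'" + rest, [])
    return [f"{host} {full}" for full in expansions] or [token]


print(expand_clitic("she's"))   # ['she is', 'she has']
print(expand_clitic("I've"))    # ['I have']
```

Choosing between she is and she has requires context (for example, the form of the following verb), which is beyond what a token-level lookup can decide.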

  50. Outline