560 likes | 918 Vues
Large Vocabulary Continuous Speech Recognition. Subword Speech Units. HMM-Based Subword Speech Units. Training of Subword Units. Training of Subword Units. Training Procedure. Errors and performance evaluation in PLU recognition. Substitution error (s) Deletion error (d)
 
                
                E N D
Errors and performance evaluation in PLU recognition • Substitution error (s) • Deletion error (d) • Insertion error (i) • Performance evaluation: • If the total number of PLUs is N, we define: • Correctness rate: N – s – d /N • Accuracy rate: N – s – d – i / N
Language Models for LVCSR Word Pair Model: Specify which word pairs are valid
Perplexity of the Language Model Entropy of the Source: First order entropy of the source: If the source is ergodic, meaning its statistical properties can be completely characterized in a sufficiently long sequence that the Source puts out,
We often compute H based on a finite but sufficiently large Q: H is the degree of difficulty that the recognizer encounters, on average, When it is to determine a word from the same source. Using language model, if the N-gram language model PN(W) is used, An estimate of H is: In general: Perplexity is defined as:
Naval Resource (Battleship) Management Task: 991-word vocabulary NG (no grammar): perplexity = 991
Word pair grammar We can partition the vocabulary into four nonoverlapping sets of words: The overall FSN allows recognition of sentences of the form:
WP (word pair) grammar: Perplexity=60 FSN based on Partitioning Scheme: 995 real arcs and 18 null arcs WB (word bigram) Grammar: Perplexity =20
Control of word insertion/word deletion rate • In the discussed structure, there is no control on the sentence length • We introduce a word insertion penalty into the Viterbi decoding • For this, a fixed negative quantity is added to the likelihood score at the end of each word arc
Context-dependent subword units Creation of context-dependent diphones and triphones
If c(.) is the occurrence count for a given unit, we can use a unit reduction rule such as: CD units using only intraword units for “show all ships”: CD units using both intraword and itnerword units:
Word junction effects To handle known phonological changes, a set of phonological rules are Superimposed on both the training and recognition networks. Some typical phonological rules include:
A key source of difficulty in continuous speech recognition is the So-called function words, which include words like a, and, for, in, is. The function words have the following properties:
Semantic Postprocessor For Recognition