This paper introduces the Inside-Outside algorithm for estimating Stochastic Context-Free Grammars (SCFGs), focusing on its application to speech recognition. It compares SCFGs with regular grammars, highlighting advantages of SCFGs such as the ability to capture embedded structure in speech data. The paper also discusses a pre-training approach to reduce the number of re-estimation cycles, grammar minimization techniques to eliminate redundancy, and a parallel implementation of the algorithm for better efficiency. The conclusions summarize the efficacy and potential of SCFGs in natural language processing.
The estimation of stochastic context-free grammars using the Inside-Outside algorithm
1998. 10. 16. Oh-Woog Kwon, KLE Lab., CSE, POSTECH
Contents • Introduction • The Inside-Outside algorithm • Regular versus context-free grammar • Pre-training • The use of grammar minimization • Implementation • Conclusions
Introduction - 1 • HMM => SCFG in speech recognition tasks • The advantages of SCFG's • Ability to capture embedded structure within speech data • Useful at lower levels such as the phonological rule system • Learning: a simple extension of the Baum-Welch re-estimation procedure (the Inside-Outside algorithm) • Little previous work on SCFG's in speech • Two factors behind the limited interest in speech • The increased power of CFG's has not seemed necessary for natural language: if the set of sentences is finite, a CFG is equivalent to an RG • The time complexity of the Inside-Outside algorithm • O(n³) in both the input string length and the number of grammar symbols
Introduction - 2 • Usefulness of CFG's in NL • The ability to model derivation probabilities matters more than the ability to determine language membership • So, this paper • introduces the Inside-Outside algorithm • compares CFG with RG using the entropy of the language generated by each grammar • Reduction of the time complexity of the Inside-Outside algorithm • This paper • describes a novel pre-training algorithm (fewer re-estimation iterations) • minimizes the number of non-terminals with grammar minimization (GM): fewer symbols • implements the Inside-Outside algorithm on a parallel transputer array: less training data per processor
The Inside-Outside algorithm - 1 • Chomsky Normal Form (CNF) in SCFG: rules of the form i → j k and i → m • Generated observation sequence: O = O1, O2, …, OT • The matrices of parameters: A = [a[i,j,k]] for rules i → j k, B = [b[i,m]] for rules i → m • Applications of SCFG's • recognition: compute P(O | G) • training: adjust A and B to maximize P(O | G)
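For reference, a minimal sketch of the parameterization in LaTeX, using the a[i,j,k] / b[i,m] notation that appears on the re-estimation slide (the normalization constraint is the standard one for CNF SCFGs):

a[i,j,k] = P(i \rightarrow j\,k), \qquad b[i,m] = P(i \rightarrow m)
\sum_{j,k} a[i,j,k] + \sum_{m} b[i,m] = 1 \quad \text{for every non-terminal } i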
The Inside-Outside algorithm - 2 • Definition of inner (e) and outer (f) probabilities • [Figure: inner probability = non-terminal i dominating the span O_s … O_t; outer probability = the start symbol S deriving O_1 … O_(s-1), the non-terminal i over positions s … t, and O_(t+1) … O_T]
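Written out (the slide's equation images are not reproduced; these are the standard definitions in Lari & Young's e/f notation):

e(s,t,i) = P(i \Rightarrow^{*} O_s \cdots O_t \mid G)
f(s,t,i) = P(S \Rightarrow^{*} O_1 \cdots O_{s-1}\, i\, O_{t+1} \cdots O_T \mid G)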
The Inside-Outside algorithm - 3 • Inner probability: computed bottom-up • Case 1: (s = t) rules of the form i → m • Case 2: (s < t) rules of the form i → j k, with the span s … t split at some point r (s ≤ r < t) between the sub-spans covered by j and k
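As a sketch, the corresponding recursion (the standard inside recursion, stated here since the slide's equations are images):

e(t,t,i) = b[i, O_t]
e(s,t,i) = \sum_{j} \sum_{k} \sum_{r=s}^{t-1} a[i,j,k]\, e(s,r,j)\, e(r+1,t,k), \quad s < t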
The Inside-Outside algorithm - 4 • Outer probability: computed top-down • [Figure: two cases for non-terminal i spanning s … t: i is the right child of a parent j (rule j → k i, with k spanning r … s-1), or the left child (rule j → i k, with k spanning t+1 … r)]
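As a sketch, the corresponding recursion (the standard outside recursion, stated here since the slide's equations are images):

f(1,T,i) = 1 \text{ if } i = S, \text{ else } 0
f(s,t,i) = \sum_{j}\sum_{k} \Big[ \sum_{r=1}^{s-1} f(r,t,j)\, a[j,k,i]\, e(r,s-1,k) \;+\; \sum_{r=t+1}^{T} f(s,r,j)\, a[j,i,k]\, e(t+1,r,k) \Big]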
The Inside-Outside algorithm - 5 • Recognition Process • By setting s = 1, t = T: P(O | G) follows directly from the inner probability of the start symbol • By setting s = t: P(O | G) follows from the outer probabilities combined with the B matrix
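In formulas (standard results; the second holds for any position t):

P(O \mid G) = e(1, T, S)
P(O \mid G) = \sum_{i} f(t,t,i)\, b[i, O_t]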
The Inside-Outside algorithm - 6 • Training Process: use the inner and outer probabilities to accumulate the expected number of times each rule is used, then re-estimate A and B (an EM procedure generalizing Baum-Welch)
The Inside-Outside algorithm - 7 • Re-estimation formula for a[i,j,k] and b[i,m]
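The slide's equations are images; as a sketch, the standard single-sentence re-estimation formulas (ratios of expected rule counts) are:

\hat{a}[i,j,k] = \frac{\sum_{s=1}^{T-1} \sum_{t=s+1}^{T} \sum_{r=s}^{t-1} f(s,t,i)\, a[i,j,k]\, e(s,r,j)\, e(r+1,t,k)}{\sum_{s=1}^{T} \sum_{t=s}^{T} f(s,t,i)\, e(s,t,i)}

\hat{b}[i,m] = \frac{\sum_{t : O_t = m} f(t,t,i)\, e(t,t,i)}{\sum_{s=1}^{T} \sum_{t=s}^{T} f(s,t,i)\, e(s,t,i)}

With several training sentences, each sentence's counts are divided by its own P(O | G) before being summed.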
The Inside-Outside algorithm - 8 • The Inside-Outside algorithm
1. Choose suitable initial values for the A and B matrices
2. REPEAT
   A = … {Equation 20}
   B = … {Equation 21}
   P = … {Equation 11}
   UNTIL the change in P is less than a set threshold
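To make the loop concrete, here is a minimal Python/numpy sketch of one possible implementation for CNF grammars, using the a[i,j,k], b[i,m] parameters and the e/f probabilities defined above. It is an illustrative re-implementation, not the paper's code; the function names, dense array layout, and the toy data at the end are assumptions.

import numpy as np

def inside(A, B, O):
    """Inner probabilities e[s, t, i] = P(i =>* O_s ... O_t); spans are 0-indexed and inclusive."""
    T, N = len(O), B.shape[0]
    e = np.zeros((T, T, N))
    for t in range(T):
        e[t, t] = B[:, O[t]]                                  # Case 1: i -> m
    for length in range(2, T + 1):                            # Case 2: i -> j k, bottom-up
        for s in range(T - length + 1):
            t = s + length - 1
            for r in range(s, t):
                e[s, t] += np.einsum('ijk,j,k->i', A, e[s, r], e[r + 1, t])
    return e

def outside(A, e, T, start=0):
    """Outer probabilities f[s, t, i], computed top-down from f[0, T-1, start] = 1."""
    N = A.shape[0]
    f = np.zeros((T, T, N))
    f[0, T - 1, start] = 1.0
    for length in range(T - 1, 0, -1):
        for s in range(T - length + 1):
            t = s + length - 1
            for r in range(s):                                # i is the right child: j -> k i
                f[s, t] += np.einsum('jki,j,k->i', A, f[r, t], e[r, s - 1])
            for r in range(t + 1, T):                         # i is the left child: j -> i k
                f[s, t] += np.einsum('jik,j,k->i', A, f[s, r], e[t + 1, r])
    return f

def expected_counts(A, B, O, start=0):
    """Per-sentence expected rule counts (numerators/denominator of the re-estimation formulas)."""
    T = len(O)
    e = inside(A, B, O)
    f = outside(A, e, T, start)
    P = e[0, T - 1, start]                                    # P(O | G)
    num_a, num_b, denom = np.zeros_like(A), np.zeros_like(B), np.zeros(A.shape[0])
    for s in range(T):
        for t in range(s, T):
            denom += f[s, t] * e[s, t]
            if s == t:
                num_b[:, O[t]] += f[t, t] * e[t, t]
            else:
                for r in range(s, t):
                    num_a += A * np.einsum('i,j,k->ijk', f[s, t], e[s, r], e[r + 1, t])
    return num_a / P, num_b / P, denom / P, np.log(P)

def reestimate(A, B, sentences, start=0):
    """One Inside-Outside (EM) iteration over a list of sentences; returns new A, B and log P."""
    num_a, num_b = np.zeros_like(A), np.zeros_like(B)
    denom, logP = np.zeros(A.shape[0]), 0.0
    for O in sentences:
        na, nb, d, lp = expected_counts(A, B, O, start)
        num_a += na; num_b += nb; denom += d; logP += lp
    return num_a / denom[:, None, None], num_b / denom[:, None], logP

# Hypothetical toy run: 3 non-terminals, 2 terminals, two short training strings.
rng = np.random.default_rng(0)
N_sym, M_sym = 3, 2
A = rng.random((N_sym, N_sym, N_sym)); B = rng.random((N_sym, M_sym))
Z = A.reshape(N_sym, -1).sum(1) + B.sum(1)
A /= Z[:, None, None]; B /= Z[:, None]                        # each non-terminal's rules sum to 1
data = [[0, 1, 1, 0], [1, 0, 0, 1]]
prev = -np.inf
for _ in range(100):
    A, B, logP = reestimate(A, B, data)
    if logP - prev < 1e-6:                                    # stop when the change in P is small
        break
    prev = logP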
Regular versus context-free grammar • Measurements for the comparison • The entropy of an ε-representation of L • Empirical entropy • Language for the comparison: palindromes • The number of parameters for each grammar • SCFG: N (# of non-terminals), M (# of terminals) => N³ + NM • HMM (RG): K (# of states), M (# of terminals) => K² + (M+2)K • Condition for a fair comparison: N³ + NM ≈ K² + (M+2)K • The result (the ability to model derivation probabilities) • SCFG > RG
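As an illustration with hypothetical numbers (not taken from the paper): with M = 2 terminals, an SCFG with N = 4 non-terminals has

N^3 + NM = 4^3 + 4 \cdot 2 = 72 \;\approx\; 77 = 7^2 + (2+2) \cdot 7 = K^2 + (M+2)K

so it would be compared against an HMM with about K = 7 states.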
Pre-training - 1 • Goal: start off with good initial estimates • reduces the number of re-estimation cycles required (by about 40%) • facilitates the generation of a good final model • Pre-training
1. Use the Baum-Welch algorithm (O(n²)) to obtain a set of RG rules
2. Convert the RG rules (final matrices) into SCFG rules (initial matrices)
3. Start the Inside-Outside algorithm (O(n³)) from these initial matrices
• Time complexity: a·n² + b·n³ << c·n³ if b << c, where a and b are the numbers of Baum-Welch and Inside-Outside iterations with pre-training and c is the number of Inside-Outside iterations without it
Pre-training - 2 • Modification (RG => SCFG)
(a) For each bjk, define Yj → k with probability bjk.
(b) For each aij, define Xi → Ya Xj with probability aij.
(c) For each Si, define S → Xi with probability Si.
 • If Xi → Ya Xl with ail, add S → Ya Xl with Si·ail.
(d) For each Fj, define Xj → Ya with probability Fj.
 • If Ya → k with bak, add Xj → k with bak·Fj.
• If the remaining parameters stay at zero, the grammar remains an RG, so:
 • add a floor value to all parameters (floor value = 1 / # of non-zero parameters)
 • re-normalize so that the rule probabilities of each non-terminal sum to one
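A small Python sketch of the flooring and re-normalization step, assuming the dense A (N×N×N) and B (N×M) arrays from the earlier sketch (the rule-mapping step above is taken as given):

import numpy as np

def floor_and_renormalize(A, B):
    """Add a small floor to every parameter, then re-normalize each non-terminal's rules."""
    floor = 1.0 / (np.count_nonzero(A) + np.count_nonzero(B))   # floor value as described above
    A, B = A + floor, B + floor
    Z = A.reshape(A.shape[0], -1).sum(axis=1) + B.sum(axis=1)    # per-non-terminal totals
    return A / Z[:, None, None], B / Z[:, None]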
The use of grammar minimization - 1 • Goal: detect and eliminate redundant and/or useless symbols • A good grammar is self-embedding • A CFG is self-embedding if there is a non-terminal A such that A ⇒* wAx, where neither w nor x is ε • Self-embedding requires more non-terminal symbols • Smaller n: speeds up the Inside-Outside algorithm • Constraining the Inside-Outside algo. • Greedy symbols: take up too many non-terminals • Constraints • allocate a non-terminal to each terminal symbol • force the remaining non-terminals to model the hidden branching process • Infeasible for practical applications (e.g. speech) because of inherent ambiguity
The use of grammar minimization - 2 • Two ways to incorporate GM into the Inside-Outside algo. • First approach: computationally intractable • In-Out algo.: start with a fixed maximum number of symbols • GM: periodically detect and eliminate redundant and useless symbols • Second approach: more practical • In-Out algo.: start with the desired number of non-terminals • GM: periodically (or when log P(S) falls below a threshold) detect and reallocate redundant symbols
The use of grammar minimization - 3 • GM algorithm (ad hoc)
1. Detect greedy symbols in a bottom-up fashion
 1.1 redundant non-terminals are replaced by a single non-terminal
 1.2 free the redundant non-terminals (free non-terminals)
 1.3 identical rules are collapsed into a single rule by adding their probabilities
2. Fix the parameters of the remaining non-terminals involved in the generation of greedy symbols (excluded from steps 3 and 4)
3. For each free non-terminal i,
 3.1 b[i,m] = 0 if m is a greedy symbol; randomize b[i,m] otherwise
 3.2 a[i,j,k] = 0 if j and k are non-terminals from step 2; randomize a[i,j,k] otherwise
4. Randomize a[i,j,k] for i a non-terminal from step 2 and j, k free non-terminals
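One possible reading of steps 3-4 in Python terms, again assuming the A/B arrays from the earlier sketch and pre-computed index lists greedy_terminals, fixed_nts (step 2), and free_nts; this is only an illustrative guess at the ad hoc procedure, not the paper's code:

import numpy as np

def reallocate_free_nonterminals(A, B, greedy_terminals, fixed_nts, free_nts, seed=0):
    """Steps 3-4: re-initialize freed non-terminals away from greedy symbols."""
    rng = np.random.default_rng(seed)
    A, B = A.copy(), B.copy()
    for i in free_nts:
        B[i] = rng.random(B.shape[1])                     # step 3.1: randomize ...
        B[i, list(greedy_terminals)] = 0.0                #          ... except greedy symbols
        A[i] = rng.random(A.shape[1:])                    # step 3.2: randomize ...
        A[i][np.ix_(fixed_nts, fixed_nts)] = 0.0          #          ... except step-2 pairs
    for i in fixed_nts:                                   # step 4: open rules i -> (free, free)
        A[i][np.ix_(free_nts, free_nts)] = rng.random((len(free_nts), len(free_nts)))
    # Rows are left un-normalized here, since the slide does not mention re-normalization;
    # the next re-estimation pass produces normalized parameters in any case.
    return A, B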
Implementation using a transputer array • Goal: • Speed up the Inside-Outside algo. (about 100 times faster) • Split the training data into several subsets • The Inside-Outside algo. works independently on each subset • Implementation: the SUN control board computes the updated parameter set and transmits it down the chain to all the others; each transputer works independently on its own data subset • [Diagram: SUN control board connected to a chain of 64 transputers]
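A rough modern analogue of this scheme in Python (the original implementation ran on a transputer array, not Python; the worker/master function names are assumptions, and expected_counts is the per-sentence helper from the sketch earlier in this deck):

from multiprocessing import Pool
import numpy as np

def subset_counts(args):
    """Worker (one 'transputer'): accumulate expected rule counts over its own data subset."""
    A, B, subset = args
    num_a, num_b = np.zeros_like(A), np.zeros_like(B)
    denom, logP = np.zeros(A.shape[0]), 0.0
    for O in subset:
        na, nb, d, lp = expected_counts(A, B, O)          # helper defined in the earlier sketch
        num_a += na; num_b += nb; denom += d; logP += lp
    return num_a, num_b, denom, logP

def parallel_iteration(A, B, subsets, pool):
    """Master ('control board'): combine the subset counts, then broadcast updated parameters."""
    results = pool.map(subset_counts, [(A, B, s) for s in subsets])
    num_a = sum(r[0] for r in results); num_b = sum(r[1] for r in results)
    denom = sum(r[2] for r in results); logP = sum(r[3] for r in results)
    return num_a / denom[:, None, None], num_b / denom[:, None], logP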
Conclusions • Usefulness of CFG's in NL • This paper • introduced the Inside-Outside algorithm for speech recognition • compared CFG with RG using the entropy of the language generated by each grammar, on a "toy" problem • Reduction of the time complexity of the Inside-Outside algorithm • This paper • described a novel pre-training algorithm (fewer re-estimation iterations) • proposed an ad hoc grammar minimization (GM): fewer symbols • implemented the Inside-Outside algorithm on a parallel transputer array: less training data per processor • Further Research • build SCFG models trained from real speech data