This paper explores the use of neural networks to estimate language model (LM) posterior probabilities for automatic speech recognition (ASR) tasks. The main idea is to project word indices into continuous space, resulting in smoother probability functions that generalize better to unknown n-grams. This approach retains n-gram characteristics while interpolating posteriors for any context without backing off. The methods highlighted demonstrate a significant reduction in word error rates (WER) with minimal computational overhead, alongside optimizations for fast recognition and training processes.
Using Neural Network Language Models for LVCSR
Holger Schwenk and Jean-Luc Gauvain
Presented by Erin Fitzgerald, CLSP Reading Group, December 10, 2004
Introduction
• Build and use neural networks to estimate LM posterior probabilities for ASR tasks
• Idea:
  • Project word indices onto a continuous space
  • The resulting smooth probability functions over word representations generalize better to unknown n-grams
  • Still an n-gram approach, but posteriors are interpolated for any possible context; no backing off
• Result: significant WER reduction with small computational cost
Architecture
[Figure: a standard fully connected multilayer perceptron. The n-1 context words wj-n+1, …, wj-1 are each mapped by a shared projection layer onto a continuous space (P = 50 units per word); a hidden layer of H ≈ 1k units feeds an output layer of N = 51k units whose values pi = P(wj = i | hj) are the LM posteriors.]
Architecture (computation)
• Hidden layer: d = tanh(Mc + b)
• Output layer: o = Vd + k, normalized by a softmax to give pi = P(wj = i | hj)
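To make the layer dimensions and equations concrete, here is a minimal NumPy sketch of this forward pass, assuming the standard softmax normalization at the output (the slides show only the activations). Toy sizes stand in for P = 50, H ≈ 1k, N = 51k, and the names R, M, b, V, k mirror the figure's labels; this is an illustration, not the authors' code.

```python
import numpy as np

# Toy sizes; the paper uses P = 50, H ~ 1k, N = 51k (reduced to a shortlist later).
N, P, H = 1000, 50, 200      # vocabulary size, projection size, hidden size
n = 4                        # n-gram order, i.e. 3 context words

rng = np.random.default_rng(0)
R = rng.normal(scale=0.01, size=(N, P))            # shared projection matrix (word -> continuous space)
M = rng.normal(scale=0.01, size=(H, (n - 1) * P))  # projection-to-hidden weights
b = np.zeros(H)                                    # hidden biases
V = rng.normal(scale=0.01, size=(N, H))            # hidden-to-output weights
k = np.zeros(N)                                    # output biases

def forward(context_ids):
    """Posterior P(wj = i | hj) for every word i, given the n-1 context word indices."""
    c = R[context_ids].reshape(-1)   # concatenate the projected context words
    d = np.tanh(M @ c + b)           # hidden layer: d = tanh(Mc + b)
    o = V @ d + k                    # output activations: o = Vd + k
    o -= o.max()                     # numerical stability before the softmax
    p = np.exp(o)
    return p / p.sum()               # softmax -> posterior probabilities

p = forward(np.array([12, 7, 404])) # P(w | three-word history)
print(p.shape, p.sum())             # (1000,) and ~1.0
```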
Training
• Train with the standard back-propagation algorithm
• Error function: cross entropy
• Weight decay regularization used
• Targets set to 1 for wj and to 0 otherwise
• Outputs trained this way are known to converge to posterior probabilities
• Back-propagating through the projection layer means the NN learns the best projection of words onto the continuous space for the probability estimation task
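As a small illustration of this objective, the sketch below computes the cross-entropy error with a weight-decay penalty for a single example, plus the familiar softmax-output error signal (p - t) that is back-propagated through the output, hidden, and projection layers. The decay constant and function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def nnlm_loss(p, target_id, weights, decay=1e-5):
    """Cross-entropy error plus a weight-decay penalty for one training example.

    p         : softmax output of the network, shape (N,)
    target_id : index of the observed next word wj (target 1; all other targets 0)
    weights   : weight matrices included in the L2 penalty (decay value is illustrative)
    """
    ce = -np.log(p[target_id])                        # cross entropy against the one-hot target
    l2 = decay * sum((W ** 2).sum() for W in weights)
    return ce + l2

def output_delta(p, target_id):
    """Error signal at the softmax output, (p - t); this is what gets back-propagated
    through the output, hidden, and projection layers."""
    t = np.zeros_like(p)
    t[target_id] = 1.0
    return p - t

# toy example with a 5-word vocabulary
p = np.array([0.1, 0.2, 0.5, 0.1, 0.1])
print(nnlm_loss(p, 2, [np.ones((3, 3))]), output_delta(p, 2))
```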
Fast Recognition Techniques
• Lattice rescoring
• Shortlists
• Regrouping
• Block mode
• CPU optimization
Lattice rescoring
• Decode with a standard backoff LM to build lattices; the NN LM is then applied by rescoring those lattices
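The paper rescores the lattices directly; as a rough illustration of the same idea, the sketch below re-ranks hypotheses read off such a lattice by replacing the backoff LM score with NN LM log-probabilities. The `nn_lm_prob` interface and the `lm_weight` value are assumptions made for the example.

```python
import math

def rescore_hypotheses(hyps, nn_lm_prob, lm_weight=12.0, order=4):
    """Re-rank hypotheses read off a backoff-LM lattice using NN LM probabilities.

    hyps       : list of (acoustic_log_score, [w1, w2, ...]) pairs
    nn_lm_prob : callable (context_tuple, word) -> P_NN(word | context)  [assumed interface]
    lm_weight  : LM scale factor (illustrative value)
    """
    rescored = []
    for ac_score, words in hyps:
        padded = ["<s>"] * (order - 1) + words
        lm_logprob = sum(
            math.log(nn_lm_prob(tuple(padded[i - order + 1:i]), padded[i]))
            for i in range(order - 1, len(padded))
        )
        rescored.append((ac_score + lm_weight * lm_logprob, words))
    return sorted(rescored, reverse=True)    # best combined score first

# toy usage with a uniform "NN LM"
print(rescore_hypotheses([(-100.0, ["hello", "world"])], lambda h, w: 0.1))
```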
Shortlists
• The NN only predicts a high-frequency subset of the vocabulary (the shortlist)
• Redistributes the probability mass of the shortlist words
Shortlist optimization
[Figure: the same network, but the output layer is restricted to the shortlist — only pi = P(wj = i | hj) up to pS = P(wj = S | hj) are computed, instead of all N outputs.]
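A minimal sketch of one way the shortlist posteriors can be combined with the backoff LM so that the full-vocabulary distribution still sums to one: the NN output is scaled by the backoff mass of the shortlist for the current context, and non-shortlist words keep their backoff probability. The function names and interfaces here are assumptions; consult the paper for the exact normalization used.

```python
def shortlist_prob(word, context, p_nn, p_backoff, shortlist):
    """Combine a shortlist NN LM with a backoff LM (sketch of the mass redistribution).

    p_nn(word, context)      : NN posterior, defined only for shortlist words
    p_backoff(word, context) : standard backoff LM probability
    """
    mass = sum(p_backoff(v, context) for v in shortlist)   # backoff mass of the shortlist
    if word in shortlist:
        return p_nn(word, context) * mass                  # NN redistributes that mass
    return p_backoff(word, context)                        # all other words: backoff LM

# toy usage: two shortlist words sharing 60% of the backoff mass
sl = {"the", "a"}
print(shortlist_prob("the", ("of",), lambda w, c: 0.7, lambda w, c: 0.3, sl))  # 0.7 * 0.6 = 0.42
```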
Regrouping – an optimization of lattice rescoring
• Collect and sort the LM probability requests
• For all requests sharing the same context ht, only one forward pass is necessary
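A small sketch of the regrouping step: bucket the lattice's probability requests by context so that each distinct context triggers exactly one forward pass. The `forward(context) -> {word: probability}` interface is an assumption made for the example.

```python
from collections import defaultdict

def regroup_requests(requests):
    """Bucket (context, word) probability requests by context, so each distinct
    context needs only one forward pass through the network."""
    by_context = defaultdict(set)
    for context, word in requests:
        by_context[tuple(context)].add(word)
    return by_context

def batch_lookup(requests, forward):
    """forward(context) -> {word: probability} via one NN forward pass (assumed interface)."""
    probs = {}
    for context, words in regroup_requests(requests).items():
        p = forward(context)                 # single forward pass for this context
        for w in words:
            probs[(context, w)] = p[w]
    return probs

# toy usage: three requests, but only two distinct contexts -> two forward passes
reqs = [(("a", "b"), "c"), (("a", "b"), "d"), (("x", "y"), "c")]
print(batch_lookup(reqs, lambda ctx: {"c": 0.1, "d": 0.2}))
```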
Block mode
• Several examples are propagated through the NN at once
• Takes advantage of faster matrix-matrix operations
Block mode calculations (single example)
[Figure: one example at a time — d = tanh(Mc + b), o = Vd + k.]
Block mode calculations (bunch of examples)
[Figure: the projected contexts of the bunch are stacked as the columns of C — D = tanh(MC + B), O = VD + K, where B and K replicate the bias vectors b and k across the columns.]
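The block-mode computation can be sketched in a few lines of NumPy: the projected contexts of a bunch of examples become the columns of C, and each layer then needs just one matrix-matrix product (plus a broadcast bias), which is what lets optimized BLAS routines do the work. The shapes and the bunch size of 128 are illustrative.

```python
import numpy as np

def forward_block(C, M, b, V, k):
    """Block-mode forward pass over a bunch of examples stored as the columns of C.

    Column-wise this is exactly d = tanh(Mc + b), o = Vd + k, but a single
    matrix-matrix product per layer replaces a loop over single examples.
    """
    D = np.tanh(M @ C + b[:, None])       # D = tanh(MC + B)
    O = V @ D + k[:, None]                # O = VD + K
    O -= O.max(axis=0, keepdims=True)     # column-wise stable softmax
    P = np.exp(O)
    return P / P.sum(axis=0, keepdims=True)

# toy shapes: 150 projected inputs, H = 200, N = 1000 outputs, bunch size 128
rng = np.random.default_rng(0)
probs = forward_block(rng.normal(size=(150, 128)),
                      rng.normal(scale=0.01, size=(200, 150)), np.zeros(200),
                      rng.normal(scale=0.01, size=(1000, 200)), np.zeros(1000))
print(probs.shape)                        # (1000, 128): one distribution per example
```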
Fast Recognition – Test Results
• Lattice rescoring – an average of 511 lattice nodes
• Shortlists (2000 words) – 90% prediction coverage; 3.8M 4-grams requested, 3.4M processed by the NN
• Regrouping – only 1M forward passes required
• Block mode – bunch size = 128
• CPU optimization
• Total processing < 9 min (0.03 x RT); without the optimizations, about 10x slower
Fast Training Techniques
• Parallel implementations
  • Full connections require low-latency communication; very costly
• Resampling techniques
  • Optimized floating-point operations work best on contiguous memory locations
Fast Training Techniques
• Floating-point precision – 1.5x faster
• Suppressing internal calculations – 1.3x faster
• Bunch mode – over 10x faster
  • Forward and back propagation for many examples at once
• Multiprocessing – 1.5x faster
• Overall: training time reduced from 47 hours to 1h27m with bunch size 128
Application to ASR
• Neural net LM techniques focus on CTS because:
  • There is far less in-domain training data, hence data sparsity
  • The NN can only handle a small amount of training data
• New Fisher CTS data – 20M words (vs. 7M previously)
• BN data: 500M words
Application to CTS
• Baseline: train standard backoff LMs for each domain, then interpolate them
• Expt #1: interpolate the CTS neural net LM with the in-domain backoff LM
• Expt #2: interpolate the CTS neural net LM with the full-data backoff LM
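The slides just say "interpolate"; the standard way two LMs are combined is linear interpolation of their probabilities, sketched below. The weight `lam` would be tuned on held-out data (e.g. by EM); the value used here is only a placeholder.

```python
def interpolate(p_nn, p_backoff, lam=0.5):
    """Linear interpolation of the NN LM and a backoff LM for one word/context.

    lam is the weight on the NN LM; in practice it is tuned on held-out data,
    so the default here is only illustrative."""
    return lam * p_nn + (1.0 - lam) * p_backoff

print(interpolate(0.012, 0.008, lam=0.6))   # toy probabilities -> 0.0104
```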
Application to CTS – Perplexity
• Baseline (interpolated backoff LMs): in-domain PPL 50.1, full-data PPL 47.5
• Expt #1 (NN LM + in-domain backoff LM): PPL 45.5
• Expt #2 (NN LM + full-data backoff LM): PPL 44.2
Application to CTS – WER
• Baseline (interpolated backoff LMs): in-domain WER 19.9%, full-data WER 19.3%
• Expt #1 (NN LM + in-domain backoff LM): WER 19.1%
• Expt #2 (NN LM + full-data backoff LM): WER 18.8%
Application to BN
• Only a subset of the 500M available words could be used for NN training – a 27M-word training set
• Still useful:
  • The NN LM gave a 12% PPL gain over a backoff LM trained on the same 27M-word set
  • The NN LM gave a 4% PPL gain over a backoff LM trained on the full 500M-word set
• Overall WER reduction of 0.3% absolute
Conclusion
• Neural net LMs provide significant improvements in PPL and WER
• Optimizations can speed up NN training by 20x and allow lattice rescoring in less than 0.05 x RT
• While the NN LM was developed for and works best with CTS, gains were found on the BN task too