Structure-Based Speech Classification Using Nonlinear Embedding Techniques
Structure-Based Speech Classification Using Nonlinear Embedding Techniques • Uchechukwu Ofoegbu • Advisor: Dr. Robert E. Yantorno • Committee: Dr. Saroj K. Biswas, Dr. Henry M. Sendaula
Acknowledgment • Dr. Robert Yantorno • Dr. Saroj Biswas • Dr. Henry Sendaula • Speech Lab Members • Air Force Research Laboratory, Rome, NY
Overview • Voiced and Unvoiced Speech • Usable and Unusable Speech • Nonlinearities in Speech • Non-Linear Embedding • Research Goal • Proposed Research
Voiced/Unvoiced Characteristics • Voiced • Quasi-periodic excitation • Modulation by vocal tract • Production of vowels, voiced fricatives and plosives • Unvoiced • No periodic vibration of vocal cords • Noise-like nature • Production of unvoiced fricatives and plosives
Usable Speech • Portions of co-channel speech that are still usable for applications such as Speaker ID and Speech Recognition • Occur where low-energy (unvoiced/silence) segments of one talker overlap with high-energy (voiced) segments of the other • Target-to-Interferer Ratio (TIR) > 20 dB
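The TIR threshold above can be illustrated with a minimal sketch. It assumes TIR is the ratio of target frame energy to interferer frame energy, expressed in dB, and that both talkers' signals are available separately (which is typically true only for simulated co-channel data). The function names `frame_tir_db` and `is_usable` are hypothetical and not from the presentation.

```python
import numpy as np

def frame_tir_db(target, interferer, eps=1e-12):
    """Target-to-interferer ratio of one frame in dB (energy ratio).

    ASSUMPTION: TIR = 10*log10(target energy / interferer energy).
    eps guards against division by zero on silent frames.
    """
    target = np.asarray(target, dtype=float)
    interferer = np.asarray(interferer, dtype=float)
    e_target = np.sum(target ** 2)
    e_interferer = np.sum(interferer ** 2)
    return 10.0 * np.log10((e_target + eps) / (e_interferer + eps))

def is_usable(target, interferer, threshold_db=20.0):
    """Label a frame usable when its TIR exceeds the threshold (20 dB on the slide)."""
    return frame_tir_db(target, interferer) > threshold_db

# Synthetic frame: the interferer is a scaled copy of the target,
# so the TIR is known in advance.
n = np.arange(320)
target = np.sin(2 * np.pi * 0.01 * n)
weak_interferer = 0.01 * target    # amplitude ratio 100 -> 40 dB
strong_interferer = 0.5 * target   # amplitude ratio 2 -> about 6 dB
print(is_usable(target, weak_interferer), is_usable(target, strong_interferer))
```

In practice the two talkers are not available separately, which is why the presentation pursues features computed from the mixed signal itself.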
Nonlinearities in Speech • Glottal waveform changes • Shape varies with amplitude • Physical observations • Flow in vocal tract is non-laminar • Coupling between vocal tract and folds • When glottis is open, prominent changes are observed in formant characteristics
Nonlinear Embedding • Nonlinear Systems • Point moving along some trajectory in an abstract state space • Coordinates of the point are independent degrees of freedom of the system • State space could be reconstructed from a scalar signal
Nonlinear Embedding (cont’d) • Takens’ Method of Delays • A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension • Vectors in m-dimensional state space are formed from time-delayed values of a signal
Nonlinear Embedding (cont’d) • m = embedding dimension • d = delay value
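A minimal sketch of the method of delays, assuming the usual forward-delay form in which each state vector is x(n) = [s(n), s(n + d), ..., s(n + (m - 1)d)]. The slide's own equation was not captured in this transcript, so the indexing convention here is an assumption, and `delay_embed` is a hypothetical name.

```python
import numpy as np

def delay_embed(signal, m=3, d=1):
    """Build m-dimensional delay vectors from a scalar signal.

    Row n of the result is [s[n], s[n + d], ..., s[n + (m - 1) * d]].
    m is the embedding dimension and d the delay in samples.
    """
    s = np.asarray(signal, dtype=float)
    n_vectors = len(s) - (m - 1) * d
    if n_vectors <= 0:
        raise ValueError("signal too short for the requested m and d")
    # Each column is the signal shifted by a multiple of d samples.
    return np.column_stack([s[k * d : k * d + n_vectors] for k in range(m)])

# Example: a ramp makes the delay structure easy to read off.
embedded = delay_embed(np.arange(10), m=3, d=2)
print(embedded.shape)  # (6, 3)
print(embedded[0])     # [0. 2. 4.]
```

Each row is one point of the reconstructed trajectory, so a speech frame of L samples yields L - (m - 1)d points in m-dimensional space.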
Nonlinear Embedding (Cont’d) • Delay value, d: • Dependent on sampling rate and signal properties • Large enough such that nonlinearities are taken into account by the reconstructed trajectory • Small enough to retain reasonable time resolution
Nonlinear Embedding (Cont’d) • Dimension, m: • Generation of voiced speech constitutes a low-dimensional system • Generation of unvoiced speech constitutes a relatively high-dimensional system • Using a low dimension (such as m = 3) sufficiently reconstructs voiced but not unvoiced speech
Research Goal • Feature Extraction • Difference-Mean Comparison (DMC) Measure • Voiced/unvoiced classification • Nodal Density Measure • Voiced/unvoiced classification • Usable/unusable classification
Difference-Mean Comparison (DMC) Measure Voiced/Unvoiced Classification
Introduction • 3rd-order difference computed along the first non-singleton dimension • 1st-order difference of an N×N matrix given by the differences between adjacent rows • Number of 3rd-order difference values exceeding the mean is observed
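The DMC steps can be sketched as follows. For a matrix, differencing along the first non-singleton dimension means subtracting each row from the next, and the 3rd-order difference applies that operation three times. The slide does not say which mean is used as the threshold; this sketch assumes the mean of the 3rd-order difference values themselves, and `dmc_count` is a hypothetical name.

```python
import numpy as np

def dmc_count(embedded, order=3):
    """Count difference values that exceed the mean.

    embedded: 2-D array whose rows are embedded state vectors.
    ASSUMPTION: the threshold is the mean of the differenced values;
    the presentation does not specify which mean is compared against.
    """
    x = np.asarray(embedded, dtype=float)
    # n-th order difference along axis 0 (row-to-row differences).
    diff = np.diff(x, n=order, axis=0)
    return int(np.count_nonzero(diff > diff.mean()))

# A single impulse: the 3rd-order difference is [1, -3, 3, -1],
# its mean is 0, and two values lie above it.
impulse = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float).reshape(-1, 1)
print(dmc_count(impulse))  # 2
```

The count would then be compared against a decision threshold to label a frame voiced or unvoiced. That threshold is not given in the transcript and is left out here.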
Nodal Density Measure Voiced/Unvoiced Classification Usable/Unusable Classification
Introduction • Smallest cube which encloses the signal is determined • This cube is divided into N smaller cubes • Edges of the smaller cubes are defined as nodes • Number of nodes spanned by the signal is determined • Ratio of number of nodes spanned to total number of nodes is defined as nodal density
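The nodal density steps can be sketched under some stated assumptions: the enclosing cube is axis-aligned and anchored at the minimum corner of the trajectory, it is split into k divisions per side (so N = k^3 smaller cubes), nodes are taken to be the grid points of that division, and a node counts as spanned when at least one trajectory sample is nearest to it. The segments between consecutive samples are ignored. `nodal_density` and `k` are hypothetical names.

```python
import numpy as np

def nodal_density(points, k=10):
    """Fraction of grid nodes visited by an embedded trajectory.

    points: array of shape (n_points, dim), e.g. a 3-D delay embedding.
    k: divisions per cube side (ASSUMPTION; gives k**dim small cubes
       and (k + 1)**dim grid nodes).
    """
    p = np.asarray(points, dtype=float)
    dim = p.shape[1]
    total_nodes = (k + 1) ** dim
    low = p.min(axis=0)
    side = (p.max(axis=0) - low).max()  # side of the enclosing cube
    if side == 0:
        return 1.0 / total_nodes        # all samples fall on one node
    # Snap every sample to its nearest grid node (indices 0..k per axis).
    idx = np.rint((p - low) / side * k).astype(int)
    spanned = len(np.unique(idx, axis=0))
    return spanned / total_nodes

# Two opposite corners of a unit cube with k = 1: 2 of 8 nodes visited.
print(nodal_density(np.array([[0, 0, 0], [1, 1, 1]]), k=1))  # 0.25
```

Under this reading, a trajectory that fills the cube visits many nodes and gives a high density, while one confined to a thin orbit gives a low density.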
Filtering • Moving Average Filter • Order, M = 10
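A minimal sketch of the moving average filter, assuming "order M = 10" means a 10-tap window with equal weights (some conventions would use M + 1 taps). The slide does not say which sequence is filtered, so the input here is simply a generic 1-D array, and `moving_average` is a hypothetical name.

```python
import numpy as np

def moving_average(x, M=10):
    """Smooth a 1-D sequence with an M-tap equal-weight window.

    mode="same" keeps the output the same length as the input;
    values near the ends are averaged against implicit zeros.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.ones(M) / M
    return np.convolve(x, kernel, mode="same")

# A constant input stays constant away from the edges.
smoothed = moving_average(np.ones(50), M=10)
print(len(smoothed), round(smoothed[25], 6))  # 50 1.0
```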
Proposed Research Usable/Unusable Classification
Nodes Spanned by Embedded Usable and Unusable Speech Frames • [Figure: three 3-D plots, each titled "Nodes Spanned by Embedded Co-channel Speech of 30dB TIR"]
Summary • Speech → Nonlinear Embedding • Difference-Mean Comparison → V/UV Classification • Nodal Density → V/UV Classification and Usable/Unusable Classification
Future Proposed Research • Determine optimum filter for nodal density-based voiced/unvoiced classification • Develop nodal density measure for usable/unusable classification • Investigate the presence of complementary information between the two features (DMC and nodal density) for voiced/unvoiced classification • Perform decision-level fusion of both features