This study explores methods to measure and normalize inter-speaker variability in speech signals, aiming to improve recognition accuracy by adapting feature vectors or model parameters. Techniques like vocal tract normalization and cepstral mean and variance normalization are discussed in detail, along with Fisher discriminant analysis to assess feature sets. The study emphasizes the importance of a good feature vector space for accurate classification.
Investigation on Inter-Speaker Variability in The Feature Space Presenter : 陳彥達
Reference • R. Haeb-Umbach, “Investigation on Inter-Speaker Variability in The Feature Space”, ICASSP 99.
Outline • Introduction • A measure of inter-speaker variability • Vocal tract normalization • Cepstral mean and variance normalization
Introduction • Adaptation • Reduce mismatch by adapting feature vectors or model parameters to the target environment.
Introduction(2) • Normalization • Compute feature or model parameters that are insensitive to undesired variations of the speech signal.
Introduction(3) • Fisher discriminant analysis • An early assessment of a feature set without running recognition first • The ratio of feature variability due to different phonemes to feature variability due to different speakers
A measure of inter-speaker variability • Good feature vector space • Feature vectors lie close together when they belong to the same phoneme class • Feature vectors are well separated when they belong to different phoneme classes
A measure of inter-speaker variability(2) • Cepstral feature vectors • Cepstral mean feature vector • Class mean vector • Total mean vector
A measure of inter-speaker variability(3) • Cepstral mean feature vector • Class mean vector • Total mean vector • Between-class covariance matrix • Within-class covariance matrix
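The symbols on these two slides did not survive, so the following is only one conventional way to write the listed quantities, not necessarily the notation of the reference. It assumes $x_{c,i}$ is the $i$-th cepstral feature vector of phoneme class $c$, $N_c$ is the number of vectors in class $c$, $N$ is the total number of vectors, and $C$ is the number of classes; all of these names are assumptions.

```latex
\begin{align}
\mu_c &= \frac{1}{N_c} \sum_{i=1}^{N_c} x_{c,i}, &
\mu &= \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N_c} x_{c,i}, \\
W &= \frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{N_c} (x_{c,i} - \mu_c)(x_{c,i} - \mu_c)^{\top}, &
B &= \frac{1}{N} \sum_{c=1}^{C} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}.
\end{align}
```

Here $\mu_c$ is the class mean vector, $\mu$ the total mean vector, $W$ the within-class covariance matrix, and $B$ the between-class covariance matrix. Because the classes are phonemes pooled over speakers, speaker differences end up in $W$.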
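A measure of inter-speaker variability(4) • Fisher variate analysis • Separability measure = the sum of the eigenvalues of the matrix formed from the within-class and between-class covariance matrices • Interpreted as the radius of the scattering volume • A higher value corresponds to a lower recognition error rate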
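A minimal sketch of such a measure, assuming the standard Fisher criterion: the trace of W⁻¹B, which equals the sum of the eigenvalues of W⁻¹B, where W and B are the within-class and between-class covariance matrices. The function names (`fisher_measure`, `solve`, and the helpers) and the plain-list data layout are hypothetical, and the exact definition used in the reference may differ.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def add_outer(acc, d, scale):
    """Accumulate scale * d d^T into the square matrix acc, in place."""
    for i, di in enumerate(d):
        for j, dj in enumerate(d):
            acc[i][j] += scale * di * dj


def solve(a, b):
    """Solve a X = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("within-class covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fisher_measure(features, labels):
    """Return trace(W^-1 B) for feature vectors labelled by phoneme class."""
    dim = len(features[0])
    n_total = len(features)
    by_class = defaultdict(list)
    for x, c in zip(features, labels):
        by_class[c].append(x)
    total_mean = mean_vector(features)
    within = [[0.0] * dim for _ in range(dim)]
    between = [[0.0] * dim for _ in range(dim)]
    for vecs in by_class.values():
        mu = mean_vector(vecs)
        # Within-class scatter: spread of vectors around their class mean.
        for x in vecs:
            add_outer(within, [a - m for a, m in zip(x, mu)], 1.0 / n_total)
        # Between-class scatter: spread of class means around the total mean.
        add_outer(between, [m - t for m, t in zip(mu, total_mean)],
                  len(vecs) / n_total)
    x = solve(within, between)
    return sum(x[i][i] for i in range(dim))


# Toy data: the same two class centres, once with tight and once with
# loose clusters. The tight clusters should give the larger measure.
labels = ["a"] * 4 + ["b"] * 4
tight = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
         [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1]]
loose = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
         [1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
tight_score = fisher_measure(tight, labels)
loose_score = fisher_measure(loose, labels)
```

In this toy case the spread inside each class plays the role of speaker variability: shrinking it while keeping the class centres fixed raises the measure, which is the behaviour a normalization technique is meant to produce.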
Vocal tract normalization • Reduce inter-speaker variability by a speaker-specific frequency warping • Differences in vocal tract length are compensated for by a linear warping factor
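As a rough illustration of linear frequency warping, the sketch below resamples a magnitude spectrum at bin positions scaled by a warping factor. The function name `warp_spectrum`, the parameter name `alpha`, the direction convention (output bin k reads the input at position alpha * k), and the clamping at the top of the band are all assumptions made for this example; the slides do not specify how the warping is implemented or how the factor is chosen.

```python
def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum at linearly warped bin positions.

    Output bin k takes the input value at position alpha * k, using linear
    interpolation between neighbouring bins. Positions beyond the last bin
    are clamped to it. Assumes alpha > 0.
    """
    last = len(spectrum) - 1
    warped = []
    for k in range(len(spectrum)):
        pos = min(alpha * k, last)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


spectrum = [0.0, 1.0, 2.0, 3.0, 4.0]
unchanged = warp_spectrum(spectrum, 1.0)   # identity warp
stretched = warp_spectrum(spectrum, 0.5)   # reads the lower half of the band
compressed = warp_spectrum(spectrum, 2.0)  # reads twice as fast, then clamps
```

In a recognizer the factor would be estimated per speaker or per sentence, for example by trying a small grid of values and keeping the one that scores best under the acoustic model; that selection step is not shown here.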
Vocal tract normalization(2) • Results for two speaker sets: 42 male + 42 female speakers, and 42 male speakers only
Vocal tract normalization(3) • Normalization on a per-sentence basis performs better than normalization on a per-speaker basis
Cepstral mean and variance normalization • Input cepstral feature • Estimate of the mean of the input cepstral feature • Estimate of the standard deviation of the input cepstral feature • The mean- and variance-normalized feature • Number of features
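The quantities listed on this slide fit the usual form of cepstral mean and variance normalization: subtract the estimated mean from each cepstral coefficient and divide by the estimated standard deviation. The sketch below assumes the estimates are taken over all frames of one utterance, separately for each coefficient; the function name `cmvn`, the list-of-frames layout, and the `eps` guard are illustrative assumptions.

```python
import math


def cmvn(frames, eps=1e-10):
    """Mean and variance normalize a sequence of cepstral feature vectors.

    frames: list of equal-length lists, one cepstral vector per frame.
    eps: lower bound on the standard deviation, to avoid dividing by zero
         for a coefficient that is constant over the utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    # Per-coefficient mean over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-coefficient standard deviation over all frames.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]


frames = [[1.0, 10.0], [3.0, 30.0]]
normalized = cmvn(frames)
```

After normalization each coefficient has zero mean and unit variance over the utterance, so offsets and scale differences between speakers or channels no longer show up in those two statistics.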
Cepstral mean and variance normalization(2) • Results for two speaker sets: 42 male + 42 female speakers, and 42 male speakers only