This study explores how speakers' formant dynamics can reflect individual differences in speech production, focusing on the sequence /aɪk/ in Australian English. Using discriminant analysis, it first evaluates how well formant frequencies measured at successive points in time distinguish speakers. The results show promising clustering of tokens and classification of speakers, but the method is inefficient because it rests on a series of instantaneous, probably interdependent measurements. The study therefore proposes a new approach that uses linear regression to parameterise dynamic formant contours with polynomial equations, efficiently capturing the most defining aspects of individual speaker characteristics.
Characterisation of individuals’ formant dynamics using polynomial equations • IAFPA 2006 • Kirsty McDougall, Department of Linguistics, University of Cambridge, kem37@cam.ac.uk
Speaker characteristics and static features of speech • Most previous research has focussed on static features: instantaneous or average measurements • These are straightforward to measure • A natural progression from other research areas, e.g. the delineation of different languages and language varieties
Speaker characteristics and static features of speech • Static features reflect certain anatomical dimensions of a speaker, e.g. formant frequencies relate to the length and configuration of the vocal tract (VT) • Instantaneous and average measures demonstrate speaker differences but cannot distinguish all members of a population, hence look to dynamic (time-varying) features
Dynamic features of speech • Carry more information than static features • Reflect the movement of a person’s speech organs as well as their dimensions: people move in individual ways in skilled motor activities such as walking, running, … and speech
Dynamic features of speech • Speech can be viewed as the achievement of a series of linguistic ‘targets’ • Speakers are likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints) but to move between these in individual ways, hence examine formant frequency dynamics
Formant dynamics [Figure: /aɪ/ in ‘bike’ uttered by two male speakers of Australian English; Frequency (Hz) against Time (s) for each speaker, with 10% points marked]
Research Questions • How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/? • How can this dynamic information be captured to characterise individual speakers?
Target words: bike /baɪk/, hike /haɪk/, like /laɪk/, mike /maɪk/, spike /spaɪk/ (all containing /aɪk/)
Data set • Example sentences: “I don’t want the scooter, I want the bike now.” / “Later won’t do, I want the bike now.” • 5 repetitions × 5 words (bike, hike, like, mike, spike) × 2 stress levels (nuclear, non-nuclear) × 2 speaking rates (normal, fast) = 100 tokens per subject
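The token count follows from fully crossing the four design factors. A minimal Python sketch that enumerates the design; the tuple layout and the name `design_cells` are illustrative, not taken from the study.

```python
from itertools import product

# Design factors as listed on the "Data set" slide.
REPETITIONS = range(1, 6)                          # 5 repetitions
WORDS = ["bike", "hike", "like", "mike", "spike"]  # 5 target words
STRESS = ["nuclear", "non-nuclear"]                # 2 stress levels
RATE = ["normal", "fast"]                          # 2 speaking rates


def design_cells():
    """Return one (word, stress, rate, repetition) tuple per recorded token."""
    return list(product(WORDS, STRESS, RATE, REPETITIONS))


tokens = design_cells()
print(len(tokens))  # 100 tokens per subject

# Tokens per rate x stress condition, e.g. fast-nuclear: 5 words x 5 repetitions.
fast_nuclear = [t for t in tokens if t[1] == "nuclear" and t[2] == "fast"]
print(len(fast_nuclear))  # 25
```

The 25 tokens per rate × stress condition correspond to the 25 tokens per speaker shown in each discriminant analysis plot.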
Subjects • 5 adult male native speakers of Australian English (A, B, C, D, E) • aged 22-28 • Brisbane/Gold Coast, Queensland
Speaker A “bike” (normal-nuclear) [Figure: F1, F2 and F3 tracks, with measurement points at 10%, 20%, …, 90%]
[Figures: F1, F2 and F3 contours for the normal-nuclear condition; Frequency (Hz) against +10% step of /aɪ/]
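The slides do not say how a value was read off at each step. As an assumption, the sketch below linearly interpolates a formant track (parallel lists of frame times and frequencies) at 10%, 20%, …, 90% of a segment's duration; `sample_at_steps` and its parameters are hypothetical.

```python
def sample_at_steps(times, freqs, onset, offset, steps=range(10, 100, 10)):
    """Linearly interpolate a formant track at percentage points of a segment.

    times, freqs: parallel lists of frame times (s, ascending) and frequencies (Hz).
    onset, offset: segment boundaries in seconds (hypothetical inputs).
    Returns one frequency per percentage p in steps, taken at
    onset + p% of the segment duration.
    """
    duration = offset - onset
    values = []
    for p in steps:
        t = onset + duration * p / 100
        # Find the pair of frames that brackets time t.
        for i in range(len(times) - 1):
            if times[i] <= t <= times[i + 1]:
                t0, t1 = times[i], times[i + 1]
                f0, f1 = freqs[i], freqs[i + 1]
                w = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
                values.append(f0 + w * (f1 - f0))
                break
        else:
            raise ValueError(f"time {t:.4f} s lies outside the formant track")
    return values


# Toy track (not study data): three frames, F rising from 500 to 800 Hz.
print(sample_at_steps([0.0, 0.1, 0.2], [500.0, 600.0, 800.0], 0.0, 0.2))
```

With nine steps per formant, each token yields nine values for each of F1, F2 and F3.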
Discriminant Analysis • A multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership (see Tabachnick and Fidell 1996)
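To make the idea concrete, here is a minimal pure-Python sketch of a linear discriminant classifier under the usual textbook assumptions (equal priors, one covariance matrix pooled across speakers). A token is assigned to the speaker whose linear classification function scores highest. The names `fit_lda` and `predict` are hypothetical; this is not the software used in the study, and the canonical discriminant functions plotted in DA scatterplots are a related but separate construction.

```python
from collections import defaultdict


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular pooled covariance matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lda(X, y):
    """Fit linear classification functions (equal priors, pooled covariance).

    X: list of predictor vectors (one per token); y: speaker labels.
    Returns {speaker: (weights, constant)}.
    """
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    p = len(X[0])
    means = {g: [sum(col) / len(rows) for col in zip(*rows)]
             for g, rows in groups.items()}
    # Pooled within-speaker covariance matrix.
    S = [[0.0] * p for _ in range(p)]
    for g, rows in groups.items():
        mu = means[g]
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = len(X) - len(groups)
    S = [[v / dof for v in row] for row in S]
    # Classification function per speaker: score(x) = w.x + c,
    # with w = S^-1 mu and c = -0.5 mu.w.
    model = {}
    for g, mu in means.items():
        w = _solve(S, mu)
        c = -0.5 * sum(m_ * w_ for m_, w_ in zip(mu, w))
        model[g] = (w, c)
    return model


def predict(model, x):
    """Assign x to the speaker with the highest classification score."""
    return max(model, key=lambda g: sum(wi * xi for wi, xi in zip(model[g][0], x))
               + model[g][1])


# Toy two-predictor example (not study data).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
y = ["A"] * 4 + ["B"] * 4
model = fit_lda(X, y)
print(predict(model, [0.2, 0.3]), predict(model, [5.5, 5.8]))
```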
Discriminant Analysis [Figure: DA scatterplot for the fast-nuclear condition; Function 1 (x-axis) against Function 2 (y-axis); speakers A, B, C, D, E] • Each data point represents one token, and each speaker’s tokens are shown in a different colour (e.g. Speaker E’s 25 tokens of /aɪk/) • DA constructs discriminant functions that maximise the differences between speakers (each function is a linear combination of the formant frequency predictors) • How well the predictors distinguish speakers is assessed by the extent to which tokens cluster and by the classification percentage: 95% correct for fast-nuclear
Discriminant Analysis [Figure: DA scatterplots; correct classification rates of 95%, 89%, 95% and 88%]
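The classification percentage is the share of tokens assigned to their true speaker. A small sketch (the function name is hypothetical) that also tallies (true, predicted) pairs, which shows which speakers are confused with one another:

```python
from collections import Counter


def classification_summary(true_labels, predicted_labels):
    """Return (% correctly classified, Counter of (true, predicted) pairs)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    if not true_labels:
        raise ValueError("no tokens to classify")
    pairs = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    return 100.0 * correct / len(true_labels), pairs


# Toy labels (not study data): one of speaker A's tokens misclassified as B.
pct, pairs = classification_summary(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(pct)                # 75.0
print(pairs[("A", "B")])  # 1
```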
Discussion • DA scatterplots and classification rates are promising • However, the method is not very efficient: it is essentially based on a series of instantaneous measurements, which probably contain dependent information • Recall individuals’ F1 contours of /aɪk/…
[Figure: F1 contours for the normal-nuclear condition; Frequency (Hz) against +10% step of /aɪ/]
A new approach… • Differences in location within the frequency range • Differences in curvature: location of turning points, convex/concave, steep/shallow • Need to capture the most defining aspects of the contours efficiently, hence use linear regression to parameterise the curves with polynomial equations
Linear regression • Technique for determining the equation of a line or curve that approximates the relationship between a set of (x, y) points • Straight line: $y = a_0 + a_1x$, where $a_0$ is the y-intercept and $a_1$ the gradient
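For a straight line, the least-squares estimates have a closed form: the gradient is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. A minimal sketch (`fit_line` is a hypothetical name):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a0 + a1*x; returns (a0, a1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a1 = sxy / sxx     # gradient
    a0 = my - a1 * mx  # y-intercept
    return a0, a1


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```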
Linear regression • Can also be used for curvilinear relationships • Quadratic: $y = a_0 + a_1x + a_2x^2$, where $a_0$ is the y-intercept and $a_1$ and $a_2$ determine the shape and direction of the curve
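How $a_1$ and $a_2$ determine the shape and direction of a quadratic can be made precise. Setting the derivative to zero gives the single turning point, and the second derivative gives its type:

```latex
\frac{dy}{dx} = a_1 + 2a_2x = 0
\;\Longrightarrow\;
x^{*} = -\frac{a_1}{2a_2} \quad (a_2 \neq 0),
\qquad
\frac{d^2y}{dx^2} = 2a_2 .
```

If $a_2 < 0$ the curve is concave and $x^{*}$ is a maximum; if $a_2 > 0$ it is convex and $x^{*}$ is a minimum; a larger $|a_2|$ means a more sharply bent curve. As a worked example, the quadratic fitted to Speaker A's F1 contour for "bike", $y = 420.68 + 79.26t - 5.92t^2$, has $a_2 < 0$, so it is concave with its maximum at $t^{*} = 79.26 / (2 \times 5.92) \approx 6.69$ in normalised time units.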
Polynomial Equations • Cubic: $y = a_0 + a_1x + a_2x^2 + a_3x^3$ • Quartic: $y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4$ • Quintic: $y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5$
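Fitting any of these polynomials is still linear regression, because y is linear in the coefficients $a_0, \dots, a_d$ even though it is non-linear in x. A minimal pure-Python sketch that solves the least-squares normal equations; `polyfit` and `polyval` here are hypothetical helpers written for illustration, not NumPy's functions of the same names.

```python
def polyfit(ts, ys, degree):
    """Least-squares fit of y = a0 + a1*t + ... + ad*t^d.

    Builds the normal equations (V^T V) a = V^T y, where V is the
    Vandermonde matrix of ts, and solves them by Gaussian elimination
    with partial pivoting. Returns [a0, a1, ..., a_degree].
    """
    k = degree + 1
    if len(ts) < k:
        raise ValueError("need at least degree + 1 points")
    # Entries of V^T V are sums of powers of t; V^T y are sums of y * t^i.
    power_sums = [sum(t ** p for t in ts) for p in range(2 * degree + 1)]
    A = [[power_sums[i + j] for j in range(k)] for i in range(k)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * k
    for i in reversed(range(k)):
        a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, k))) / A[i][i]
    return a


def polyval(coeffs, t):
    """Evaluate a0 + a1*t + ... for coefficients in ascending order."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


print(polyfit([0, 1, 2, 3], [1, 3, 5, 7], 1))  # approximately [1.0, 2.0]
```

Solving the normal equations is adequate for low degrees such as quadratic and cubic, but it becomes ill-conditioned as the degree grows; in practice a library routine such as NumPy's `numpy.polynomial.polynomial.polyfit`, which also returns coefficients in ascending order, is the safer choice.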
/aɪk/ data • Fit the F1, F2 and F3 contours with polynomial equations: quadratic $y = a_0 + a_1t + a_2t^2$ and cubic $y = a_0 + a_1t + a_2t^2 + a_3t^3$ • Test how reliably the polynomial coefficients distinguish speakers
“bike”, Speaker A (normal-nuclear token 1): F1 contour [Figure: actual data points and fitted curves; Frequency (Hz), y, against normalised time, t] • Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$, R = 0.879 • Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$, R = 0.978
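The slides report R for each fit without defining it. Assuming R is the multiple correlation coefficient of the least-squares fit (the square root of R²), it can be computed from the residuals as below; `multiple_r` is a hypothetical helper.

```python
import math


def multiple_r(ts, ys, coeffs):
    """Multiple correlation coefficient R between observed ys and fitted values.

    Assumes coeffs come from a least-squares fit with an intercept, so that
    R = sqrt(1 - SS_res / SS_tot). coeffs are [a0, a1, ...] in ascending powers of t.
    """
    fitted = [sum(c * t ** i for i, c in enumerate(coeffs)) for t in ts]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R is undefined")
    # Clamp at 0 in case coeffs are not a least-squares fit.
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


print(multiple_r([0, 1, 2], [1, 3, 5], [1.0, 2.0]))  # 1.0 (exact fit)
```

For least-squares fits to the same data points, adding a term can never lower R, so a higher R for the cubic than for the quadratic is partly expected from the extra coefficient alone; whether that coefficient helps to distinguish speakers is a separate question.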
“bike”, Speaker A (normal-nuclear token 1): F2 contour [Figure: actual data points and fitted curves; Frequency (Hz), y, against normalised time, t] • Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$, R = 0.985 • Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$, R = 0.991
DA on polynomial coefficients • Quadratic: 3 formants × 3 coefficients = 9 predictors • Cubic: 3 formants × 4 coefficients = 12 predictors • Cubic + duration of /aɪ/: 12 + 1 = 13 predictors
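Each token's predictors are the fitted coefficients of its three formant contours, optionally followed by the duration. A sketch of that bookkeeping; the dictionary layout and `predictor_vector` are hypothetical, and the F3 coefficients and duration in the usage lines are placeholders because the slides do not report them.

```python
def predictor_vector(coeffs_by_formant, duration=None):
    """Concatenate per-formant polynomial coefficients into one predictor vector.

    coeffs_by_formant: dict mapping "F1", "F2", "F3" to [a0, a1, ...].
    duration: optional segment duration, appended as one extra predictor.
    """
    vector = []
    for formant in ("F1", "F2", "F3"):
        vector.extend(coeffs_by_formant[formant])
    if duration is not None:
        vector.append(duration)
    return vector


f1_cubic = [478.85, -46.07, 35.62, -3.46]  # Speaker A, "bike", token 1 (from the slides)
f2_cubic = [825.49, 55.64, -13.63, 3.01]   # Speaker A, "bike", token 1 (from the slides)
f3_cubic = [0.0, 0.0, 0.0, 0.0]            # placeholder: F3 fit not reported
token = predictor_vector({"F1": f1_cubic, "F2": f2_cubic, "F3": f3_cubic})
print(len(token))  # 12
```

Each such vector would form one row of the predictor matrix passed to the discriminant analysis.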
Comparison of Classification Rates [Figure: % correct classification for predictor sets of 9, 12, 13 and 20 predictors]