Dynamic Formant Dynamics and Individual Speaker Characterization

Characterisation of individuals’ formant dynamics using polynomial equations
IAFPA 2006
Kirsty McDougall, Department of Linguistics, University of Cambridge
kem37@cam.ac.uk

Speaker characteristics and static features of speech
• Most previous research has focussed on static features (instantaneous, average)
• Straightforward to measure
• Natural progression from other research areas, e.g. delineation of different languages and language varieties

Speaker characteristics and static features of speech
• Reflect certain anatomical dimensions of a speaker, e.g. formant frequencies relate to the length and configuration of the vocal tract (VT)
• Instantaneous and average measures demonstrate speaker differences, but are unable to distinguish all members of a population → look to dynamic (time-varying) features

Dynamic features of speech
• Carry more information than static features
• Reflect the movement of a person’s speech organs as well as their dimensions: people move in individual ways for skilled motor activities (walking, running, … and speech)

Dynamic features of speech
• Speech can be viewed as the achievement of a series of linguistic ‘targets’
• Speakers are likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints), but move between these in individual ways → examine formant frequency dynamics

Formant dynamics
[Figure: formant tracks of /aɪ/ in ‘bike’ uttered by two male speakers of Australian English; Frequency (Hz) against Time (s)]

Formant dynamics
[Figure: the same two tokens of /aɪ/ in ‘bike’, with a 10% point marked on each]

Research Questions
• How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/?
• How can this dynamic information be captured to characterise individual speakers?

Target words (each containing /aɪk/): bike /baɪk/, hike /haɪk/, like /laɪk/, mike /maɪk/, spike /spaɪk/

Data set
e.g. “I don’t want the scooter, I want the bike now.” / “Later won’t do, I want the bike now.”
5 repetitions × 5 words (bike, hike, like, mike, spike) × 2 stress levels (nuclear, non-nuclear) × 2 speaking rates (normal, fast) = 100 tokens per subject
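As a quick arithmetic check of the design, the sketch below enumerates one subject's recording cells with Python's `itertools.product`. The factor levels come from the slide; the function name `design_cells` is hypothetical, and the per-condition count of 25 assumes the analyses are run separately for each stress × rate condition, as the later scatterplot labels (e.g. fast-nuclear) suggest.

```python
from itertools import product

# Factor levels taken from the data-set slide.
WORDS = ["bike", "hike", "like", "mike", "spike"]
STRESS = ["nuclear", "non-nuclear"]
RATE = ["normal", "fast"]
REPETITIONS = range(1, 6)  # 5 repetitions of each word/stress/rate combination


def design_cells():
    """One (word, stress, rate, repetition) tuple per token recorded from one subject."""
    return list(product(WORDS, STRESS, RATE, REPETITIONS))


if __name__ == "__main__":
    cells = design_cells()
    print(len(cells))  # 5 x 5 x 2 x 2 = 100 tokens per subject
    fast_nuclear = [c for c in cells if c[1] == "nuclear" and c[2] == "fast"]
    print(len(fast_nuclear))  # 5 words x 5 repetitions = 25 tokens per condition
```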

Subjects
• 5 adult male native speakers of Australian English (A, B, C, D, E)
• Aged 22–28
• Brisbane/Gold Coast, Queensland

Speaker A “bike” (normal-nuclear)
[Figure: F1, F2 and F3 tracks with formant values marked at each 10% step, from 10% to 90%]
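The measurement scheme on this slide (formant values at each 10% step) can be sketched as follows, assuming each formant track is available as a list of (time in s, frequency in Hz) pairs from some formant tracker and that values between analysis frames are linearly interpolated. The names `interpolate` and `sample_at_steps` are hypothetical; the slides do not say how the values were extracted.

```python
from bisect import bisect_left


def interpolate(track, t):
    """Linearly interpolate a formant track [(time_s, freq_hz), ...], sorted by time, at time t."""
    times = [time for time, _ in track]
    if t <= times[0]:
        return track[0][1]
    if t >= times[-1]:
        return track[-1][1]
    i = bisect_left(times, t)  # first frame at or after t, so times[i - 1] < t <= times[i]
    (t0, f0), (t1, f1) = track[i - 1], track[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def sample_at_steps(track, start, end, steps=range(10, 100, 10)):
    """Formant values at 10%, 20%, ..., 90% of the interval [start, end] (times in s)."""
    duration = end - start
    return [interpolate(track, start + duration * p / 100) for p in steps]


if __name__ == "__main__":
    # Toy F1 track rising linearly from 500 Hz to 600 Hz over 0.2 s (illustrative values only).
    toy_f1 = [(0.00, 500.0), (0.10, 550.0), (0.20, 600.0)]
    print(sample_at_steps(toy_f1, 0.0, 0.2))
```

Linear interpolation is the simplest choice here; a tracker with a short frame step makes the interpolation error negligible.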

F1, normal-nuclear
[Figure: F1 frequency (Hz) of each speaker’s tokens at every 10% step of /aɪ/]

Discriminant Analysis
• Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership (ref. Tabachnick and Fidell 1996)

Discriminant Analysis
• Each datapoint represents 1 token
• Each speaker’s tokens are represented with a different colour, e.g. Speaker E’s 25 tokens of /aɪk/
[Figure: fast-nuclear DA scatterplot, Function 1 (horizontal) against Function 2 (vertical), speakers A–E]

Discriminant Analysis
• DA constructs discriminant functions which maximise the differences between speakers (each function is a linear combination of the formant frequency predictors)
[Figure: fast-nuclear DA scatterplot, Function 1 against Function 2, speakers A–E]

Discriminant Analysis
• Assess how well the predictors distinguish speakers by the extent of clustering of tokens + the classification percentage: 95%
[Figure: fast-nuclear DA scatterplot, Function 1 against Function 2, speakers A–E]

Discriminant Analysis
[Figure: DA scatterplots with classification rates of 95%, 89%, 95% and 88%]
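For readers who want to reproduce the classification step, here is a minimal linear discriminant classifier in plain Python: speaker means, a pooled within-speaker covariance matrix, and assignment of each token to the speaker with the highest linear discriminant score under equal priors. It is a sketch of the standard technique, not the software used for the study; the function names are hypothetical, and `percent_correct` reports resubstitution accuracy (tokens classified by a model fitted to those same tokens), which tends to be optimistic compared with leave-one-out or held-out testing.

```python
from collections import defaultdict


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small predictor sets)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_lda(X, labels):
    """Return per-speaker means and the inverse pooled within-speaker covariance."""
    groups = defaultdict(list)
    for x, y in zip(X, labels):
        groups[y].append(x)
    means = {y: mean(xs) for y, xs in groups.items()}
    d = len(X[0])
    pooled = [[0.0] * d for _ in range(d)]
    for y, xs in groups.items():
        for x in xs:
            diff = [a - b for a, b in zip(x, means[y])]
            for i in range(d):
                for j in range(d):
                    pooled[i][j] += diff[i] * diff[j]
    dof = len(X) - len(groups)  # N - number of speakers
    pooled = [[v / dof for v in row] for row in pooled]
    return means, invert(pooled)


def predict(model, x):
    """Speaker with the largest score x'S^-1 m - 0.5 m'S^-1 m (equal priors)."""
    means, inv = model

    def score(m):
        w = [sum(inv[i][j] * m[j] for j in range(len(m))) for i in range(len(m))]
        return sum(a * b for a, b in zip(w, x)) - 0.5 * sum(a * b for a, b in zip(w, m))

    return max(means, key=lambda y: score(means[y]))


def percent_correct(model, X, labels):
    hits = sum(predict(model, x) == y for x, y in zip(X, labels))
    return 100.0 * hits / len(labels)
```

In the study's setting, each row of `X` would hold one token's formant measurements and each label would be A to E.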

Discussion
• DA scatterplots and classification rates are promising
• However, the method is not very efficient: it is essentially based on a series of instantaneous measurements, which probably contain dependent information
• Recall: individuals’ F1 contours of /aɪk/…
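The 'dependent information' point can be checked directly: if the values at neighbouring 10% steps are highly correlated across tokens, they add little independent evidence to the DA. A sketch, assuming each token is represented as a list of formant values at successive steps; the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def adjacent_step_correlations(tokens):
    """Correlation between step k and step k + 1 across tokens, for every k."""
    columns = list(zip(*tokens))  # columns[k] = values at step k, one per token
    return [pearson(columns[k], columns[k + 1]) for k in range(len(columns) - 1)]
```

Correlations close to 1 between neighbouring steps would support replacing the raw measurements with a more compact description of each contour.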

A new approach…
• Differences in location in the frequency range
• Differences in curvature: location of turning points, convex/concave, steep/shallow
• Need to capture the most defining aspects of the contours efficiently → linear regression to parameterise the curves with polynomial equations

Linear regression
• Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points
• Straight line: $y = a_0 + a_1x$, where $a_0$ is the y-intercept and $a_1$ the gradient
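For the straight-line case the least-squares estimates have a closed form: the gradient is the sum of cross-products of x and y deviations divided by the sum of squared x deviations, and the intercept makes the line pass through the mean point. A minimal sketch; the function name is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sxy / sxx           # gradient
    a0 = y_bar - a1 * x_bar  # y-intercept
    return a0, a1


if __name__ == "__main__":
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): the points lie on y = 1 + 2x
```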

Linear regression
• Can also be used for curvilinear relationships
• Quadratic: $y = a_0 + a_1x + a_2x^2$, where $a_0$ is the y-intercept and $a_1$, $a_2$ determine the shape and direction of the curve
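For the quadratic, what "shape and direction" means can be made concrete with elementary calculus (a standard result, not taken from the slides): the sign of $a_2$ decides convex versus concave, $|a_2|$ how sharply the curve bends, and $a_1$ and $a_2$ together place the turning point.

```latex
\frac{dy}{dx} = a_1 + 2a_2x = 0 \;\Rightarrow\; x^{*} = -\frac{a_1}{2a_2} \quad (a_2 \neq 0),
\qquad y(x^{*}) = a_0 - \frac{a_1^2}{4a_2}
\frac{d^2y}{dx^2} = 2a_2: \quad a_2 > 0 \Rightarrow \text{convex (minimum at } x^{*}\text{)}, \qquad a_2 < 0 \Rightarrow \text{concave (maximum at } x^{*}\text{)}
```

Applied to Speaker A's quadratic F1 fit for "bike" ($a_1 = 79.26$, $a_2 = -5.92$), this gives $t^{*} = 79.26 / 11.84 \approx 6.69$ and a maximum of about $420.68 + 79.26^2 / 23.68 \approx 686$ Hz. Whether that turning point lies inside the fitted interval depends on how normalised time t was scaled, which the slides do not state.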

Polynomial Equations
• Cubic: $y = a_0 + a_1x + a_2x^2 + a_3x^3$
• Quartic: $y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4$
• Quintic: $y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5$
[Figure: an example curve of each type, y against x]
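Fitting any of these polynomials by least squares reduces to solving the normal equations $(V^\top V)\mathbf{a} = V^\top \mathbf{y}$, where $V$ is the Vandermonde matrix of the time points. The plain-Python sketch below does this by Gaussian elimination; the function names are hypothetical. For higher degrees or many points the normal equations become ill-conditioned, and a QR-based routine such as NumPy's `numpy.polynomial.polynomial.polyfit` (which also returns coefficients in increasing order, $a_0$ first) is the safer choice.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def polyfit(ts, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] for y = sum(a_k * t**k)."""
    k = degree + 1
    # Normal equations: (V^T V) a = V^T y, with V[i][j] = ts[i] ** j.
    vtv = [[sum(t ** (i + j) for t in ts) for j in range(k)] for i in range(k)]
    vty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(k)]
    return solve(vtv, vty)


def polyval(coeffs, t):
    """Evaluate a0 + a1*t + a2*t**2 + ... with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result
```

For example, `polyfit(ts, f1_values, 2)` returns the three quadratic coefficients for one formant contour and `polyfit(ts, f1_values, 3)` the four cubic ones.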

/ak/data • fit F1, F2, F3 contours with polynomial equations • test the reliability of the polynomial coefficients in distinguishing speakers Quadratic: y = a0 + a1t + a2t2 Cubic: y = a0 + a1t + a2t2 + a3t3

“bike”, Speaker A (normal-nuclear token 1): F1 contour
[Figure: actual data points with quadratic and cubic fits; frequency y (Hz) against normalised time t]
Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$, R = 0.879
Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$, R = 0.978
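The slides quote R for each fit. Assuming this is the usual multiple correlation coefficient from regression output (the square root of $R^2$), it can be computed from the observed and fitted values as follows; `multiple_r` is a hypothetical helper.

```python
import math


def multiple_r(observed, fitted):
    """R = sqrt(1 - SS_res / SS_tot) for a least-squares fit that includes an intercept."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))  # clamp guards tiny negative rounding
```

Because the quadratic is nested inside the cubic, a least-squares cubic fit to the same points can never have a larger residual sum of squares, so R cannot decrease when the degree goes up. That is consistent with the quadratic-to-cubic increases reported for F1 here and for F2 on the next slide, and it is why a gain in R alone does not justify a higher degree.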

“bike”, Speaker A (normal-nuclear token 1): F2 contour
[Figure: actual data points with quadratic and cubic fits; frequency y (Hz) against normalised time t]
Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$, R = 0.985
Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$, R = 0.991

DA on polynomial coefficients
• Quadratic: 3 formants × 3 coefficients = 9 predictors
• Cubic: 3 formants × 4 coefficients = 12 predictors
• Cubic + duration of /aɪ/: 12 + 1 = 13 predictors
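Assembling the DA input from the fitted coefficients amounts to concatenating them per token. The sketch below uses a hypothetical `predictor_vector` helper, takes the formants in the order F1, F2, F3, and reproduces the predictor counts listed on the slide. The F1 and F2 cubic coefficients in the example are Speaker A's "bike" token from the earlier slides. The F3 values are zero placeholders, because the slides do not report them.

```python
def predictor_vector(coeffs_by_formant, duration=None):
    """Concatenate per-formant polynomial coefficients (F1, F2, F3), optionally plus duration."""
    vector = []
    for formant in ("F1", "F2", "F3"):
        vector.extend(coeffs_by_formant[formant])
    if duration is not None:
        vector.append(duration)
    return vector


if __name__ == "__main__":
    cubic = {
        "F1": [478.85, -46.07, 35.62, -3.46],  # Speaker A, "bike", token 1
        "F2": [825.49, 55.64, -13.63, 3.01],   # Speaker A, "bike", token 1
        "F3": [0.0, 0.0, 0.0, 0.0],            # placeholder: F3 fit not reported
    }
    print(len(predictor_vector(cubic)))  # 3 formants x 4 coefficients = 12
```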

Comparison of Classification Rates
[Figure: % correct classification for predictor sets with 9, 12, 13 and 20 predictors]
