
Characterisation of individuals’ formant dynamics using polynomial equations

Characterisation of individuals’ formant dynamics using polynomial equations. IAFPA 2006. Kirsty McDougall, Department of Linguistics, University of Cambridge, kem37@cam.ac.uk.


Presentation Transcript


  1. Characterisation of individuals’ formant dynamics using polynomial equations • IAFPA 2006 • Kirsty McDougall, Department of Linguistics, University of Cambridge, kem37@cam.ac.uk

  2. Speaker characteristics and static features of speech • Most previous research has focussed on static features (instantaneous or average values) • Straightforward to measure • A natural progression from other research areas, e.g. the delineation of different languages and language varieties

  3. Speaker characteristics and static features of speech • Static features reflect certain anatomical dimensions of a speaker, e.g. formant frequencies relate to the length and configuration of the vocal tract (VT) • Instantaneous and average measures demonstrate speaker differences, but cannot distinguish all members of a population → look to dynamic (time-varying) features

  4. Dynamic features of speech • Carry more information than static features • Reflect the movement of a person’s speech organs as well as their dimensions: people move in individual ways in skilled motor activities, such as walking, running, … and speech

  5. Dynamic features of speech • Speech can be viewed as the achievement of a series of linguistic ‘targets’ • Speakers are likely to exhibit similar properties at the ‘targets’ (e.g. segment midpoints), but to move between them in individual ways → examine formant frequency dynamics

  6. Formant dynamics [Figure: formant frequency (Hz) against time (s) for /aɪ/ in ‘bike’ uttered by two male speakers of Australian English]

  7. Formant dynamics [Figure: formant frequency (Hz) against time (s) for /aɪ/ in ‘bike’ uttered by two male speakers of Australian English, with a 10% mark on each speaker’s panel]


  9. Research Questions • How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/? • How can this dynamic information be captured to characterise individual speakers?

  10. Target words: bike /baɪk/ • hike /haɪk/ • like /laɪk/ • mike /maɪk/ • spike /spaɪk/ (shared sequence: /aɪk/)

  11. Data set • Target words produced in sentences such as “I don’t want the scooter, I want the bike now.” and “Later won’t do, I want the bike now.” • 5 repetitions × 5 words (bike, hike, like, mike, spike) × 2 stress levels (nuclear, non-nuclear) × 2 speaking rates (normal, fast) = 100 tokens per subject
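The token count follows from crossing the four factors. A small sketch, with the factor levels copied from the slide and illustrative variable names, enumerates the design and also recovers the 25 tokens per speaker in a single rate/stress condition, which matches the 25 tokens mentioned for Speaker E in the fast-nuclear discriminant plot:

```python
from itertools import product

# Factor levels as listed on the slide
words = ["bike", "hike", "like", "mike", "spike"]
stress_levels = ["nuclear", "non-nuclear"]
rates = ["normal", "fast"]
repetitions = range(1, 6)  # 5 repetitions

# One tuple per token a subject produces
design = list(product(words, stress_levels, rates, repetitions))
print(len(design))  # 100

# Tokens in one condition, e.g. fast rate with nuclear stress
fast_nuclear = [tok for tok in design if tok[1] == "nuclear" and tok[2] == "fast"]
print(len(fast_nuclear))  # 25
```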

  12. Subjects • 5 adult male native speakers of Australian English (A, B, C, D, E) • aged 22-28 • Brisbane/Gold Coast, Queensland

  13. Speaker A “bike” (normal-nuclear)

  14. Speaker A “bike” (normal-nuclear) [Figure: points 1 and 2 marked]

  15. Speaker A “bike” (normal-nuclear) [Figure: points 1 and 2 marked, with measurement points at 10%, 20%, …, 90% of the interval between them]

  16. Speaker A “bike” (normal-nuclear) [Figure: F1, F2 and F3 marked at 10%, 20%, …, 90% of the interval between points 1 and 2]
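The slides show F1 to F3 read off at 10%, 20%, …, 90% of the interval between the points labelled 1 and 2. A minimal sketch of that sampling step, assuming the formant track is already available as frame-time and frequency arrays from some formant tracker, and using linear interpolation between frames (the function name `sample_at_steps` and the track values are hypothetical, not taken from the study):

```python
import numpy as np

# Proportional positions 10%, 20%, ..., 90%
STEPS = np.arange(1, 10) / 10

def sample_at_steps(times, freqs, start, end, steps=STEPS):
    """Formant values at proportional steps of the interval [start, end].

    times, freqs: frame times (s) and formant frequencies (Hz) from a tracker.
    Values between tracker frames are linearly interpolated.
    """
    targets = start + steps * (end - start)
    return np.interp(targets, times, freqs)

# Hypothetical F1 track with frames every 50 ms
times = np.array([0.00, 0.05, 0.10, 0.15, 0.20])
f1 = np.array([500.0, 600.0, 700.0, 650.0, 450.0])
f1_points = sample_at_steps(times, f1, start=0.0, end=0.2)  # nine values
```

Running the same function on the F2 and F3 tracks gives the per-token measurements plotted in the following slides.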

  17. F1 normal-nuclear Frequency (Hz) +10% step of /a/

  18. F2 normal-nuclear Frequency (Hz) +10% step of /a/

  19. F3 normal-nuclear Frequency (Hz) +10% step of /a/

  20. Discriminant Analysis Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership (ref. Tabachnick and Fidell 1996)

  21. Discriminant Analysis • Each data point represents one token • Each speaker’s tokens are represented with a different colour [Figure: fast-nuclear; tokens of speakers A–E plotted on discriminant Function 1 (x-axis) against Function 2 (y-axis)]

  22. Discriminant Analysis • Each data point represents one token • Each speaker’s tokens are represented with a different colour, e.g. Speaker E’s 25 tokens of /aɪk/ [Figure: fast-nuclear; tokens of speakers A–E plotted on discriminant Function 1 (x-axis) against Function 2 (y-axis)]

  23. Discriminant Analysis • DA constructs discriminant functions which maximise the differences between speakers (each function is a linear combination of the formant frequency predictors) [Figure: fast-nuclear; tokens of speakers A–E plotted on discriminant Function 1 (x-axis) against Function 2 (y-axis)]
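DA as described here can be sketched with Fisher's formulation: find weight vectors $w$ that maximise between-speaker scatter relative to within-speaker scatter, i.e. solve $S_b w = \lambda S_w w$. Each discriminant function is then the linear combination $w^\top x$ of a token's predictors, and at most (number of speakers − 1) functions exist. The sketch below uses NumPy on hypothetical random data; it illustrates the technique and is not the software used in the study (the slides do not name it). Unlike typical statistics packages it does not rescale the functions, so score magnitudes will differ from the plotted axes.

```python
import numpy as np

def discriminant_functions(X, y):
    """Fisher's linear discriminant analysis (a sketch).

    X: (n_tokens, n_predictors) array of measurements.
    y: (n_tokens,) array of speaker labels.
    Returns W, whose columns are discriminant weight vectors ordered by the
    between/within-speaker variance ratio they achieve, and those ratios.
    """
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))  # pooled within-speaker scatter
    S_b = np.zeros((p, p))  # between-speaker scatter
    for c in classes:
        Xc = X[y == c]
        centred = Xc - Xc.mean(axis=0)
        S_w += centred.T @ centred
        d = (Xc.mean(axis=0) - grand_mean)[:, None]
        S_b += len(Xc) * (d @ d.T)
    # Solve S_b w = lambda S_w w via the Cholesky factor S_w = L L^T,
    # which turns it into a symmetric eigenproblem.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    eigvals, V = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    order = np.argsort(eigvals)[::-1]
    k = min(len(classes) - 1, p)  # at most (number of speakers - 1) functions
    W = L_inv.T @ V[:, order[:k]]
    return W, eigvals[order[:k]]

# Hypothetical data: 5 speakers x 25 tokens x 6 predictors
rng = np.random.default_rng(0)
speakers = np.repeat(np.array(list("ABCDE")), 25)
means = rng.normal(0, 50, size=(5, 6))
X = np.vstack([rng.normal(means[i], 20, size=(25, 6)) for i in range(5)])

W, ratios = discriminant_functions(X, speakers)
scores = (X - X.mean(axis=0)) @ W  # column 0: Function 1, column 1: Function 2
```

Plotting `scores[:, 0]` against `scores[:, 1]`, coloured by speaker, gives a scatterplot of the kind shown on these slides.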

  24. Discriminant Analysis • Assess how well the predictors distinguish speakers by the extent of clustering of tokens and by the classification percentage… [Figure: fast-nuclear; tokens of speakers A–E plotted on discriminant Function 1 (x-axis) against Function 2 (y-axis)]

  25. Discriminant Analysis • Assess how well the predictors distinguish speakers by the extent of clustering of tokens and by the classification percentage: 95% [Figure: fast-nuclear; tokens of speakers A–E plotted on discriminant Function 1 (x-axis) against Function 2 (y-axis)]
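The classification percentage is the share of tokens assigned to the correct speaker. As a simplified stand-in for the classification step (not necessarily the procedure behind the 95% figure), the sketch below assigns each token to the speaker whose centroid is nearest in discriminant-function space and reports the percentage correct. The scores, labels and function names are hypothetical. Because the same tokens are used to build and test the classifier, this is a resubstitution rate, which tends to be optimistic compared with cross-validated rates.

```python
import numpy as np

def nearest_centroid_predict(scores, labels):
    """Assign each token to the speaker whose centroid is nearest (Euclidean)."""
    classes = np.unique(labels)
    centroids = np.array([scores[labels == c].mean(axis=0) for c in classes])
    # Distances from every token to every centroid: shape (n_tokens, n_speakers)
    dists = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

def percent_correct(true_labels, predicted):
    """Percentage of tokens assigned to their actual speaker."""
    return 100.0 * np.mean(np.asarray(true_labels) == np.asarray(predicted))

# Hypothetical discriminant scores (Function 1, Function 2) for five tokens
scores = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, 2.0]])
labels = np.array(["A", "A", "B", "B", "B"])
predicted = nearest_centroid_predict(scores, labels)
rate = percent_correct(labels, predicted)  # the last token lands nearer A's centroid
```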

  26. Discriminant Analysis [Figure: classification rates of 95%, 89%, 95% and 88%]

  27. Discussion • DA scatterplots and classification rates are promising • However, the method is not very efficient: it is essentially based on a series of instantaneous measurements, which probably contain dependent information • Recall: individuals’ F1 contours of /aɪk/…

  28. F1 normal-nuclear Frequency (Hz) +10% step of /a/

  29. A new approach… • Differences in location in the frequency range • Differences in curvature: location of turning points, convex/concave, steep/shallow • Need to capture the most defining aspects of the contours efficiently → linear regression to parameterise the curves with polynomial equations

  30. Linear regression • Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points [Figure: scatter of (x, y) points]


  33. Linear regression • Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points • Straight line: $y = a_0 + a_1x$

  34. Linear regression • Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points • Straight line: $y = a_0 + a_1x$, where $a_0$ is the y-intercept

  35. Linear regression • Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points • Straight line: $y = a_0 + a_1x$, where $a_0$ is the y-intercept and $a_1$ the gradient
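For a straight line, the least-squares estimates have a closed form: $a_1 = \sum(x-\bar{x})(y-\bar{y}) / \sum(x-\bar{x})^2$ and $a_0 = \bar{y} - a_1\bar{x}$. A plain-Python sketch, with a hypothetical helper name and toy points lying exactly on $y = 1 + 2x$:

```python
def fit_line(xs, ys):
    """Least-squares estimates of a0 (y-intercept) and a1 (gradient) in y = a0 + a1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a1 = sxy / sxx           # gradient
    a0 = y_bar - a1 * x_bar  # y-intercept
    return a0, a1

a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 1 + 2x
print(a0, a1)  # 1.0 2.0
```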

  36. Linear regression • Can also be used for curvilinear relationships, since the equation remains linear in its coefficients [Figure: curved scatter of (x, y) points]

  37. Linear regression • Can also be used for curvilinear relationships • Quadratic: $y = a_0 + a_1x + a_2x^2$

  38. Linear regression • Can also be used for curvilinear relationships • Quadratic: $y = a_0 + a_1x + a_2x^2$, where $a_0$ is the y-intercept

  39. Linear regression • Can also be used for curvilinear relationships • Quadratic: $y = a_0 + a_1x + a_2x^2$, where $a_0$ is the y-intercept and $a_1$, $a_2$ determine the shape and direction of the curve

  40. Polynomial Equations • Cubic: $y = a_0 + a_1x + a_2x^2 + a_3x^3$ • Quartic: $y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4$ • Quintic: $y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 + a_5x^5$ [Figure: example curve for each equation]
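Because each of these equations is linear in its coefficients, the same least-squares machinery fits any degree. A sketch using NumPy's `numpy.polynomial.polynomial.polyfit`, which returns coefficients in ascending order $[a_0, a_1, \dots, a_n]$, matching the notation above (the older `numpy.polyfit` returns them highest power first). The nine-point contour and the time values $t = 1, \dots, 9$ are hypothetical:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical contour: nine points (t = 1, ..., 9) lying exactly on a cubic
t = np.arange(1, 10, dtype=float)
y = 500 - 40 * t + 12 * t**2 - 1.0 * t**3

# P.polyfit returns coefficients in ascending order: [a0, a1, ..., an]
quadratic = P.polyfit(t, y, 2)
cubic = P.polyfit(t, y, 3)
quintic = P.polyfit(t, y, 5)

fitted_cubic = P.polyval(t, cubic)  # reproduces y (up to rounding)
```

The quintic fit here adds two coefficients that come out close to zero, since the data are exactly cubic; with real contours, extra terms also absorb measurement noise.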


  42. /ak/data • fit F1, F2, F3 contours with polynomial equations • test the reliability of the polynomial coefficients in distinguishing speakers Quadratic: y = a0 + a1t + a2t2 Cubic: y = a0 + a1t + a2t2 + a3t3

  43. “bike”, Speaker A (normal-nuclear token 1), F1 contour • Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$ • Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$ [Figure: actual data points with both fits; frequency y (Hz) against normalised time t]
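The coefficients are a compact description of curvature, so features such as a turning point follow directly from them. For $y = a_0 + a_1t + a_2t^2$ the single turning point is at $t^* = -a_1/(2a_2)$: a peak when $a_2 < 0$, a trough when $a_2 > 0$. A cubic can have up to two turning points, which is one reason it can follow a rise-and-fall contour more closely. Applied to Speaker A's quadratic F1 fit (the helper name is illustrative; $t$ is in the slides' normalised-time units, and whether $t^*$ lies inside the measured portion depends on that normalisation, which the slides do not spell out):

```python
def quadratic_turning_point(a0, a1, a2):
    """Turning point of y = a0 + a1*t + a2*t**2 (requires a2 != 0)."""
    t_star = -a1 / (2 * a2)
    y_star = a0 + a1 * t_star + a2 * t_star ** 2
    kind = "peak" if a2 < 0 else "trough"
    return t_star, y_star, kind

# Speaker A, "bike", normal-nuclear token 1, quadratic F1 fit
t_star, y_star, kind = quadratic_turning_point(420.68, 79.26, -5.92)
print(f"{kind} at t = {t_star:.2f}, F1 = {y_star:.1f} Hz")  # peak at t = 6.69, F1 = 686.0 Hz
```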

  44. “bike”, Speaker A (normal-nuclear token 1), F1 contour • Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$ ($R = 0.879$) • Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$ ($R = 0.978$) [Figure: actual data points with both fits; frequency y (Hz) against normalised time t]
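The R values indicate how closely each fitted curve follows the data points. A sketch, assuming R is the multiple correlation coefficient of the least-squares fit, $R = \sqrt{1 - SS_{res}/SS_{tot}}$ (the slides do not state the formula). Because a lower-degree polynomial is a special case of a higher-degree one, $SS_{res}$ cannot grow as the degree rises, so R cannot fall, consistent with the cubic fits scoring higher than the quadratic ones. The contour values are hypothetical:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def fit_r(t, y, degree):
    """Multiple correlation R between observed values and a least-squares polynomial fit."""
    coeffs = P.polyfit(t, y, degree)
    fitted = P.polyval(t, coeffs)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return np.sqrt(1 - ss_res / ss_tot)

# Hypothetical F1 contour (Hz) at nine equally spaced points: rise, plateau, fall
t = np.arange(1, 10, dtype=float)
f1 = np.array([470, 560, 640, 690, 705, 700, 670, 600, 480], dtype=float)

r_values = {d: fit_r(t, f1, d) for d in (1, 2, 3)}
for d, r in r_values.items():
    print(d, round(r, 3))
```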

  45. “bike”, Speaker A (normal-nuclear token 1), F2 contour • Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$ ($R = 0.985$) • Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$ ($R = 0.991$) [Figure: actual data points with both fits; frequency y (Hz) against normalised time t]

  46. DA on polynomial coefficients • Quadratic: 3 formants × 3 coefficients = 9 predictors • Cubic: 3 formants × 4 coefficients = 12 predictors • Cubic + duration of /a/: 12 + 1 = 13 predictors
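The predictor counts follow from concatenating each formant's coefficients into one vector per token. A sketch with a hypothetical helper and hypothetical contours, reusing NumPy's ascending-order `polyfit`; stacking one such vector per token would give the predictor matrix for the DA:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def token_predictors(t, f1, f2, f3, degree, duration=None):
    """Concatenate the polynomial coefficients of F1, F2 and F3 for one token."""
    coeffs = [P.polyfit(t, f, degree) for f in (f1, f2, f3)]  # (degree + 1) each
    vec = np.concatenate(coeffs)
    if duration is not None:
        vec = np.append(vec, duration)  # e.g. duration of /a/ in seconds
    return vec

# Hypothetical formant contours for one token, sampled at nine points
t = np.arange(1, 10, dtype=float)
f1 = 480 + 60 * t - 6 * t**2
f2 = 900 + 40 * t + 5 * t**2
f3 = 2500 + 15 * t

quad = token_predictors(t, f1, f2, f3, degree=2)
cubic = token_predictors(t, f1, f2, f3, degree=3)
cubic_dur = token_predictors(t, f1, f2, f3, degree=3, duration=0.12)
print(len(quad), len(cubic), len(cubic_dur))  # 9 12 13
```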

  47. Comparison of Classification Rates [Chart: % correct classification]

  48. Comparison of Classification Rates [Chart: % correct classification for predictor sets of 9, 12, 13 and 20 predictors]

