
Phonetic details in prosodic phenomena

Phonetic details in prosodic phenomena. Oliver Niebuhr. Presentation at the Laboratoire de Phonétique et Phonologie, Paris 3, January 30th, 2009. oliver.niebuhr@lpl-aix.fr. Summary of German intonation (in terms of KIM).


Presentation Transcript


  1. Phonetic details in prosodic phenomena
     Oliver Niebuhr
     Presentation at the Laboratoire de Phonétique et Phonologie, Paris 3, January 30th, 2009
     oliver.niebuhr@lpl-aix.fr

  2. Summary of German intonation (in terms of KIM)
     A number of attitudinal meanings are known to be signalled in German by prosodic means.
     The speaker can convey that a piece of information is (a) “settled”, “concluding”; (b) “presenting”, “open(ing)”; or (c) “astonishing”.
     Furthermore, the speaker can (d) superordinate or (e) subordinate her/himself to the dialogue partner.
     The specific interpretations of these attitudinal meanings may vary depending on the semantic composition and the linguistic structure of the utterance.
     In negative contexts, (a) may be interpreted as “resignation” and (c) as “disbelieving / taken aback”.
     (d) and (e) can convey ‘statement’ and ‘question’.

  3. Summary of German intonation (in terms of KIM)
     In the Kiel Intonation Model (KIM, Kohler 1991), these attitudinal meanings have been assigned to pitch-accent categories.
     The KIM distinguishes between 2 basic phonological classes of pitch movements:
     • rising-falling peaks
     • (falling)-rising valleys
     Both co-occur with accented (i.e. perceptually salient) syllables.
     Their timing is phonological, not phonetic (as opposed to alignment): synchronization.
     Relevant according to Kohler (1987, 1991): the F0 maximum (peaks) or minimum (valleys) relative to the accented-vowel boundaries (onsets).

  4. Summary of German intonation (in terms of KIM)
     (acc.Von / acc.Voff = onset / offset of the accented vowel)
     • ‘early’ peaks → F0 max. before acc.Von = “settled”, “concl.”
     • ‘medial’ peaks → F0 max. after acc.Von = “presenting”
     • ‘late’ peaks → F0 max. after acc.Voff = “astonishing”
     • ‘early/late’ valleys → F0 min. before/after acc.Von = “subordination” / “questioning”
     Example: “Did you hear me?” Answer: “hmm… not exactly.” Answer: “yes, sure.”
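The peak categories listed on this slide can be written as a small decision rule. The following is only a minimal sketch: the function name and the hard thresholds are illustrative assumptions, and the later slides argue that the perceptual boundaries are in fact shaped by the intensity pattern and are not fixed at the segment boundaries.

```python
def classify_peak(f0_max_time, vowel_onset, vowel_offset):
    """Label a rising-falling peak by the position of its F0 maximum.

    All times are in seconds. The hard cut-offs at the accented-vowel
    onset (acc.Von) and offset (acc.Voff) are an idealization.
    """
    if vowel_offset <= vowel_onset:
        raise ValueError("vowel offset must follow vowel onset")
    if f0_max_time < vowel_onset:
        return "early"    # F0 max. before acc.Von
    if f0_max_time <= vowel_offset:
        return "medial"   # F0 max. after acc.Von, not yet past acc.Voff
    return "late"         # F0 max. after acc.Voff


# Hypothetical accented vowel from 0.50 s to 0.62 s
for t in (0.48, 0.55, 0.66):
    print(t, classify_peak(t, 0.50, 0.62))
```

With the hypothetical vowel above, the three F0 maxima are labelled ‘early’, ‘medial’, and ‘late’ in turn.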

  5. The signalling of early, medial, and late peaks
     The reference to the accented-vowel boundaries in the KIM originates from peak-shift experiments by Kohler (1987, 1991):
     • categorical change in perception from ‘early’ to ‘medial’
     • gradual change in perception from ‘medial’ to ‘late’ around acc.Voff
     On the other hand, his experiments showed that the location of the category boundary shifts between stimulus utterances (Kohler 1991):
     • “Sie hat ja gelogen” = lateral + vowel
     • “Sie ist ja geritten” = fricative + vowel
     • “Sie hat ja gejodelt” = approximant + vowel
     Why is the boundary earlier in some utterances and later in others?

  6. The signalling of early, medial, and late peaks
     Niebuhr (2006, 2007c): maybe it is not the segment boundary between C and V in terms of a spectral change (e.g., formant transitions) that matters, but the increasing / decreasing intensity into and out of the accented vowel.
     The intensity change is more abrupt for nasal+vowel or lateral+vowel sequences than for approximant+vowel. It also depends on the vowel quality itself.
     Starting from this idea, two F0 peak-shift series were resynthesized:
     • one using the stimulus utterance “Sie war mal Malerin”
     • the other keeping exactly the F0 and intensity patterns of the “Malerin” series, but on a constant schwa-like vowel quality (= “HUM” in Praat)
     So, basically, the two stimulus series (“Malerin” and “HUM”) differ only in the presence / absence of the segmental string.
     Two parallel perception experiments with two separate groups of subjects:
     • indirect identification for the “Malerin” series
     • AXB test for the “HUM” series

  7. The signalling of early, medial, and late peaks
     Indirect identification of intonation categories via meaning
     [Diagram: a context stimulus (constant lexical string, constant intonation) is followed by test stimuli (constant lexical string, variable intonation); the hearer, a native speaker, judges whether the two meanings match.]

  8. The signalling of early, medial, and late peaks
     “Malerin” series

  9. The signalling of early, medial, and late peaks
     So, what happens if we manipulate the rising slope of the intensity curve that reflects the CV transition?

  10. The signalling of early, medial, and late peaks
      Again two perception experiments, based on the (a) “Malerin” and (b) “HUM” stimulus series.

  11. The signalling of early, medial, and late peaks
      “Malerin” series
      The perceptual change from ‘early’ to ‘medial’ becomes less abrupt as the underlying intensity change becomes less abrupt.
      The same effect shows up if a less pointed F0 peak is shifted.
      A comparable effect of the dynamics of the F0 and intensity courses on pitch-accent perception can be found for peak-shift series from ‘medial’ to ‘late’, based on a manipulation of the decreasing intensity at the VC boundary.

  12. The signalling of early, medial, and late peaks
      Conclusions of Niebuhr (2006, 2007c): the picture sketched by Kohler (1987, 1991) must be refined.
      The abruptness of the perceptual changes between ‘early’, ‘medial’, and ‘late’ is not determined by the categories themselves:
      • The change from ‘early’ to ‘medial’ can be turned into a gradual one.
      • The change from ‘medial’ to ‘late’ can be made categorical.
      The signalling of ‘early’, ‘medial’, and ‘late’ is based on an interplay between F0 and intensity changes (or levels).
      The findings support the central claim of the KIM that the synchronization of the (rising-falling) F0-peak contour relative to the vowel boundaries is decisive for pitch-accent identification.
      The findings can explain to some extent why different alignment patterns of the F0 peaks are found for different structures and segmental compositions of the accented syllable and the adjacent syllables. They also make sense in terms of articulatory anchoring.

  13. The signalling of early, medial, and late peaks

  14. The signalling of early, medial, and late peaks
      [Figure: [j] vs. [m]; examples “Okay” and “mmm”]

  15. The signalling of early, medial, and late peaks
      Implication: synchronization itself is not phonological; it is an effective, economic tool that speakers can use to highlight different parts of the pitch pattern by making use of the intensity pattern that results from the segmental string.
      Alternative, complementary strategies: changing the overall peak shape, changing intensity levels, changing segment durations and articulations (e.g., openness of the vowel, sibilant / intrinsic pitch).
      This is the impetus for looking at phonetic details (allophonic variations) of fricative and vowel segments in the contexts of different intonation categories.

  16. The signalling of early, medial, and late peaks
      Intonational difference in focus: phrase-final (nuclear) high-rising valleys vs. terminal falling peaks.
      Segment (phoneme) in focus: phrase-final /ʃ/
      • Like /t/ aspiration, it can create “sibilant pitch” and is therefore particularly likely to vary systematically according to the peak-valley difference.
      • Moreover, /ʃ/ is realized with lip rounding in German, and this rounding typically already characterizes the preceding vowel (e.g., in “Tisch”).
      So, is /ʃ/ in the context of a high-rising valley lighter than in the context of a terminal falling peak? Is rounding involved in this effect? If so, then the effect should already be noticeable in the vowel preceding /ʃ/.

  17. The signalling of early, medial, and late peaks
      Intonational difference in focus: phrase-final (nuclear) high-rising valleys vs. terminal falling peaks.
      Segment (phoneme) in focus: phrase-final /x/
      • It is also able to convey pitch by means of changes in the energy pattern of the noise spectrum.
      • Moreover, following /u/, /x/ is realized as a rounded velar fricative [xʷ].
      So, is /x/ in the context of a high-rising valley lighter than in the context of a terminal falling peak? Is rounding involved in this effect? If so, then the effect should already be noticeable in the vowel /u/ preceding /x/.

  18. The signalling of early, medial, and late peaks
      Intonational difference in focus: phrase-final (nuclear) high-rising valleys vs. terminal falling peaks.
      Segments (phonemes) in focus:
      • Phrase-final <-er> word endings: realized as a vocoid sound [ɐ], which is known to show considerable dialectal variation. So, is it also influenced by the coinciding intonation categories? If so, is it lighter (more fronted and/or open) in connection with high-rising valleys?
      • Phrase-final /ə/: its phonetic quality is known to be strongly context-dependent in German. So, is /ə/ lighter (more fronted and/or open) in connection with high-rising valleys?

  19. The signalling of early, medial, and late peaks
      Pairs of target words (to increase the ‘n’):
      • “Tisch”, “Fisch” → /ʃ/ with preceding /ɪ/ (“sibilant pitch”)
      • “Buch”, “Tuch” → /x/ with preceding /u/ (“sibilant pitch”)
      • “lecker”, “Bäcker” → <-er> realized as [ɐ] (“intrinsic vowel pitch”)
      • “Tage”, “Schramme” → /ə/ (“intrinsic vowel pitch”)
      The target words were placed sentence-finally, and hence phrase-finally, in contexts of high-rising valleys and terminal falling peaks.
      Acoustic analysis:
      • F2 (based on LPC LTAS) at three points in the vowels: 20 ms after onset, centre, 20 ms before offset
      • Centre of Gravity (CoG), determined every 7 ms in the fricatives and then averaged across the whole fricative segment
      • segment durations of vowels and fricatives
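The measurement scheme on this slide (three F2 sampling points per vowel, CoG every 7 ms averaged over the fricative) can be sketched in a few lines. This is an illustrative sketch, not the analysis scripts used for the study: the function names are hypothetical, and weighting the spectrum by squared magnitude (power = 2) is an assumption about how the CoG was computed.

```python
def vowel_measurement_points(onset, offset, margin=0.020):
    """Times (s) for the three F2 measurements: onset + 20 ms, centre, offset - 20 ms."""
    if offset - onset <= 2 * margin:
        raise ValueError("vowel too short for the 20 ms margins")
    return (onset + margin, (onset + offset) / 2.0, offset - margin)


def frame_times(onset, offset, step=0.007):
    """Times (s) at which the CoG is sampled: every 7 ms from onset up to offset."""
    n_frames = int((offset - onset) / step + 1e-9) + 1
    return [onset + i * step for i in range(n_frames)]


def centre_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Spectral centre of gravity: mean frequency weighted by magnitude ** power."""
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


def mean_cog(frame_spectra):
    """Average the per-frame CoG values across the whole fricative segment.

    frame_spectra is a list of (freqs_hz, magnitudes) pairs, one per frame.
    """
    values = [centre_of_gravity(f, m) for f, m in frame_spectra]
    return sum(values) / len(values)
```

A fricative lasting 70 ms, for example, yields 11 frames with `frame_times(0.0, 0.070)`; the spectra for those frames would come from whatever analysis tool is in use.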

  20. The signalling of early, medial, and late peaks
      The corpus was recorded with quasi-spontaneous, informal-sounding speech, using an improved version of the method of Kohler and Niebuhr (2007). This means:
      • Written dialogue texts with informal, everyday contents/situations.
      • Target words are integrated sentence-finally without highlighting.
      • The high-rising valleys and terminal falling peaks as well as the corresponding accented syllables are elicited solely by creating appropriate semantic-pragmatic contexts.
      • Dialogues were produced by good friends, who were allowed to modify the texts according to their own way of speaking (e.g., by introducing or exchanging words and particles).
      • One of the speakers was the experimenter (me); he tried to guide the subject with regard to speaking style, and his productions were part of the pragmatic context.
      • Every dialogue was produced 4 times in a row, and only the last two productions were used for the acoustic analysis.
      So far, 5 subjects have been recorded (→ n=20); 10 are planned.

  21. The signalling of early, medial, and late peaks
      Results for “Tisch” and “Fisch”
      The sibilant /ʃ/ is considerably lighter in the context of the high-rising valley. This is reflected in significantly different mean CoG values. The fricative durations, however, do not differ significantly.
      The productions of /ɪ/ also differ depending on the intonation context: F2 (middle) is significantly higher after the high-rising valley. Supported by perceptual analysis, this effect involves de-rounding.
      [Figure: peak vs. valley]

  22. The signalling of early, medial, and late peaks
      Results for “Buch” and “Tuch”
      The fricative /x/ is considerably lighter in the context of the high-rising valley. This is reflected in significantly different mean CoG values. The fricative durations, however, do not differ significantly.
      The productions of /u/ also differ slightly, in that F2 (middle) tends to be higher after the high-rising valley. Supported by perceptual analysis, this is again due to de-rounding.
      [Figure: peak vs. valley]

  23. The signalling of early, medial, and late peaks
      Results for “lecker” and “Bäcker”
      The vocoid realizations of the word ending <-er> are not consistently lighter in connection with the high-rising valley. Such an effect, i.e. a higher F2, can only be observed as a tendency towards the end of the sound.
      → In connection with the high-rising valley, <-er> becomes diphthongized and also tends to be longer than in connection with the terminal falling peak.
      [Figure: peak vs. valley; values shown: 650, 1650, 700, 1300]

  24. The signalling of early, medial, and late peaks
      Results for “Tage” and “Schramme”
      The productions of /ə/ are lighter, i.e. they show a significantly higher F2 at the centre and towards the end of the vowel in connection with the high-rising valley. However, the vowel durations do not differ significantly depending on the intonational context.
      [Figure: peak vs. valley; values shown: 600, 1200, 650, 1750]

  25. Summary of the intonational part
      [Diagram: Intonation → Lexemes and Phonemes]
      • We know that the F0 course contributes to the coding of segments
      • F0 rise or fall before/after obstruents is a fortis-lenis cue (Kohler 1979)
      • F0 relative to F1 determines the vowel quality (Traunmüller 1985)
      • Position of F0 turning points cues word boundaries (D’Imperio 2000, Petrone 2008)
      • (...)

  26. Summary of the intonational part
      [Diagram: Lexeme, Phoneme, Phone → Intonation]
      • We must not forget that the segmental string is not (just) a “troublemaker” for the coding of intonational units
      • The segmental contribution can be of two different kinds:
      • direct: e.g., by intensity-based highlighting of parts of the F0 course, or by conveying different sibilant or intrinsic (i.e. vowel-based) pitches
      • indirect: e.g., by articulatory metaphors of attitudinal meanings, i.e. long, soft, and light articulations for astonishment vs. short, loud, dark articulations for conclusions or superordinations, etc.

  27. Place assimilation in French sibilant sequences
      Based on a recent investigation by Niebuhr et al. (2008) within the S2S research network.
      Starting point: a corpus of read speech comprising 72 sentences that can be subdivided into three subsets:
      (1) The 8 possible sibilant sequences across word boundaries that result from cross-combining the features ‘alveolar’, ‘postalveolar’ and ‘voiced’, ‘voiceless’, i.e. (a) /sʃ/, (b) /ʃs/, (c) /zʒ/, (d) /ʒz/, (e) /sʒ/, (f) /ʒs/, (g) /zʃ/, and (h) /ʃz/, placed in the symmetrical vowel contexts /i/___/i/, /a/___/a/, and /u/___/u/ (= 24).
      (2) (a) /sʃ/, (b) /ʃs/, (c) /zʒ/, (d) /ʒz/ framed by the 6 asymmetrical vowel contexts (= 24).
      (3) Each of the 4 individual sibilants /s/, /ʃ/, /z/, and /ʒ/ paired across word boundaries with a labial consonant (C) like /p/ or /v/ in the two possible orders __C and C__ and framed by symmetrical vowel contexts → reference qualities for the sibilants (= 24).
      All 72 sentences were read 4 times in a randomized order by 4 female native speakers of French.
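The sentence counts on this slide can be checked by enumerating the design. The sketch below assumes that the eight sequences are all alveolar/postalveolar combinations in both orders and that subset (2) uses the four sequences with matching voicing; the variable names are illustrative.

```python
from itertools import product

vowels = ["i", "a", "u"]
alveolar = ["s", "z"]
postalveolar = ["ʃ", "ʒ"]

# Vowel frames: V1 ___ V2
symmetric = [(v, v) for v in vowels]                                            # 3 contexts
asymmetric = [(v1, v2) for v1, v2 in product(vowels, repeat=2) if v1 != v2]     # 6 contexts

# Subset (1): all 8 two-sibilant sequences in symmetrical contexts
sequences = [a + p for a, p in product(alveolar, postalveolar)] + \
            [p + a for p, a in product(postalveolar, alveolar)]
subset1 = [(seq, ctx) for seq in sequences for ctx in symmetric]

# Subset (2): assumed to be the 4 same-voicing sequences in asymmetrical contexts
same_voicing = ["sʃ", "ʃs", "zʒ", "ʒz"]
subset2 = [(seq, ctx) for seq in same_voicing for ctx in asymmetric]

# Subset (3): single sibilants next to a labial consonant C, in both orders
sibilants = alveolar + postalveolar
subset3 = [(sib, order, ctx)
           for sib in sibilants
           for order in ("_C", "C_")
           for ctx in symmetric]

n_sentences = len(subset1) + len(subset2) + len(subset3)
n_tokens = n_sentences * 4 * 4   # 4 repetitions x 4 speakers

print(len(subset1), len(subset2), len(subset3), n_sentences, n_tokens)
```

Each subset contributes 24 sentences, giving 72 sentences and, with 4 repetitions by 4 speakers, 1152 recorded tokens.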

  28. Place assimilation in French sibilant sequences
      Based on a recent investigation by Niebuhr et al. (2008) within the S2S research network.
      Example sentences from the corpus of 72 read sentences:
      (1) /uʃsu/: “Tu te couches sous l'drap”
      (2) /asʃa/: “C'est une classe chargée”
      (3) /uzʒu/: “J'ai vendu douze journaux”
      (4) /azʒa/: “C'est une phrase japonaise”
      (5) /aʃCa/: “Tu te caches facilement”
      (6) /izCi/: “Il a une devise vitale”
      (7) /aCsa/: “Elle tape sa soeur”

  29. Place assimilation in French sibilant sequences
      Measurements:
      • Spectral: range and mean of the CoG across the whole sibilant section
      • Temporal: duration of the whole section and, where possible, of the individual sibilants

  30. Place assimilation in French sibilant sequences
      Results in the temporal domain:
      • If there were two spectrally separable sibilant sections, the postalveolar was always the longer one.
      • Overall, the sequences were around twice as long as the single references, with alveolar-postalveolar sequences tending to be even longer than postalveolar-alveolar ones.

  31. Place assimilation in French sibilant sequences Results in the frequency domain: The alveolar-postalveolar as well as the postalveolar-alveolar sequences were both spectrally shifted in a comparable way towards the postalveolar references.

  32. Place assimilation in French sibilant sequences
      Conclusions:
      • Place assimilation in French sibilant sequences is a gradual rather than a categorical phenomenon.
      • It is feature-determined, not direction-determined. The target is postalveolar. Consequently, it is regressive in alveolar-postalveolar and progressive in postalveolar-alveolar sequences.
        → “Elle remâche sa viande”: /ʃs/ → [ʃː]
        → “C’est une classe chargée”: /sʃ/ → [ʃː]
      • The assimilated alveolar-postalveolar and postalveolar-alveolar sequences can have phonetically identical manifestations in terms of both temporal and spectral values.
      But are these sequences really ambiguous?

  33. Place assimilation in French sibilant sequences
      Nolan (1992) found for English that word-final /d/s that were completely assimilated to following word-initial /g/s (in terms of EPG patterns) were still identified as /d/s by his subjects. He ascribed this effect to differences in the preceding vowel.
      Starting from this interesting observation, which has not been pursued further so far, we investigated whether the vowels preceding the assimilated /sʃ/ and /ʃs/ sibilant sequences in French show differences in phonetic detail that listeners can use to identify the following sibilant sequence as /sʃ/ or /ʃs/ even when it is ambiguously realized as [ʃː].

  34. Place assimilation in French sibilant sequences
      This is what we found for vowel duration and vowel intensity:
      • The vowels /a, i, u/ were significantly longer when they preceded /s/ (on average 15-20 ms, up to 60 ms).
      • The vowels /a, i, u/ were significantly louder when they preceded /s/ (on average 2-3 dB, up to 5 dB).
      [Figure: panels for a, i, u]

  35. Place assimilation in French sibilant sequences
      And in addition we found for voice quality:
      • The vowels /a, i, u/ were significantly breathier when they preceded /s/.
      • Voice quality was represented by the harmonic ratio H1/H2, based on narrow-band DFT spectra at three points in the vowel: 20 ms after onset, centre, 20 ms before offset.
      • On the other hand, the vowels before /s/ sometimes show a short section of /h/-like friction before the actual sibilant sets in (= different timing of breath?)
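The H1/H2 measure mentioned on this slide compares the amplitude of the first harmonic with that of the second. As a rough sketch, assuming the narrow-band spectrum is already available as frequency/amplitude pairs in dB and that F0 is known, the ratio reduces to a dB difference; the function names and the ±10% search window are illustrative assumptions, not the procedure of the study.

```python
def harmonic_amplitude(freqs_hz, amps_db, target_hz, tolerance=0.10):
    """Largest amplitude (dB) found within +/- tolerance of the target frequency."""
    low, high = target_hz * (1 - tolerance), target_hz * (1 + tolerance)
    candidates = [a for f, a in zip(freqs_hz, amps_db) if low <= f <= high]
    if not candidates:
        raise ValueError("no spectral bins near %.1f Hz" % target_hz)
    return max(candidates)


def h1_minus_h2(freqs_hz, amps_db, f0_hz):
    """H1/H2 expressed in dB: amplitude at F0 minus amplitude at 2 * F0.

    A larger value is commonly read as a breathier voice quality.
    """
    h1 = harmonic_amplitude(freqs_hz, amps_db, f0_hz)
    h2 = harmonic_amplitude(freqs_hz, amps_db, 2 * f0_hz)
    return h1 - h2


# Hypothetical narrow-band spectrum with F0 = 200 Hz
freqs = [100, 200, 300, 400, 500, 600]
amps = [40, 60, 45, 52, 41, 39]
print(h1_minus_h2(freqs, amps, 200))
```

In the setting described on the slide, this would be evaluated at the three measurement points of each vowel.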

  36. Place assimilation in French sibilant sequences
      [Figure: two example tokens, labelled /s/ and /s/; values shown: 65 ms, 160 ms, 21 dB vs. 17 dB, 43 ms, 160 ms]

  37. Place assimilation in French sibilant sequences
      [Figure: two example tokens, labelled /s/ and /s/; values shown: 57 ms, 150 ms, 18 dB vs. 16 dB, 49 ms, 170 ms]

  38. Summary of the assimilation part
      Initial perceptual tests were done in which (a) just the CV part of the first target syllable and (b) just the sibilant section itself were played to non-naïve native speakers of French. The preliminary results show that only for (a) were the listeners able to predict beyond chance level whether the upcoming sibilant sequence was /sʃ/ or /ʃs/.
      So, the multi-parametric phonetic details in the vowels preceding the sibilant sequences might already be acoustic cues to the phonological make-up of the sibilant sequence, even if the phonetic realization of the latter is by itself ambiguous. This suggests that the term “gradual” has a temporal implication beyond the sibilant sequence itself.
      Moreover, the fine phonetic differences in the vowels were not found in the same way before single, non-assimilated sibilants. This might be taken as evidence that place assimilation within French sibilant sequences represents a re-organization rather than a (gradual) substitution of phonological features, in line with previous findings on many other assimilation processes across languages.
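Whether listeners respond “beyond chance level” in a two-alternative task can be checked with an exact one-sided binomial test. The sketch below shows the calculation only; the response counts are hypothetical and are not results from the study.

```python
from math import comb


def binomial_tail(n_correct, n_trials, p_chance=0.5):
    """Probability of at least n_correct correct answers in n_trials by guessing alone."""
    if not 0 <= n_correct <= n_trials:
        raise ValueError("n_correct must lie between 0 and n_trials")
    return sum(
        comb(n_trials, k) * p_chance ** k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical listener: 15 correct predictions out of 20 stimuli
p_value = binomial_tail(15, 20)
print(round(p_value, 4))
```

A small tail probability means that the observed number of correct predictions would be unlikely under pure guessing between the two sequences.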
