Pavel Skrelin (Saint-Petersburg State University)

Some Principles and Methods of Measuring Fo and Tempo

My main principle:
• The acoustic data that we extract from speech material for analysis should be tied to phonetic (linguistic) features. The values obtained should therefore reflect concrete features and have a clear phonetic (linguistic) interpretation, and the methods of calculation and classification should take into account the properties of speech perception as well as those of speech production.

Fo measurements (phrase № 53 in reading and spontaneous speech, F>40)

Terms:
• Smoothing: Fo data are processed with a rectangular window 100 ms long, with a pitch-synchronous shift.
• Correction: pitch marks are removed on voiced consonants and approximants, on voiced onsets, on hesitations, and on voiced transitions between vowels and consonants.
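The two terms above could be sketched roughly as follows. This is only one possible reading, not the implementation used for the slides: the function names `smooth_f0` and `drop_marks` are hypothetical, "pitch-synchronous shift" is assumed to mean that the window is re-centred on every pitch mark, the window is assumed to be simply truncated at the edges of the track, and the segments to be corrected are assumed to be given as labelled time intervals.

```python
from bisect import bisect_left, bisect_right


def smooth_f0(times_ms, f0_hz, window_ms=100.0):
    """Moving average of Fo with a rectangular window.

    times_ms: sorted pitch-mark times in ms; f0_hz: Fo value at each mark.
    Assumption: the window is centred on each pitch mark in turn and is
    truncated at the beginning and end of the track.
    """
    half = window_ms / 2.0
    smoothed = []
    for t in times_ms:
        lo = bisect_left(times_ms, t - half)   # first mark inside the window
        hi = bisect_right(times_ms, t + half)  # one past the last mark inside
        window = f0_hz[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def drop_marks(times_ms, f0_hz, excluded_intervals):
    """Correction step: remove pitch marks inside excluded intervals.

    excluded_intervals: (start_ms, end_ms) pairs, e.g. voiced consonants
    or vowel-consonant transitions taken from a segmentation (assumed input).
    """
    kept = [
        (t, f)
        for t, f in zip(times_ms, f0_hz)
        if not any(start <= t < end for start, end in excluded_intervals)
    ]
    return [t for t, _ in kept], [f for _, f in kept]
```

Under these assumptions, the correction would be applied first and the smoothing second, so that the removed pitch marks do not contribute to any window average.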

Why the correction is needed on voiced transitions between vowels and consonants
The voiced transition does not affect the perceived vowel duration:
• The whole group [kak'i tak'i]
• [kaki] isolated -- original vowel length -- vowel without transition
• [i] -- original length (66 ms) -- vowel without transition (37 ms)
but it does affect the duration of the next consonant and the Fo values:

Smoothed Fo data with pitch marks on the voiced [i-t] transition
Smoothed Fo data without pitch marks on the voiced [i-t] transition

Raw data: reading
Raw data: spontaneous speech

Smoothed data: reading
Smoothed data: spontaneous speech

Smoothed Fo without laryngealization: reading
Smoothed Fo without laryngealization: spontaneous speech

Smoothed Fo without laryngealization and some consonants: reading
Smoothed Fo without laryngealization and some consonants: spontaneous speech

Fo measurements

Tempo measurements
Methods may differ depending on the task:
• Comparison on the basis of the whole material
• Tempo monitoring, for example to reveal tempo modifications specific to certain IU types or IU positions in the utterance, or to compare local tempo between read and spontaneous realizations of the same phrase

Tempo measurements: comparison on the basis of the whole material
• Syllables: average duration of syllables realized in spontaneous speech vs. average duration of syllables realized in reading. Example for F>40: 152/143 = 1.06
• Sounds: average duration of sounds realized in spontaneous speech vs. average duration of sounds realized in reading. Example for F>40: 67/63 = 1.06
Possible correction: taking into account the ideal number of syllables or sounds
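The whole-material comparison reduces to a ratio of two means. A minimal sketch is given below; the function names are hypothetical, and the inputs are assumed to be flat lists of unit durations (syllables or sounds) for each speaking style. The last two lines simply recompute the F>40 ratios from the averages quoted above.

```python
def mean_duration(durations):
    """Average duration of the units (syllables or sounds) in one style."""
    return sum(durations) / len(durations)


def tempo_ratio(spont_durations, read_durations):
    """Mean unit duration in spontaneous speech relative to reading.

    A value above 1 means the units are longer on average in spontaneous
    speech than in reading.
    """
    return mean_duration(spont_durations) / mean_duration(read_durations)


# Recomputing the F>40 examples from the averages given on the slide.
print(round(152 / 143, 2))  # syllables -> 1.06
print(round(67 / 63, 2))    # sounds    -> 1.06
```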

The simplest way: direct comparison of sound durations in both phrases. But some sounds are longer in reading and others in spontaneous speech, which makes the tempo comparison difficult and inconsistent.

Tempo measurements: tempo monitoring
Example (speaker F<20, phrase № 12: sound durations in spontaneous speech and reading)

Methods for tempo monitoring
1. Current syllable duration / average syllable duration
• current syllable duration = IU duration / number of syllables
• average syllable duration = net sound material duration / number of syllables
Not good, because the result depends on the syllable structures in the current IU; it therefore needs some normalization that takes into account both the average syllable structure (C/V coefficient) and the current one.

Methods for tempo monitoring
2. Average sound duration in the current IU / average sound duration
• average sound duration in the current IU = IU duration / number of sounds
• average sound duration = net sound material duration / number of sounds
Possible correction: taking into account the ideal number of sounds in the whole material and in the current IU
Example for F<20:
• Reading: 1st IU 59/64 = 0.92, 2nd IU 61/64 = 0.95
• Spontaneous speech: 1st IU 70/71 = 0.99, 2nd IU 54/71 = 0.76
Not good, because the result ignores the individual average duration of each sound in the IU and the deviation of each sound's current duration from its average duration in the whole material.
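Method 2 can be written down in a few lines. The sketch below is illustrative only: `method2_index` and its parameter names are hypothetical, and durations are assumed to be in milliseconds. The loop at the end recomputes the F<20 figures from the per-IU and overall averages quoted above.

```python
def avg_sound_duration(total_duration_ms, n_sounds):
    """Total duration divided by the number of sounds it contains."""
    return total_duration_ms / n_sounds


def method2_index(iu_duration_ms, iu_n_sounds, net_duration_ms, total_n_sounds):
    """Average sound duration in the IU relative to the overall average.

    net_duration_ms is the net duration of the sound material in the
    whole recording; total_n_sounds is the number of sounds in it.
    """
    iu_avg = avg_sound_duration(iu_duration_ms, iu_n_sounds)
    overall_avg = avg_sound_duration(net_duration_ms, total_n_sounds)
    return iu_avg / overall_avg


# Averages taken from the slide: (overall average, [1st IU, 2nd IU]).
slide_means = {"Reading": (64, [59, 61]), "Spont. speech": (71, [70, 54])}
for style, (overall, iu_means) in slide_means.items():
    print(style, [round(m / overall, 2) for m in iu_means])
# Reading [0.92, 0.95]
# Spont. speech [0.99, 0.76]
```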

Methods for tempo monitoring
3. Average sound duration in the current IU / averaged sound duration in the IU
• average sound duration in the current IU = IU duration / number of sounds
• averaged sound duration = sum of the average sound durations (on the basis of the whole material) in the IU / number of sounds in the IU
(some pictures)

Or the same in a clearer view

Methods for tempo monitoring
3. Average sound duration in the current IU / averaged sound duration in the IU
• average sound duration in the current IU = IU duration / number of sounds
• averaged sound duration = sum of the average sound durations (on the basis of the whole material) in the IU / number of sounds in the IU
Example for F<20:
• Reading: 1st IU 59/72 = 0.82, 2nd IU 61/56 = 1.09
• Spontaneous speech: 1st IU 70/75 = 0.93, 2nd IU 54/66 = 0.82
Possible correction: taking into account the ideal number of sounds in the whole material and in the current IU, and the average durations of pre-stressed and post-stressed vowels
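Method 3 differs from method 2 only in the denominator, which is built from the sounds that actually occur in the IU. A minimal sketch, assuming the IU is given as a list of (label, duration) pairs and that a table of per-sound mean durations over the whole material is already available; the function name, the data layout, and the toy numbers are all hypothetical. The IU is also assumed to contain no pauses, so that its duration equals the sum of its sound durations.

```python
def method3_index(iu_sounds, mean_duration_by_sound):
    """Observed vs. expected average sound duration for one IU.

    iu_sounds: list of (label, duration_ms) pairs for the sounds in the IU.
    mean_duration_by_sound: label -> mean duration of that sound in the
    whole material (ms).
    """
    n = len(iu_sounds)
    observed = sum(duration for _, duration in iu_sounds) / n
    expected = sum(mean_duration_by_sound[label] for label, _ in iu_sounds) / n
    return observed / expected


# Toy data, not taken from the recordings.
iu = [("k", 50), ("a", 70), ("k", 60), ("i", 40)]
means = {"k": 60, "a": 80, "i": 50}
print(round(method3_index(iu, means), 2))  # 55 / 62.5 -> 0.88
```

Because the expected value is computed from the same sounds as the observed one, an IU made up of intrinsically short sounds is not mistaken for a fast one, which is the weakness noted for method 2.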

Methods for tempo monitoring
4. Rob van Son's proposal (Z-values):
"As Finnish and Dutch (and Russian?) use quantities on (some) phonemes, this is not a good way to define tempo. We had a PhD student (Xue Wang) who developed a very nice way to define "local" tempo as the Z value of the phoneme (i.e., LocalTempo = (PhonemeDuration - MeanPhonemeDuration)/StandDeviation for each phoneme). The local speaking rate is then the mean of these values over an utterance."
Example for F<20:
• Reading: 1st IU -0.39, 2nd IU 0.26
• Spontaneous speech: 1st IU -0.12, 2nd IU -0.41
No comprehensible relation between the values and linguistic features
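The Z-value definition quoted above can be sketched as follows. This is an illustration under stated assumptions rather than the original procedure: the function names and the (label, duration) data layout are hypothetical, the sample standard deviation is used although the quote does not say which estimator was meant, and phonemes with fewer than two tokens or with zero variance are skipped.

```python
from collections import defaultdict
from statistics import mean, stdev


def phoneme_stats(corpus):
    """Per-phoneme (mean, standard deviation) of duration.

    corpus: iterable of (label, duration_ms) pairs from the whole material.
    Phonemes with fewer than two tokens or zero spread are left out,
    because no Z value can be computed for them.
    """
    by_label = defaultdict(list)
    for label, duration in corpus:
        by_label[label].append(duration)
    stats = {}
    for label, durations in by_label.items():
        if len(durations) >= 2:
            sd = stdev(durations)  # sample standard deviation (assumption)
            if sd > 0:
                stats[label] = (mean(durations), sd)
    return stats


def local_tempo(unit, stats):
    """Mean Z value of the phoneme durations in an utterance or IU.

    Raises statistics.StatisticsError if no phoneme in the unit has stats.
    """
    z_values = [
        (duration - stats[label][0]) / stats[label][1]
        for label, duration in unit
        if label in stats
    ]
    return mean(z_values)
```

With this sign convention a positive value means that the phonemes are longer than their own averages, that is, a slower local tempo, and a negative value means a faster one.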

Method comparison
