Pitch Tracking + Prosody

Pitch Tracking + Prosody January 20, 2009

The Plan for Today • One announcement: • On Thursday, we’ll meet in the Tri-Faculty Computer Lab (SS 018) • Section 1 • We’ll be working on intonation transcription… • Automatic Pitch Tracking • (Brief) suprasegmentals review • The basics of English intonation

The Thin Blue Line • Praat can give us a representation of speech that looks like: • The blue line represents the fundamental frequency (F0) of the speaker’s voice. • Also known as a pitch track • How can we automatically “track” F0 in a sample of speech?

Pitch Tracking • Voicing: • Air flow through vocal folds • Rapid opening and closing due to Bernoulli Effect • Each cycle sends an acoustic shockwave through the vocal tract • …which takes the form of a complex wave. • The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.

Voicing Bars

Voicing Bars Individual glottal pulses

Voicing = Complex Wave • Note: voicing is not perfectly periodic. • …always some random variation from one cycle to the next. • How can we measure the fundamental frequency of a complex wave?

duration = ??? • The basic idea: figure out the period between successive cycles of the complex wave. • Fundamental frequency = 1 / period

Measuring F0 • To figure out where one cycle ends and the next begins… • The basic idea is to find how well successive “chunks” of a waveform match up with each other. • One period = the length of the chunk that matches up best with the next chunk. • Automatic Pitch Tracking parameters to think about: • Window size (i.e., chunk size) • Step size • Frequency range (= period range)

Window (Chunk) Size Here’s an example of a small window

Window (Chunk) Size Here’s an example of a large(r) window

Initial window of the waveform is compared to another window (of the same size) at a later point in the waveform

Matching ??? The waveforms in the two windows are compared to see how well they match up. Correlation = measure of how well the two windows match

Autocorrelation • The measure of correlation = • Sum of the point-by-point products of the two chunks. • The technical name for this is autocorrelation… • because two parts of the same wave are being matched up against each other.

Autocorrelation Example • Ex: consider window x, with n samples… • What’s its correlation with window y? • (Note: window y must also have n samples) • x1 = first sample of window x • x2 = second sample of window x • … • xn = nth (final) sample of window x • y1 = first sample of window y, etc. • Correlation (R) = x1*y1 + x2*y2 + … + xn*yn • The larger R is, the better the correlation.

By the Numbers
Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
y         -.3   -.1    .1    .3    .1   -.1
product  -.24  -.03  -.02  -.15   .04  -.08
• Sum of products = -.48 • These two chunks are poorly correlated with each other.

By the Numbers, part 2
Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
y          .7    .4   -.1   -.4    .1    .4
product   .56   .12   .02   .2    .04   .32
• Sum of products = 1.26 • These two chunks are well correlated with each other. • (or at least better than the previous pair) • Note: matching peaks count for more than matches close to 0.
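The two worked examples above can be reproduced in a few lines of Python. This is only a sketch of the sum-of-products measure described here; the function name `correlation` and the variable names are illustrative choices, not anything taken from Praat.

```python
def correlation(x, y):
    """Sum of the point-by-point products of two equal-length windows."""
    if len(x) != len(y):
        raise ValueError("windows must have the same number of samples")
    return sum(xi * yi for xi, yi in zip(x, y))

# Sample values copied from the two "By the Numbers" examples.
x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]
y_poor = [-0.3, -0.1, 0.1, 0.3, 0.1, -0.1]
y_good = [0.7, 0.4, -0.1, -0.4, 0.1, 0.4]

r_poor = correlation(x, y_poor)
r_good = correlation(x, y_good)
print(round(r_poor, 2), round(r_good, 2))  # -0.48 1.26
```

The second pair scores higher mainly because its large-amplitude samples (1, 4, and 6) line up in sign with those of x.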

Back to (Digital) Reality ??? These two windows are poorly correlated The waveforms in the two windows are compared to see how well they match up. Correlation = measure of how well the two windows match

Next: the pitch tracking algorithm moves further down the waveform and grabs a new window

“step” The distance the algorithm moves forward in the waveform is called the step size

Matching, again ??? The next window gets compared to the original.

Matching, again ??? These two windows are also poorly correlated The next window gets compared to the original.

another “step” The algorithm keeps chugging and, eventually…

Matching, again ??? These two windows are highly correlated The best match is found.

period The fundamental period can be determined by calculating the length of time between window 1 and window 2.

Mopping up period • Frequency is 1 / period • Q: How many possible periods does the algorithm need to check? • Frequency range (default in Praat: 75 to 600 Hz)
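One way to approach the question above: the number of candidate periods depends on the sampling rate, which the slide does not specify. The sketch below assumes a 44,100 Hz recording and whole-sample lags only (both are assumptions for illustration; a real tracker may interpolate between samples), and uses the 75 to 600 Hz range quoted for Praat. The name `candidate_lags` is hypothetical.

```python
import math

def candidate_lags(sample_rate, f0_min=75.0, f0_max=600.0):
    """Whole-sample lags (periods) whose frequency lies within [f0_min, f0_max]."""
    shortest = math.ceil(sample_rate / f0_max)   # highest F0 -> shortest period
    longest = math.floor(sample_rate / f0_min)   # lowest F0 -> longest period
    return range(shortest, longest + 1)

lags = candidate_lags(44100)  # 44100 Hz is an assumed sampling rate
print(lags[0], lags[-1], len(lags))  # 74 588 515
```

Narrowing the frequency range shrinks this list, which is why the range setting affects how much work the tracker has to do.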

Moving on • Another comparison window is selected and the whole process starts over again.

The algorithm ultimately spits out a pitch track. • This one shows you the F0 value at each step. • (Figure: pitch track of the utterance “Uhm, I would like a flight to Seattle from Albuquerque.”) • Thanks to Chilin Shih for making these materials available.
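The window, step, and frequency-range loop walked through in the preceding slides can be sketched in plain Python. This is a simplified illustration, not Praat's implementation, which is more elaborate. The function names, the 10 ms step, the test tone, and the tie-breaking tolerance are all assumptions made for the example; the window is set to three times the longest candidate period.

```python
import math

def correlation(x, y):
    """Sum of point-by-point products of two equal-length windows."""
    return sum(a * b for a, b in zip(x, y))

def estimate_f0(samples, start, window, min_lag, max_lag, sample_rate):
    """Compare the window at `start` with later windows; return the best-matching F0.

    The caller must ensure start + max_lag + window <= len(samples).
    """
    reference = samples[start:start + window]
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        candidate = samples[start + lag:start + lag + window]
        r = correlation(reference, candidate)
        # Small tolerance (an assumption): near-ties keep the shorter lag, so
        # rounding noise cannot favour a multiple of the true period.
        if r > best_r + 1e-6:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

def pitch_track(samples, sample_rate, f0_min=75.0, f0_max=600.0, step_s=0.01):
    """Return a list of (time in seconds, F0 in Hz) pairs, one per step."""
    min_lag = math.ceil(sample_rate / f0_max)   # shortest period checked
    max_lag = math.floor(sample_rate / f0_min)  # longest period checked
    window = 3 * max_lag                        # three times the longest period
    step = max(1, int(step_s * sample_rate))    # step size in samples
    last_start = len(samples) - window - max_lag
    return [(start / sample_rate,
             estimate_f0(samples, start, window, min_lag, max_lag, sample_rate))
            for start in range(0, last_start + 1, step)]

# Synthetic test tone: 0.2 s of a 200 Hz sine sampled at 8000 Hz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]
track = pitch_track(tone, sample_rate)
```

A perfectly periodic tone matches itself equally well at one period and at two, so the sketch keeps the shorter lag on near-ties. Real speech is never perfectly periodic, so practical trackers need more careful ways of choosing among candidates.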

Pitch Tracking in Praat • Play with F0 range. • Create Pitch Object. • Also go To Manipulation…Pitch. • Also check out:

Summing Up • Pitch tracking uses three parameters • Window size • Ensures reliability • In Praat, the window size is always three times the longest possible period. • E.g.: 3 × 1/75 = 0.04 sec. • Step size • For temporal accuracy • Frequency range • Reduces computational load

Deep Thought Questions • What might happen if: • The shortest period checked is longer than the fundamental period? • AND two fundamental periods fit inside a window? • Potential Problem #1: Pitch Halving • The pitch tracker thinks the fundamental period is twice as long as it is in reality. • → It estimates F0 to be half of its actual value.
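The halving scenario can be reproduced with a synthetic tone. In this sketch, the 8000 Hz sampling rate, the 200 Hz test tone, the 320-sample window, and the deliberately narrow 75 to 150 Hz search range are all assumed values chosen to trigger the problem; `best_lag` is a hypothetical helper.

```python
import math

def best_lag(samples, window, min_lag, max_lag):
    """Return the lag whose window best matches the initial window."""
    reference = samples[:window]
    scores = {
        lag: sum(a * b for a, b in zip(reference, samples[lag:lag + window]))
        for lag in range(min_lag, max_lag + 1)
    }
    return max(scores, key=scores.get)

sample_rate = 8000   # assumed sampling rate
true_f0 = 200.0      # true period = 40 samples
tone = [math.sin(2 * math.pi * true_f0 * n / sample_rate) for n in range(800)]

# Search only 75-150 Hz: the shortest lag checked (54 samples) is longer than
# the true period (40 samples), and the 320-sample window holds several periods.
min_lag = math.ceil(sample_rate / 150.0)   # 54
max_lag = math.floor(sample_rate / 75.0)   # 106
lag = best_lag(tone, window=320, min_lag=min_lag, max_lag=max_lag)
estimated_f0 = sample_rate / lag
print(lag, estimated_f0)  # 80 100.0
```

Because the true 40-sample period is outside the search range, the best available match is at two periods (80 samples), and the estimate comes out at half the true F0.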

Pitch Halving pitch is halved Check out normal file in Praat.

More Deep Thoughts • What might happen if: • The shortest period checked is less than half of the fundamental period? • AND the second half of the fundamental cycle is very similar to the first? • Potential Problem #2: Pitch doubling • The pitch tracker thinks the fundamental period is half as long as it actually is. • → It estimates the F0 to be twice as high as it is in reality.

Pitch Doubling pitch is doubled

Microperturbations • Another problem: • Speech waveforms are partly shaped by the type of segment being produced. • Pitch tracking can become erratic at the juncture of two segments. • In particular: • voiced to voiceless segments • sonorants to obstruents • These discontinuities in F0 are known as microperturbations. • Also: transitions between modal and creaky voicing tend to be problematic.

Back to Language • F0 is important because it can be used by languages to signal differences in meaning. • Note: • Acoustic = Fundamental Frequency • Perceptual = Pitch • Linguistic = Tone

A Typology • F0 is generally used in three different ways in language: 1. Tone languages (Chinese, Navajo, Igbo) • Lexically determined tone on every syllable • “Syllable-based” tone languages 2. Accentual languages (Japanese, Swedish) • The location of an accent in a particular word is lexically marked. • “Word-based” tone languages 3. Stress languages (English, Russian) • It’s complicated.

Mandarin Tone • Mandarin (Chinese) is a classic example of a tone language. ma1: mother ma2: hemp ma3: horse ma4: to scold

How to Transcribe Tone • Tones are defined by the pattern they make through a speaker’s frequency range. • The frequency range is usually assumed to encompass five levels (1-5). • (although this can vary, depending on the language) • Scale: 5 = highest F0, then 4, 3, 2, down to 1 = lowest F0

Tone 1 2 3 4 • In Mandarin, tones span a frequency range of 1-5 • Each tone is denoted by its (numerical) path through the frequency range • Each syllable can also be labeled with a tone number (e.g., ma1, ma2, ma3, ma4)

How to Transcribe Tone • Tone is relative • i.e., not absolute • Each speaker has a unique frequency range. For example: • Male: level 5 (highest F0) = 200 Hz, level 1 (lowest F0) = 100 Hz • Female: level 5 (highest F0) = 350 Hz, level 1 (lowest F0) = 150 Hz
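To make the relativity concrete, the sketch below maps the five tone levels onto the male and female ranges given in the example. Spacing the levels linearly in Hz is an assumption made purely for illustration (the slides do not say how the intermediate levels are spaced), and the function names are hypothetical. The contour "55" is a high level tone, written as two digits for its start and end levels.

```python
def level_to_hz(level, lowest_hz, highest_hz, levels=5):
    """Map a tone level (1 = lowest, `levels` = highest) onto a speaker's F0 range.

    Linear spacing in Hz is an assumption made for illustration.
    """
    if not 1 <= level <= levels:
        raise ValueError("level out of range")
    fraction = (level - 1) / (levels - 1)
    return lowest_hz + fraction * (highest_hz - lowest_hz)

def contour_to_hz(contour, lowest_hz, highest_hz):
    """Convert a tone contour written as digits, e.g. "55", to a list of Hz values."""
    return [level_to_hz(int(digit), lowest_hz, highest_hz) for digit in contour]

male = contour_to_hz("55", 100.0, 200.0)     # male range from the example
female = contour_to_hz("55", 150.0, 350.0)   # female range from the example
print(male, female)  # [200.0, 200.0] [350.0, 350.0]
```

The same abstract contour comes out at 200 Hz for one speaker and 350 Hz for the other.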

General Relativity • In ordinary conversation, for European languages (Fant, 1956) : • Men have an average F0 of 120 Hz • A range of 50-250 Hz • Women have an average F0 of 220 Hz • A range of 120-480 Hz • Children have an average F0 of 330 Hz • In a normal utterance, the F0 range is usually one octave. • i.e., highest F0 = 2 * lowest F0

Relativity, in Reality • The same tones may be denoted by completely different frequencies, depending on the speaker. • Tone is an abstract linguistic unit. • (Figure: ma, tone 1 (55), produced by a female speaker and by a male speaker.)

Accent Languages • In accent languages, there is only one pitch accent associated with each word. • The pitch accent is realized on only one syllable in the word. • The other syllables in the word can have no accent. • Accent is lexically determined, so there can be minimal pairs. • Japanese is a pitch accent language… • for some, but not all, words • for some, but not all, dialects

Japanese • Japanese words have one High accent • it attaches to one “mora” in the word • A mora = a vowel, or a consonant following a vowel, within a syllable. • For example: • [ni] ‘two’ has one mora. • [san] ‘three’ has two morae. • The first mora, if not accented, has a Low F0. • Morae following the accent have Low F0. It’s actually slightly more complicated than this; for more info, see: http://sp.cis.iwate-u.ac.jp/sp/lesson/j/doc/accent.html

Japanese Examples • asa ‘morning’ H-L • asa ‘hemp’ L-H

“chopsticks” H-L-L • “bridge” L-H-L • “edge” L-H-H

Stress Languages • Stress is a suprasegmental property that applies to whole syllables. • It is defined by more than just differences in F0. • Stressed syllables are higher in pitch (usually) • Stressed syllables are longer (usually) • Stressed syllables are louder (usually) • Stressed syllables reflect more phonetic effort. • More aspiration, less coarticulation in stressed syllables. • Vowels often reduce to schwa in unstressed syllables. • The combination of these factors gives stressed syllables more prominence than unstressed syllables.

Stress: Pitch • (N) • (V) Complicating factor: pitch tends to drift downwards at the end of utterances

Intonation • Languages superimpose pitch contours on top of word-based stress or tone distinctions. • This is called intonation. • It turns out that English: • has word-based stress • and phrase-based pitch accents (intonation) • The pitch accents are pragmatically specified, rather than lexically specified. • i.e., they change according to discourse context.
