Synthesizing naturally produced tokens

Melissa Baese-Berk, SoundLab, 12 April 2009

Why synthesize?
• Stimuli need to lie on a continuum that co-varies in VOT (voice onset time) and formant transitions.
• Altering VOT is easy(ish): just copy and paste.
• Altering formants (especially formant transitions) is a lot trickier.

What am I synthesizing?
• Maye and Gerken (M&G) have three continua with 8 tokens each, from a single speaker.
• I have the same, but from two speakers.
• I chose to synthesize using the M&G formants.
• Pros:
  • Keeps transitions, VOT, and steady-state formants constant across speakers
  • Does not privilege one speaker’s formants
• Cons:
  • Loses a lot of speaker-specific information when keeping formant transitions stable (a problem when the perception task is ‘Identify the speaker’)

Basic Vowel Resynthesis
• (Many thanks to Joe Toscano for help with this.)
• Speech sounds (especially vowels) have two parts: the source (glottis) and the filter (mouth).
• Basically, the source provides pitch; the filter provides formants.
• We want to separate the source and filter, manipulate the filter to do what we want/need, and then push the source back through the filter to get the target sound.

Basic Vowel Resynthesis
• Separating the source:
• LPC (Linear Predictive Coding) estimates the formant information; inverse-filtering the sound with that estimate removes the formants, leaving you with the source.
• To do this in Praat, you have to resample the sound (under Convert) and scale its intensity (under Modify, then choose “Scale Peak”).
• Then choose “To LPC (autocorrelation)” under Formants & LPC.
• Then select the resulting LPC together with the original sound, and choose “Filter (inverse)”.
• This leaves you with the source.
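What LPC inverse filtering does numerically can be sketched in plain Python. This is only an illustration of the autocorrelation method on a toy signal, not Praat's implementation (Praat works frame by frame on real recordings). The function names, the 8000 Hz sampling rate, and the single 500 Hz resonance are all assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for the inverse filter.

    Returns a[0..order] with a[0] = 1, so that the residual is
    e[n] = sum over k of a[k] * x[n - k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Remove the estimated resonances, leaving the source-like residual."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

# Toy "vowel": a 100 Hz impulse train (source) through one resonance (filter).
fs = 8000                      # assumed sampling rate in Hz
theta = 2 * math.pi * 500 / fs # assumed resonance at 500 Hz
radius = 0.9                   # assumed pole radius (controls bandwidth)
true_a = [1.0, -2 * radius * math.cos(theta), radius ** 2]
source = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
signal = []
for n, e in enumerate(source):
    y = e
    if n >= 1:
        y -= true_a[1] * signal[n - 1]
    if n >= 2:
        y -= true_a[2] * signal[n - 2]
    signal.append(y)

coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
residual = inverse_filter(signal, coeffs)
```

In this toy case `residual` comes out close to the original impulse train, which is the sense in which inverse filtering "leaves you with the source". Real speech needs a higher LPC order and short analysis frames.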

Basic Vowel Resynthesis
• Next we need the filter:
• Choose the original sound.
• Select “To Formant” from the Formants & LPC menu.
• You have the filter!
• But the filter by itself doesn’t do you much good.
• If you pushed the source through the filter now, you’d just get a sort of funny-sounding version of the original sound.
• We need to manipulate the filter to change it into something new.

Basic Vowel Resynthesis
• Let’s say we want to do something easier than changing formant transitions: change ‘hit’ into ‘het’.
• Basically, we need to know what F1 and F2 are in the original token (‘hit’) and what we need to change them into (‘het’).
• hit: F1 = 273 Hz, F2 = 2362 Hz
• het: F1 = 558 Hz, F2 = 1815 Hz
• So F1 should increase by 285 Hz, and F2 should decrease by 547 Hz.
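The arithmetic behind those two shifts, as a small Python check (the dictionary layout is just an illustrative choice; the Hz values are the ones on the slide):

```python
# Measured formants from the slide, in Hz.
hit = {"F1": 273, "F2": 2362}
het = {"F1": 558, "F2": 1815}

# Shift needed per formant: target minus original.
shift = {name: het[name] - hit[name] for name in hit}
print(shift)  # {'F1': 285, 'F2': -547}
```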

Basic Vowel Resynthesis
• To do this, Praat has a handy formula command.
• With the formant object (the filter) selected, you can choose a modify command.
• You can manipulate by row or column (or probably a lot else using ‘if …’ statements).
• But to manipulate the entire F1 and the entire F2, you enter these two separate commands:
  • if row = 1 then self + 285 else self fi
  • if row = 2 then self - 547 else self fi
• These translate roughly to: if you’re F1, add 285 to yourself; otherwise stay how you are. THEN, if you’re F2, subtract 547 from yourself; otherwise stay how you are. (This leaves F3 and higher intact.)
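For readers who think more easily in a general-purpose language, here is a rough Python analogue of those two formulas. It treats the formant object as a list of rows (one per formant) with one value per time frame. The function name `shift_formant`, the three-frame track, and the F3 value are assumptions for illustration only.

```python
def shift_formant(track, row, delta_hz):
    """Analogue of 'if row = N then self + delta else self fi'.

    track: list of rows (one per formant), each a list of Hz values
    per time frame. row is 1-based, as in the Praat formula.
    Returns a new track; the input is left untouched.
    """
    return [
        [hz + delta_hz if r == row else hz for hz in values]
        for r, values in enumerate(track, start=1)
    ]

# Hypothetical three-frame track for 'hit'. F1 and F2 use the slide's
# values; the F3 value is made up.
hit_track = [
    [273.0, 273.0, 273.0],     # F1
    [2362.0, 2362.0, 2362.0],  # F2
    [3000.0, 3000.0, 3000.0],  # F3 (hypothetical)
]

# Apply the two commands one after the other, as on the slide.
het_track = shift_formant(shift_formant(hit_track, 1, 285.0), 2, -547.0)
```

After both steps, the F1 row holds 558 Hz, the F2 row holds 1815 Hz, and the F3 row is unchanged.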

Basic Vowel Resynthesis
• Now you have your filter and your source.
• All you have to do is filter.
• Select the formant object and the source sound and choose ‘Filter’.
• Easy, right?

Manipulating Formant Transitions
• In the previous example, we manipulated all of F1 and F2.
• What if we only want to manipulate the formants for the first 1/8th or so of the vowel, and keep the rest of the formants intact?
• This is both easier and harder.
• You can do this two ways:
  • Extract the portion of the vowel whose formants you want to change, manipulate the slope, and then copy and paste it back into the rest of the vowel (… this usually results in some clipping).
  • Use a script to manipulate F1 and F2 column by column (time slice by time slice). (I chose this way.)

Manipulating Formant Transitions
• Basically, I use the same formula idea as before, but I make multiple formulas operate on the same formants without manually entering each one.
• Calculate the slope in the initial token (stimulus 1) and the final token (stimulus 8).
• Calculate the change in slope from initial to final.
• Calculate how big a change in slope has to take place at each intermediate step to turn token 1 into token 8.
• (You can do all of this by converting the formant object into a table, which can be opened in Excel.)
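The slope calculation can be sketched as follows. This is a minimal illustration that assumes the slope changes by an equal amount at each step and that the transition is a straight ramp into the steady state. The slide does not state either assumption, and the slope values, steady-state frequency, and frame counts below are invented for the example.

```python
def continuum_slopes(first, last, n_steps):
    """Evenly spaced slopes from token 1 to token n_steps (inclusive)."""
    return [first + i * (last - first) / (n_steps - 1) for i in range(n_steps)]

def transition_track(steady_hz, slope, n_transition, n_frames):
    """Formant value per frame: a linear ramp that reaches steady_hz.

    During the first n_transition frames the formant changes by `slope`
    Hz per frame; from frame n_transition onward it stays at steady_hz.
    """
    return [
        steady_hz - slope * max(n_transition - frame, 0)
        for frame in range(n_frames)
    ]

# Hypothetical F1 slopes (Hz per frame) measured in tokens 1 and 8.
slopes = continuum_slopes(first=40, last=10, n_steps=8)

# Change in slope between neighbouring tokens: 7 intervals for 8 tokens.
step_change = (10 - 40) / (8 - 1)

# One F1 track per token, written column by column (frame by frame).
tracks = [transition_track(700.0, s, n_transition=5, n_frames=8) for s in slopes]
```

Only the first five frames differ across the eight tracks; the remaining frames stay at the steady-state value, which mirrors changing just the start of the vowel.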

Manipulating Formant Transitions
• The steps remain the same once you create the formant filters.
• Just push your source through your filter, and you have your target sound!
• … But sometimes resynthesis sounds funny.

Problems (and some solutions)
• There are a lot of reasons why resynthesized tokens can sound funny.
• Even resynthesizing a base token with its own formants can sound funny.
• This is because Praat has to make a lot of choices in creating the source and filter, and it doesn’t always get these choices right.

Problems and some solutions
• One problem that’s NOT Praat’s fault?
• Source and filter come from different types of recordings.
• Intensity, vowel length, pitch, and a number of other things differ across recordings, and can affect how successful a resynthesis is.
• If your synthesized tokens sound funny, check that the source you’re synthesizing from is about as long as the filter.
• Also check that the intensity isn’t really high or really low on either your source or filter. (You can change these things in Praat too.)
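Those two checks can be written down as a small helper. This is a sketch only: the function name and the thresholds (10% duration tolerance, peak amplitude between 0.05 and 0.99 on a scale where 1.0 is full scale) are arbitrary assumptions, not values from the slides or from Praat.

```python
def mismatch_warnings(source_dur, filter_dur, source_peak, filter_peak,
                      dur_tol=0.10, peak_lo=0.05, peak_hi=0.99):
    """Flag duration and level mismatches between two recordings.

    Durations are in seconds; peaks are absolute peak amplitudes.
    All thresholds are hypothetical defaults.
    """
    warnings = []
    longer = max(source_dur, filter_dur)
    if abs(source_dur - filter_dur) > dur_tol * longer:
        warnings.append("duration mismatch")
    for name, peak in (("source", source_peak), ("filter", filter_peak)):
        if peak < peak_lo:
            warnings.append(name + " recording very quiet")
        elif peak > peak_hi:
            warnings.append(name + " recording very loud")
    return warnings

ok = mismatch_warnings(0.30, 0.31, 0.5, 0.6)
bad = mismatch_warnings(0.30, 0.45, 0.5, 1.0)
```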

Problems and some solutions
• One problem that is sort of Praat’s fault?
• Finding formants.
• Praat is good, but not perfect, at this job. You have some control over how well it does by choosing the number of formants you want it to look for and how high in frequency you want it to look for them (higher for women than for men).
• The more formants, the more natural the sound will be, but also the more likely it is that Praat will mess up and find formants where they aren’t supposed to be.

Problems and some solutions
• Sometimes, even when you have chosen a really good number of formants in a good range, you still end up with clipping.
• This is usually because Praat has found formants where they should not be.
• Because Praat finds these fake formants, you get LOTS of energy at a particular frequency, and you get clipping.
• This is sort of a smaller-scale version of the clipping we get on recordings when someone is talking too loudly or the microphone is too close.
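A quick way to confirm that a resynthesized token clips is to count samples at or beyond full scale, and then rescale the peak. The sketch below assumes samples are floats where 1.0 is full scale. The function names and sample values are made up, and the 0.99 target is simply a peak just below full scale. Rescaling only avoids clipping in the output; it does not repair a mistracked formant.

```python
def count_clipped(samples, limit=1.0):
    """Number of samples whose magnitude reaches or exceeds the limit."""
    return sum(1 for s in samples if abs(s) >= limit)

def scale_peak(samples, new_peak=0.99):
    """Rescale so the largest absolute sample equals new_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    return [s * new_peak / peak for s in samples]

# Hypothetical resynthesis output with an over-energetic stretch.
resynth = [0.2, -0.7, 1.4, -1.1, 0.3]
clipped_before = count_clipped(resynth)
rescaled = scale_peak(resynth)
clipped_after = count_clipped(rescaled)
```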

Problems and some solutions
• Praat can also be sort of a mess at higher formants, but you can tell Praat where to put higher formants via the formula command if you want.
• To fix clipping caused by mistakes at lower levels:
  • You can visually inspect where Praat found formants, and manually manipulate them using the formant grid.
• You can do everything right and still get clipping or crazy buzzes and hisses.
  • This is usually caused by a recording quality problem.

Final Thoughts
• Doing basic synthesis is easy.
• Getting it to sound like real speech (and getting rid of imperfections in the sound) is a little harder.
• BUT synthesis is a great way to get stimuli that are controlled on a number of variables.
• (And a good way to understand acoustic phonetics in action!)
