
Sonorant Grab Bag


tawana


  1. Sonorant Grab Bag March 27, 2014

  2. Speech Synthesis: A Basic Overview • Speech synthesis is the generation of speech by machine. • The reasons for studying synthetic speech have evolved over the years: • Novelty • To control acoustic cues in perceptual studies • To understand the human articulatory system • “Analysis by Synthesis” • Practical applications • Reading machines for the blind, navigation systems

  3. Speech Synthesis: A Basic Overview • There are four basic types of synthetic speech: • Mechanical synthesis • Formant synthesis • Based on Source/Filter theory • Concatenative synthesis • = stringing bits and pieces of natural speech together • Articulatory synthesis • = generating speech from a model of the vocal tract.
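Formant synthesis can be sketched in a few lines of source/filter code. This is an illustration, not the method behind any system in these slides: it assumes Klatt-style second-order digital resonators (a swapped-in, commonly used resonator design), a bare impulse train as the voicing source, a 10 kHz sampling rate, and hypothetical neutral-vowel settings (F0 = 100 Hz; F1 to F3 = 500, 1500, 2500 Hz, each with a 100 Hz bandwidth).

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Klatt-style second-order resonator coefficients (unity gain at 0 Hz)."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw_hz * t)
    b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(signal, freq_hz, bw_hz, fs):
    """Filter `signal` through one resonance: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, fs, duration_s):
    """Crude voicing source: one pulse per glottal cycle."""
    n = int(fs * duration_s)
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def formant_synth(f0_hz, formants, fs=10000, duration_s=0.5):
    """Source/filter: pass the source through a cascade of formant resonators."""
    signal = impulse_train(f0_hz, fs, duration_s)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    return signal

# Hypothetical neutral-vowel settings (not taken from the slides)
samples = formant_synth(100, [(500, 100), (1500, 100), (2500, 100)])
```

The source and the filter are independent here, which is the point of source/filter theory: changing F0 leaves the resonators alone, and changing a formant leaves the source alone. Scaling `samples` to 16-bit integers and writing them out with the standard-library `wave` module would let you listen to the result.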

  4. 1. Mechanical Synthesis • The very first attempts to produce synthetic speech were made without electricity. • = mechanical synthesis • In the late 1700s, models were produced which used: • reeds as a voicing source • differently shaped tubes for different vowels

  5. Mechanical Synthesis, part II • Later, Wolfgang von Kempelen and Charles Wheatstone created a more sophisticated mechanical speech device… • with independently manipulable source and filter mechanisms.

  6. Mechanical Synthesis, part III • An interesting historical footnote: • Alexander Graham Bell and his “questionable” experiments with his dog. • Mechanical synthesis has largely gone out of style ever since. • …but check out Mike Brady’s talking robot.

  7. The Voder • The next big step in speech synthesis was to generate speech electronically. • This was most famously demonstrated at the New York World’s Fair in 1939 with the Voder. • The Voder was a manually controlled speech synthesizer. • (operated by highly trained young women)

  8. Voder Principles • The Voder basically operated like a vocoder. • Voicing and fricative source sounds were filtered by 10 different resonators… • each controlled by an individual finger! • Only about 1 in 10 had the ability to learn how to play the Voder.

  9. Overtone Singing • F0 stays the same (on a “drone”), while the singer shapes the vocal tract so that individual harmonics (“overtones”) resonate. • What kind of voice quality would be conducive to this?
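Because harmonics sit at integer multiples of F0, a resonance placed near a given multiple boosts that one harmonic. A minimal sketch, assuming a hypothetical 150 Hz drone and a single resonance that the singer moves around; `reinforced_harmonic` is an illustrative helper, not a standard function.

```python
def reinforced_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, harmonic frequency) closest to the resonance.

    Harmonics of a periodic source lie at n * F0, so the resonance
    reinforces whichever multiple is nearest its centre frequency.
    """
    n = max(1, round(formant_hz / f0_hz))
    return n, n * f0_hz

# Hypothetical drone at 150 Hz; the singer moves one resonance around
for formant in (1200, 1260, 1350):
    n, freq = reinforced_harmonic(150, formant)
    print(f"resonance at {formant} Hz boosts harmonic {n} ({freq} Hz)")
```

With F0 held constant, the only way to make a different overtone stand out is to move the resonance, which is what the singer does by reshaping the vocal tract.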

  10. Vowels and Sonorants • So far, we’ve talked a lot about the acoustics of vowels: • Source: periodic openings and closings of the vocal folds. • Filter: characteristic resonant frequencies of the vocal tract (above the glottis) • Today, we’ll talk about the acoustics of sonorants: • Nasals • Laterals • Approximants • The source/filter characteristics of sonorants are similar to vowels… with a few interesting complications.

  11. Damping • One interesting acoustic property exhibited by (some) sonorants is damping. • Recall that resonance occurs when: • a sound wave travels through an object • that sound wave is reflected... • ...and reinforced, on a periodic basis • The periodic reinforcement sets up alternating patterns of high and low air pressure • = a standing wave

  12. Resonance in a closed tube [figure: standing wave in a closed tube over time]

  13. Damping, schematized • In a closed tube: • With only one pressure pulse from the loudspeaker, the wave will eventually dampen and die out. • Why? • The walls of the tube absorb some of the acoustic energy, with each reflection of the standing wave.
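The "energy lost at each reflection" idea can be put into numbers. A minimal sketch, assuming (hypothetically) that every reflection absorbs the same fraction of the incident energy, and using a 60 dB drop as an arbitrary "died out" threshold; the absorption values are illustrative, not measurements.

```python
import math

def reflections_to_decay(absorption, decay_db=60.0):
    """Number of wall reflections before a wave loses `decay_db` of energy.

    Assumes each reflection absorbs a fixed fraction `absorption` of the
    incident energy, so the energy left after n reflections is (1 - absorption) ** n.
    """
    loss_per_reflection_db = -10 * math.log10(1 - absorption)
    return math.ceil(decay_db / loss_per_reflection_db)

print(reflections_to_decay(0.05))  # hypothetical hard walls (light damping): 270
print(reflections_to_decay(0.5))   # hypothetical absorbent walls (heavy damping): 20
```

The more energy the walls absorb per reflection, the fewer reflections the standing wave survives, which is the difference between a lightly and a heavily damped tube.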

  14. Damping Comparison • A heavily damped wave will die out more quickly... • than a lightly damped wave:

  15. Damping Factors • The amount of damping in a tube is a function of: • The volume of the tube • The surface area of the tube • The material of which the tube is made • More volume, more surface area = more damping • Think about the resonant characteristics of: • a Home Depot • a post-modern restaurant • a movie theater • an anechoic chamber

  16. An Anechoic Chamber

  17. Resonance and Recording • Remember: any room will reverberate at its characteristic resonant frequencies • Hence: high quality sound recordings need to be made in specially designed rooms which damp any reverberation • Examples: • Classroom recording (29 dB signal-to-noise ratio) • “Soundproof” booth (44 dB SNR) • Anechoic chamber (90 dB SNR)
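Decibel differences hide how large these gaps are. A quick conversion, assuming the usual convention that SNR in dB is 10·log10 of the signal-to-noise power ratio (the slides do not state which definition was used):

```python
def snr_db_to_power_ratio(snr_db):
    """Signal power divided by noise power for an SNR given in dB."""
    return 10 ** (snr_db / 10)

def snr_db_to_amplitude_ratio(snr_db):
    """Signal amplitude divided by noise amplitude for an SNR given in dB."""
    return 10 ** (snr_db / 20)

recordings = {"classroom": 29, "soundproof booth": 44, "anechoic chamber": 90}
for room, snr in recordings.items():
    print(f"{room}: signal power is {snr_db_to_power_ratio(snr):,.0f} times noise power")
```

On this convention, the 15 dB step from the classroom to the booth is roughly a 32-fold improvement in power ratio, and the anechoic chamber's 90 dB corresponds to a billion-to-one power ratio.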

  18. Spectrograms [figure panels: classroom; “soundproof” booth]

  19. Spectrograms [figure panel: anechoic chamber]

  20. Inside Your Nose • In nasals, air flows through the nasal cavities. • The resonating “filter” of nasal sounds therefore has: • increased volume • increased surface area • → increased damping • Note: • the exact size and shape of the nasal cavities varies wildly from speaker to speaker.

  21. Nasal Variability • Measurements based on MRI data (Dang et al., 1994)

  22. Damping Effects, part 1 • Damping by the nasal cavities decreases the overall amplitude of the sound coming out through the nose. [figure: two examples of [m]]

  23. Damping Effects, part 2 • How might the power spectrum of an undamped wave: • Compare to that of a damped wave? • A: Undamped waves have only one component; • Damped waves have a broader range of components.

  24. Here’s Why [figure: a 100 Hz sinewave + a 90 Hz sinewave + a 110 Hz sinewave]

  25. The Result 90 Hz + 100 Hz + 110 Hz • If the 90 Hz and 110 Hz components have less amplitude than the 100 Hz wave, there will be less damping:
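The effect of the side components can be checked directly. Using the identity sin(a) + sin(b) = 2·sin((a+b)/2)·cos((a−b)/2), the 90 Hz and 110 Hz waves combine into a 100 Hz wave whose amplitude swings at 10 Hz, so the sum is a 100 Hz wave with a time-varying envelope. A minimal sketch, assuming hypothetical side-component amplitudes of 1.0 and 0.25 (relative to the 100 Hz wave); note that a sum of three steady sinewaves is periodic, so its envelope swells again every 0.1 s, and the sketch only measures how far it falls.

```python
import math

def mix(t, side_amp):
    """90 + 100 + 110 Hz sinewaves; the 90 and 110 Hz parts have amplitude side_amp."""
    return (side_amp * math.sin(2 * math.pi * 90 * t)
            + math.sin(2 * math.pi * 100 * t)
            + side_amp * math.sin(2 * math.pi * 110 * t))

def envelope(t, side_amp):
    """Amplitude envelope of mix(): mix(t) = sin(2*pi*100*t) * (1 + 2*side_amp*cos(2*pi*10*t))."""
    return abs(1 + 2 * side_amp * math.cos(2 * math.pi * 10 * t))

for side_amp in (1.0, 0.25):
    peak = envelope(0.0, side_amp)
    trough = min(envelope(i / 10000, side_amp) for i in range(1000))  # one 0.1 s cycle
    print(f"side amplitude {side_amp}: envelope falls from {peak:.2f} to {trough:.2f}")
```

With equal-amplitude side components the envelope collapses from 3 to (nearly) 0, a rapid "die-out"; with weaker side components it only sags from 1.5 to 0.5, i.e. less apparent damping.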

  26. Damping Spectra [figure panels: light damping; medium damping]

  27. Damping Spectra [figure panel: heavy damping] • Damping increases the bandwidth of the resonating filter. • Bandwidth = the range of frequencies over which a filter responds at 0.707 or more of its maximum output (i.e., within 3 dB of the peak). • → Nasal formants will have a larger bandwidth than vowel formants.
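The link between damping and bandwidth can be measured numerically. A minimal sketch, assuming an idealized resonance whose amplitude decays as e^(−αt) (modelled as a one-sided complex damped exponential), with hypothetical decay rates standing in for "light", "medium" and "heavy" damping. Its spectrum is 1/(α + j2π(f − F)), so the 0.707 points fall at F ± α/(2π) and the bandwidth should come out as α/π.

```python
import math

def damped_magnitude(f_hz, center_hz, decay_per_s):
    """|X(f)| for x(t) = exp(-decay*t) * exp(j*2*pi*center*t), t >= 0.

    Its Fourier transform is 1 / (decay + j*2*pi*(f - center)).
    """
    return 1.0 / math.hypot(decay_per_s, 2 * math.pi * (f_hz - center_hz))

def bandwidth_707(center_hz, decay_per_s, span_hz=1000.0, step_hz=0.1):
    """Width of the frequency range where the response is at least 0.707 of its peak."""
    peak = damped_magnitude(center_hz, center_hz, decay_per_s)
    threshold = peak / math.sqrt(2)  # 0.707 of the maximum, i.e. -3 dB
    steps = int(span_hz / step_hz)
    freqs = [center_hz - span_hz / 2 + i * step_hz for i in range(steps + 1)]
    passing = [f for f in freqs if damped_magnitude(f, center_hz, decay_per_s) >= threshold]
    return passing[-1] - passing[0]

# Hypothetical decay rates (per second) for light, medium and heavy damping
for label, decay in (("light", 150.0), ("medium", 300.0), ("heavy", 900.0)):
    print(f"{label}: bandwidth ~{bandwidth_707(500.0, decay):.0f} Hz "
          f"(decay/pi = {decay / math.pi:.0f} Hz)")
```

A wave that dies out faster (larger decay rate) spreads its energy over a wider band of frequencies, which is why the heavily damped nasal formants look broad and smeared.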

  28. Bandwidth in Spectrograms [figure: F3 of a vowel vs. F3 of [m]] • The formants in nasals have increased bandwidth compared to the formants in vowels.

  29. Nasal Formants • The formant frequencies of nasal stops can be calculated with the same formula that we used to calculate formant frequencies for a tube closed at one end and open at the other: • $f_n = \frac{(2n-1)\,c}{4L}$ • The simplest case: the uvular nasal [ɴ]. • The length of the tube is a combination of: • distance from glottis to uvula (9 cm) • distance from uvula to nares (12.5 cm) • An average tube length (for adult males): 21.5 cm

  30. The Math [figure: tube of 9 cm (glottis to uvula) + 12.5 cm (uvula to nares)] • $f_n = \frac{(2n-1)\,c}{4L}$ • L = 21.5 cm • c = 35000 cm/sec • $F_1 = \frac{35000}{86} = 407$ Hz • F2 = 1221 Hz • F3 = 2035 Hz
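The uvular-nasal arithmetic can be reproduced in a few lines. A minimal sketch of the quarter-wave formula above, using the slide's speed of sound (35000 cm/sec) and tube segments; `quarter_wave_resonances` is an illustrative helper name.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # value used on the slides

def quarter_wave_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a tube closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]

glottis_to_uvula_cm = 9.0
uvula_to_nares_cm = 12.5
tube_cm = glottis_to_uvula_cm + uvula_to_nares_cm  # 21.5 cm

print([round(f) for f in quarter_wave_resonances(tube_cm)])  # [407, 1221, 2035]
```

Because the nasal tube (21.5 cm) is longer than a typical oral vocal tract, every resonance is pulled down and packed closer together, matching the "low first formant, less space between formants" pattern for nasals.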

  31. The Real Thing • Check out Peter’s production of a uvular nasal in Praat. • And also Dustin’s neutral vowel! • Note: the higher formants are low in amplitude • Some reasons why: • Overall damping • “Nostril-rounding” reduces intensity • Resonance is lost in the side passages of the sinuses. • Nasal stops with fronter places of articulation also have anti-formants.

  32. Anti-Formants • For nasal stops, the occlusion in the mouth creates a side cavity. • This side cavity resonates at particular frequencies. • These resonances absorb acoustic energy in the system. • They form anti-formants

  33. Anti-Formant Math • Anti-formant resonances are based on the length of the side-cavity tube (the mouth cavity behind the oral closure). • For [m], this length is about 8 cm. [figure: 8 cm side cavity] • $f_n = \frac{(2n-1)\,c}{4L}$, with L = 8 cm • $AF_1 = \frac{35000}{4 \times 8} = 1094$ Hz • AF2 = 3281 Hz, etc.

  34. Spectral Signatures • In a spectrogram, acoustic energy is reduced (or drops out completely) at the anti-formant frequencies. [figure: spectrogram with anti-formants marked]
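One common way to model this is to give the filter poles (formants) and zeros (anti-formants) and look at its gain across frequency. A minimal sketch, assuming an analog pole-zero model with unity gain at 0 Hz, hypothetical nasal-like formants at 300, 1200 and 2100 Hz, a single anti-formant at the slides' [m] value of 35000 / (4 × 8) ≈ 1094 Hz, and 100 Hz bandwidths throughout; none of these bandwidths or formant values come from the slides.

```python
import math

def _root(freq_hz, bw_hz):
    """Complex root for a resonance: real part -pi*B, imaginary part 2*pi*F."""
    return complex(-math.pi * bw_hz, 2 * math.pi * freq_hz)

def pole_zero_gain_db(f_hz, formants, antiformants):
    """Gain (dB) at f_hz of a filter with formant poles and anti-formant zeros."""
    s = complex(0.0, 2 * math.pi * f_hz)
    h = 1 + 0j
    for freq, bw in formants:  # each formant: a conjugate pole pair
        p = _root(freq, bw)
        h *= abs(p) ** 2 / ((s - p) * (s - p.conjugate()))
    for freq, bw in antiformants:  # each anti-formant: a conjugate zero pair
        z = _root(freq, bw)
        h *= ((s - z) * (s - z.conjugate())) / abs(z) ** 2
    return 20 * math.log10(abs(h))

FORMANTS = [(300, 100), (1200, 100), (2100, 100)]  # hypothetical
AF1 = 35000 / (4 * 8)                               # [m] side cavity, about 1094 Hz
ANTIFORMANTS = [(AF1, 100)]

for f in (800, AF1, 1400):
    print(f"{f:7.1f} Hz: {pole_zero_gain_db(f, FORMANTS, ANTIFORMANTS):6.1f} dB")
```

The zero carves a notch into the spectrum near AF1, so energy there drops well below the neighbouring frequencies, which is the gap you look for in the spectrogram.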

  35. Nasal Place Cues • At more posterior places of articulation, the “anti-resonating” tube is shorter. • → anti-formant frequencies will be higher. • for [n], L = 5.5 cm • AF1 = 1600 Hz • AF2 = 4800 Hz • for [ɲ], L = 3.3 cm • AF1 = 2650 Hz • for [ŋ], L = 2.3 cm • AF1 = 3800 Hz
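The place trend follows directly from applying the quarter-wave formula to each side cavity. A minimal sketch using the side-cavity lengths given on the slides (8, 5.5, 3.3 and 2.3 cm) and c = 35000 cm/sec; the two most posterior nasals are labelled only by their tube lengths here. Note that for the 2.3 cm tube the formula gives roughly 3804 Hz.

```python
SPEED_OF_SOUND_CM_S = 35000.0

def antiformants(side_cavity_cm, count=2, c=SPEED_OF_SOUND_CM_S):
    """Quarter-wave resonances of the closed side cavity: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4 * side_cavity_cm) for n in range(1, count + 1)]

# Side-cavity lengths from the slides, front to back
side_cavities = [("[m]", 8.0), ("[n]", 5.5), ("3.3 cm tube", 3.3), ("2.3 cm tube", 2.3)]
for label, length in side_cavities:
    af1, af2 = antiformants(length)
    print(f"{label}: L = {length} cm, AF1 = {af1:.0f} Hz, AF2 = {af2:.0f} Hz")
```

Since AF1 is inversely proportional to L, each step back in place of articulation (shorter side cavity) pushes the first anti-formant up, which is why AF1 can serve as a place cue.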

  36. [m] vs. [n] [figure: spectrogram of [meno], segmented [m] [e] [n] [o], with AF1 (n) and AF1 (m) marked] • Production of [meno] by a speaker of Tsonga • Tsonga is spoken in South Africa and Mozambique

  37. Nasal Stop Acoustics: Summary • Here’s the general pattern of what to look for in a spectrogram for nasals: • Periodic voicing. • Overall amplitude lower than in vowels. • Formants (resonance). • Formants have broad bandwidths. • Low frequency first formant. • Less space between formants. • Higher formants have low amplitude.

  38. Perceiving Nasal Place • Nasal “murmurs” do not provide particularly strong cues to place of articulation. • Can you identify the following as [m], [n] or [ŋ]? • Repp (1986) found that, from the murmur alone, listeners could distinguish [n] from [m] only 72% of the time. • Transitions provide important place cues for nasals. • Repp (1986): 95% of nasals were identified correctly when presented with the first 10 msec of the following vowel. • Can you identify these nasal + transition combos?
