This study investigates the perception of tonal differences in languages such as Mandarin and Thai, motivated by speech recognition features like MFCC that typically discard pitch information. By testing whether native speakers can perceive tones in whispered speech, it explores whether tonal perception can rely on cues such as vocal tract shape, duration, and strength instead of pitch. Listeners identified tonal differences at accuracy significantly above chance, suggesting that these differences can be detected even when pitch cues are absent.
Tonal Speech without Pitch Jerry Zhu zhuxj@cs.cmu.edu 2003/7/3
What’s in your mouth Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
MFCC Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html * Focus on vocal tract shape (e.g. different vowels) * No pitch
Tonal languages • Tone: variation in pitch. e.g. Mandarin, Thai http://kca.org/education/ImageView.asp?ImageID=179
MFCC disastrous for tones? • MFCC should have no pitch info. • Bad for Mandarin speech recognition? Not really. Why?
Hypothesis 1 • Language context helps a lot? • e.g. singing overrides pitch • people *do* understand the lyrics (sort of)
Hypothesis 2 • MFCC retains some pitch? • by imperfection • residual pitch info used by speech recognizers • Test: convert MFCC to speech, listen for tones. (TBD)
Hypothesis 3 • Do we really need pitch to perceive tones? • Test: whispered speech • Can native speakers perceive tones in whispered speech? Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
Minimum pairs • A minimum pair: two 2-char words with only 1 tonal difference. • Why not use • one-char words? To prevent over-articulating. • multi-char words? Hard to find minimum pairs.
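The definition above can be sketched as a small check. This is only an illustration: the pinyin-with-tone-digit notation, the function names, and the example words are assumptions added here and are not taken from the experiment's word list.

```python
def split_tone(syllable):
    """Split a pinyin syllable such as 'mai3' into ('mai', 3)."""
    return syllable[:-1], int(syllable[-1])


def is_minimum_pair(word_a, word_b):
    """True if both words have 2 syllables, the same segments,
    and exactly one syllable differs in tone."""
    a = [split_tone(s) for s in word_a.split()]
    b = [split_tone(s) for s in word_b.split()]
    if len(a) != 2 or len(b) != 2:
        return False
    same_segments = all(x[0] == y[0] for x, y in zip(a, b))
    tone_diffs = sum(x[1] != y[1] for x, y in zip(a, b))
    return same_segments and tone_diffs == 1


# Illustrative words only (not from the study):
print(is_minimum_pair("mai3 cai4", "mai4 cai4"))      # True: one tone differs
print(is_minimum_pair("shui4 jiao4", "shui3 jiao3"))  # False: two tones differ
```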
Listener listens for the ORDER within each minimum pair Whisperer file Listener file
Experiment setup • Each whisperer/listener group works on about 100 different minimum pairs. • In a quiet room, 1 meter apart. Each pair whispered once. • Native speakers. (Liu J., Yu H., Zhang Y., Zhu X.)
What to expect • If there is no tonal info in whisper, listeners would guess the order with 50% accuracy.
Result significant? • Flip a coin 3 times, 2 heads 1 tail. A biased coin? • Chi-square test • Accuracy significantly better than random at p < 0.0001 (that’s *really* significant).
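A minimal sketch of the chi-square reasoning above. The slides do not say which variant of the test was used, so this assumes a goodness-of-fit test against 50/50 guessing with 1 degree of freedom; the function name is made up here, and the count of 80 correct out of 100 is hypothetical, not the experiment's data.

```python
import math


def chi_square_vs_chance(correct, total):
    """Chi-square goodness-of-fit against 50% guessing (1 degree of freedom).

    Returns (chi2, p). For 1 degree of freedom the upper tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    expected = total / 2.0
    wrong = total - correct
    chi2 = ((correct - expected) ** 2 / expected
            + (wrong - expected) ** 2 / expected)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# The coin from the slide: 3 flips, 2 heads. Far too few flips to call it biased.
print(chi_square_vs_chance(2, 3))

# Hypothetical listener result: 80 correct out of 100 pairs.
print(chi_square_vs_chance(80, 100))
```

With only 3 flips the p-value stays far above any usual threshold, while a hypothetical 80 out of 100 gives a p-value well below 0.0001. The same accuracy is much stronger evidence when there are more trials.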
Accuracy breakdown (table: correct/total)
Accuracy breakdown (table: accuracy %, significant at p < 0.002)
Summary • People do perceive tonal differences without pitch. • How? • Strength (power)? • Duration? • Subtle vocal tract shape difference?
While we are whispering... • Tonal difference (we’ve seen that) • Voiced / unvoiced consonant? time vs. dime • voice onset time http://www.indiana.edu/~hlw/PhonUnits/consonants2.html
Voiced/unvoiced consonant • [p,b], [t,d], [k,g] • Mandarin speakers 94% accuracy • Aspiration
Other languages? • Thai • Is tonal too; 5 tones. • Has [ph], [p], [b]: would be interesting!