
Pitch Estimation




Presentation Transcript


  1. Pitch Estimation Speech Recognition Raymond Sastraputera

  2. Outline • Introduction • Frame/Buffer • Algorithm • Silent Detector • Estimate Pitch • Correlation and Candidate • Optimal Candidate • Buffer Delay • Added Bias • Test and Result • Conclusion

  3. Introduction • Estimates the pitch of a speech signal • Written in C++

  4. Frame/Buffer • Frame segments are shifted with no overlap (Figure: frame segments within the buffer)

  5. Silent Detector • Initial detection of silence • |max(x)| + |max(y)| + |max(z)| + |min(x)| + |min(y)| + |min(z)| • Threshold value (50 dB) (Figure: segments X, Y, Z)
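
The silence test on this slide can be sketched as follows. This is a minimal sketch, not the presenter's code: the names `peak_extent` and `is_silent` are hypothetical, the assumption is that a frame counts as silent when the sum falls below the threshold, and the slide does not say how the 50 dB value maps onto sample amplitudes, so the threshold is passed in as a plain number.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// |max(seg)| + |min(seg)| for one segment (hypothetical helper name).
// Precondition: the segment is not empty.
float peak_extent(const std::vector<float>& seg) {
    assert(!seg.empty());
    auto mm = std::minmax_element(seg.begin(), seg.end());
    return std::fabs(*mm.second) + std::fabs(*mm.first);
}

// Assumption: the frame is silent when the summed peak extents of the
// three segments x, y, z fall below the threshold.
bool is_silent(const std::vector<float>& x,
               const std::vector<float>& y,
               const std::vector<float>& z,
               float threshold) {
    return peak_extent(x) + peak_extent(y) + peak_extent(z) < threshold;
}
```

Because only the largest and smallest sample of each segment are needed, the check costs one pass over the data and can reject quiet frames before any correlation is computed.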

  6. Estimate Pitch (Correlation) • Correlation of two vectors
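
The formula shown on this slide did not survive in the transcript. As a stand-in, the sketch below uses the standard normalized cross-correlation of two equal-length vectors; treating this as the correlation the deck uses is an assumption, and the function name `correlation` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of two equal-length vectors, in [-1, 1].
// Returns 0 when either vector has zero energy.
double correlation(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, ea = 0.0, eb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];  // cross term
        ea += a[i] * a[i];   // energy of a
        eb += b[i] * b[i];   // energy of b
    }
    if (ea == 0.0 || eb == 0.0) return 0.0;
    return dot / std::sqrt(ea * eb);
}
```

With this normalization, two segments that differ only by a positive gain correlate at 1, which is why a fixed threshold such as 0.88 can be applied regardless of signal level.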

  7. Estimate Pitch (Correlation) • Correlation P(x,y) • Calculated for different window sizes (nm) • The window size is the pitch value (in samples) • Correlation values above the threshold become candidates with score 1 (Figure: vector x and vector y, each of length nm, over segments X, Y, Z)

  8. Estimate Pitch (Correlation) • Correlation P(y,z) • Calculated for different nm • Only for window sizes that are candidates with score 1 • Correlation values above the threshold become candidates with score 2 (Figure: vector y and vector z, each of length nm, over segments X, Y, Z)
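
One way to read the two correlation passes above is sketched below. It rests on assumptions the slides do not confirm: x, y and z are taken as three consecutive segments of length nm starting at `start`, the correlation is the normalized cross-correlation, and the names `Candidate`, `segment_correlation` and `find_candidates` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t lag;  // window size nm, in samples
    int score;        // 1 after passing P(x,y), 2 after also passing P(y,z)
};

// Normalized cross-correlation of s[a, a+n) and s[b, b+n).
double segment_correlation(const std::vector<double>& s,
                           std::size_t a, std::size_t b, std::size_t n) {
    double dot = 0.0, ea = 0.0, eb = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        dot += s[a + i] * s[b + i];
        ea += s[a + i] * s[a + i];
        eb += s[b + i] * s[b + i];
    }
    return (ea == 0.0 || eb == 0.0) ? 0.0 : dot / std::sqrt(ea * eb);
}

// Assumed layout: x = s[start, start+n), y = s[start+n, start+2n),
// z = s[start+2n, start+3n) for each window size n.
std::vector<Candidate> find_candidates(const std::vector<double>& s,
                                       std::size_t start,
                                       std::size_t min_lag, std::size_t max_lag,
                                       double threshold) {
    assert(min_lag > 0 && min_lag <= max_lag);
    assert(start + 3 * max_lag <= s.size());
    std::vector<Candidate> out;
    for (std::size_t n = min_lag; n <= max_lag; ++n) {
        // Pass 1: P(x,y) above the threshold gives score 1.
        if (segment_correlation(s, start, start + n, n) <= threshold) continue;
        Candidate c{n, 1};
        // Pass 2: P(y,z) is evaluated only for score-1 window sizes.
        if (segment_correlation(s, start + n, start + 2 * n, n) > threshold) c.score = 2;
        out.push_back(c);
    }
    return out;
}
```

Skipping P(y,z) for window sizes that failed P(x,y) is what keeps the second pass cheap: it only runs on the lags that already look periodic.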

  9. Estimate Pitch (Correlation) • Correlation Q(n,m) • Calculated for different nm • nMAX is the maximum nm among the candidates • Optimal Candidate • The current candidate, if its Qnm*0.77 is higher than the preceding candidate's Qnm (Figure: vector x and vector z, with lengths nm and nMAX, over segments X, Y, Z)
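
The selection rule on this slide can be sketched as a single scan over the candidates. The sketch assumes the Q(n,m) values are already computed, that candidates are visited in a fixed order with the first one starting as optimal, and that "preceding candidate" means the best one kept so far; `ScoredLag` and `pick_optimal` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ScoredLag {
    std::size_t lag;  // candidate window size nm, in samples
    double q;         // Q(n,m) value computed for this candidate
};

// Returns the index of the optimal candidate. A later candidate replaces
// the current optimum only if multiplier * q exceeds the optimum's q.
std::size_t pick_optimal(const std::vector<ScoredLag>& c, double multiplier) {
    assert(!c.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < c.size(); ++i) {
        if (c[i].q * multiplier > c[best].q) best = i;
    }
    return best;
}
```

With the multiplier at 0.77, a later candidate has to beat the current optimum by roughly 30% before it takes over, so the choice leans toward the candidates visited first.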

  10. Estimate Pitch (Candidate) • Candidate score 1 → Correlation P(x,y) • No candidate → silence • Single candidate → compute P(y,z) • Score stays at 1 → hold • Score 2 → estimated pitch • Multi candidate → compute P(y,z) • Candidate score 2 → Correlation P(y,z) • No candidate → compute Q(n,m) on candidates with score 1 • Single candidate → estimated pitch • Multi candidate → compute Q(n,m) • Optimal Pitch → Correlation Q(n,m)
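
The branching on this slide can be written as a small decision function over two counts: how many window sizes passed P(x,y) and how many of those also passed P(y,z). This is a sketch of one reading of the slide; the `Decision` enumerators and the function name `decide` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class Decision {
    Silence,            // no candidate passed P(x,y)
    Hold,               // the single candidate stayed at score 1
    Pitch,              // exactly one candidate reached score 2
    OptimalOverScore1,  // several score-1 candidates, none at score 2: use Q(n,m)
    OptimalOverScore2   // several score-2 candidates: use Q(n,m)
};

// passed_xy: number of candidates with score >= 1.
// passed_yz: number of those that reached score 2.
Decision decide(std::size_t passed_xy, std::size_t passed_yz) {
    assert(passed_yz <= passed_xy);
    if (passed_xy == 0) return Decision::Silence;
    if (passed_xy == 1) return passed_yz == 1 ? Decision::Pitch : Decision::Hold;
    if (passed_yz == 0) return Decision::OptimalOverScore1;
    if (passed_yz == 1) return Decision::Pitch;
    return Decision::OptimalOverScore2;
}
```

For example, `decide(3, 0)` returns `Decision::OptimalOverScore1`, matching the "No candidate → compute Q(n,m)" branch for score-1 candidates.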

  11. Estimate Pitch (Optimal Candidate) • Single candidate with score 2 • From Q(n,m) of • Candidate score 2 • Candidate score 1 • On hold, and the next frame's estimated pitch is neither silence nor on hold.

  12. Buffer Delay • Delays the returned value of the estimated pitch • Needed to limit the duration of the on-hold state

  13. Bias • Conditions: • The two previous frames are not silent • The previous frame is not on hold • The previous frame's pitch is between 5/8 and 7/4 of the preceding frame's pitch

  14. Bias • P(x,y) is doubled
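
The bias conditions and the doubling of P(x,y) can be sketched as below. The types `FrameState` and `FrameResult` and both function names are hypothetical. Treating the 5/8 and 7/4 bounds as inclusive and skipping the bias when the older frame has no usable pitch are assumptions added to keep the division safe. The slides also do not say which window sizes receive the doubled value.

```cpp
#include <cassert>

enum class FrameState { Silent, Hold, Voiced };

struct FrameResult {
    FrameState state;
    double pitch_hz;  // meaningful only when state is Voiced
};

// prev is the previous frame, prev2 the frame before it.
bool bias_applies(const FrameResult& prev, const FrameResult& prev2,
                  double min_ratio = 5.0 / 8.0, double max_ratio = 7.0 / 4.0) {
    assert(min_ratio <= max_ratio);
    // The two previous frames must not be silent.
    if (prev.state == FrameState::Silent || prev2.state == FrameState::Silent) return false;
    // The previous frame must not be on hold.
    if (prev.state == FrameState::Hold) return false;
    // Assumption: without a usable older pitch there is no ratio to test.
    if (prev2.pitch_hz <= 0.0) return false;
    const double ratio = prev.pitch_hz / prev2.pitch_hz;
    return ratio >= min_ratio && ratio <= max_ratio;
}

// P(x,y) is doubled when the bias applies.
double biased_pxy(double pxy, bool bias) {
    return bias ? 2.0 * pxy : pxy;
}
```

For a previous pitch of 120 Hz following 100 Hz the ratio is 1.2, inside the 5/8 to 7/4 range, so the bias applies. A jump from 100 Hz to 200 Hz gives a ratio of 2.0 and does not.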

  15. Test Parameter • correlation_threshold_silent(0.88) • Qnm_optimal_multiplier(0.77) • sample_rate(20000.0F) • max_pitch(400) • min_pitch(50) • pitch_buffer_size(20) • bias_max_frequency(7/4) • bias_min_frequency(5/8) • silent_threshold(50.0F)
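
The test parameters above can be grouped into one configuration struct, which also shows how the pitch limits translate into the range of window sizes to search. The struct `PitchParams` and the two helper functions are hypothetical. Reading `max_pitch` and `min_pitch` as frequencies in Hz is an assumption, since the slide gives no units.

```cpp
#include <cassert>

struct PitchParams {
    float correlation_threshold_silent = 0.88F;
    float qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;            // samples per second
    int max_pitch = 400;                     // assumed to be Hz
    int min_pitch = 50;                      // assumed to be Hz
    int pitch_buffer_size = 20;
    float bias_max_frequency = 7.0F / 4.0F;
    float bias_min_frequency = 5.0F / 8.0F;
    float silent_threshold = 50.0F;
};

// Shortest window size nm (in samples): one period of the highest pitch.
int min_lag_samples(const PitchParams& p) {
    assert(p.max_pitch > 0);
    return static_cast<int>(p.sample_rate / p.max_pitch);
}

// Longest window size nm (in samples): one period of the lowest pitch.
int max_lag_samples(const PitchParams& p) {
    assert(p.min_pitch > 0);
    return static_cast<int>(p.sample_rate / p.min_pitch);
}
```

Under that reading, a 20000 Hz sample rate with pitch limits of 400 Hz and 50 Hz gives window sizes from 50 to 400 samples, so each correlation pass covers 351 window sizes per frame.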

  16. Test and Result

  17. Test and Result

  18. Test and Result

  19. Conclusion • Some improvements could increase the performance of the pitch estimation: • Reducing the search space • Adding the 1st-order derivative of the pitch • Filtering outliers / noise • The current algorithm might not be fast enough to run in real time

  20. Reference • Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).
