
Pitch Estimation

This paper presents an algorithm for estimating pitch in speech signals using C++. It describes the frame/buffer approach and the implementation of a silent detector, employing correlation techniques to determine candidates for optimal pitch. The algorithm includes steps for assessing and improving pitch accuracy through bias correction and delay adjustments. Testing reveals potential enhancements for performance and real-time execution. By examining silent and non-silent frames, the approach aims to achieve reliable pitch estimation while addressing common computational challenges in speech recognition.

Presentation Transcript


  1. Pitch Estimation Speech Recognition Raymond Sastraputera

  2. Outline • Introduction • Frame/Buffer • Algorithm • Silent Detector • Estimate Pitch • Correlation and Candidate • Optimal Candidate • Buffer Delay • Added Bias • Test and Result • Conclusion

  3. Introduction • Estimates the pitch of a speech signal • Written in C++

  4. Frame/Buffer • Frame segments are shifted with no overlap [Figure: frame segment and buffer]
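
The non-overlapping framing on this slide could be sketched as follows. The helper name `split_frames` and the decision to drop a trailing partial frame are assumptions made for illustration; the presentation does not say how the last incomplete frame is handled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: cut a signal into consecutive frames of frame_len
// samples with no overlap (the hop size equals the frame length).
// Assumption: a trailing partial frame is dropped.
std::vector<std::vector<float>> split_frames(const std::vector<float>& signal,
                                             std::size_t frame_len) {
    std::vector<std::vector<float>> frames;
    if (frame_len == 0) return frames;
    for (std::size_t start = 0; start + frame_len <= signal.size();
         start += frame_len) {
        // Copy samples [start, start + frame_len) into a new frame.
        frames.emplace_back(signal.begin() + start,
                            signal.begin() + start + frame_len);
    }
    return frames;
}
```

With a 7-sample signal and `frame_len = 3`, this yields two frames and discards the final sample.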

  5. Silent Detector • Initial detection of silence • |max(x)| + |max(y)| + |max(z)| + |min(x)| + |min(y)| + |min(z)| • Threshold value (50 dB) [Figure: segments X, Y, Z]
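
A minimal sketch of this silence test, assuming the six-term sum is compared directly against a threshold and that a sum below the threshold means silence. The slide does not say how the 50 dB figure maps onto sample amplitude, so the threshold is left as a caller-supplied value; the names `peak_extent` and `is_silent` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// |max(seg)| + |min(seg)| for one segment. Assumes seg is non-empty.
float peak_extent(const std::vector<float>& seg) {
    auto mm = std::minmax_element(seg.begin(), seg.end());
    // mm.first points at the minimum, mm.second at the maximum.
    return std::fabs(*mm.second) + std::fabs(*mm.first);
}

// Assumption: the frame is treated as silent when the six-term sum over the
// segments x, y and z falls below the threshold.
bool is_silent(const std::vector<float>& x, const std::vector<float>& y,
               const std::vector<float>& z, float threshold) {
    return peak_extent(x) + peak_extent(y) + peak_extent(z) < threshold;
}
```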

  6. Estimate Pitch (Correlation) • Correlation of two vectors

  7. Estimate Pitch (Correlation) • Correlation P(x,y) • Calculated for different window sizes (nm) • The window size gives the pitch value (in samples) • Window sizes whose correlation value exceeds the threshold become candidates with score 1 [Figure: vectors x and y with window size nm, over segments X, Y, Z]
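
The correlation formula from slide 6 did not survive in the transcript. The sketch below assumes a normalized cross-correlation between two adjacent windows of equal length, which is one common choice and may differ from what the author implemented. The names `norm_corr` and `find_candidates`, and the placement of the two windows on either side of a centre sample, are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation between s[a, a+n) and s[b, b+n).
// Returns 0 when either window has zero energy.
double norm_corr(const std::vector<float>& s, std::size_t a, std::size_t b,
                 std::size_t n) {
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = s[a + i];
        const double y = s[b + i];
        xy += x * y;
        xx += x * x;
        yy += y * y;
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;
}

// Try every window size n in [min_lag, max_lag]. Window x ends at `centre`
// and window y starts there (an assumed layout). Sizes whose correlation
// exceeds the threshold are returned as score-1 candidates.
std::vector<std::size_t> find_candidates(const std::vector<float>& s,
                                         std::size_t centre,
                                         std::size_t min_lag,
                                         std::size_t max_lag,
                                         double threshold) {
    std::vector<std::size_t> candidates;
    for (std::size_t n = min_lag; n <= max_lag; ++n) {
        if (centre < n || centre + n > s.size()) continue;  // window out of range
        if (norm_corr(s, centre - n, centre, n) > threshold) {
            candidates.push_back(n);
        }
    }
    return candidates;
}
```

For a pulse train with a period of 8 samples, window sizes 8 and 16 both correlate perfectly, which shows why a later step is needed to choose among multiples of the true period.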

  8. Estimate Pitch (Correlation) • Correlation P(y,z) • Calculated for different nm • Only for window sizes that are score-1 candidates • Window sizes whose correlation value exceeds the threshold become candidates with score 2 [Figure: vectors y and z with window size nm, over segments X, Y, Z]

  9. Estimate Pitch (Correlation) • Correlation Q(n,m) • Calculated for different nm • nMAX is the maximum nm among the candidates • Optimal candidate • A candidate becomes optimal if its Qnm*0.77 is higher than the preceding candidate’s Qnm [Figure: vectors x and z with window sizes nm and nMAX, over segments X, Y, Z]
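
The optimal-candidate rule can be sketched as a single pass over the candidates. This assumes candidates are visited in ascending window size and that "preceding candidate" means the best candidate found so far; the slide is ambiguous on both points. `Candidate` and `pick_optimal` are hypothetical names, and the Q values are taken as already computed.

```cpp
#include <cstddef>
#include <vector>

// One pitch candidate: its window size in samples and its Q(n,m) value.
struct Candidate {
    std::size_t lag;
    double q;
};

// Returns the index of the optimal candidate. Assumes `c` is non-empty and
// sorted by ascending lag. A later candidate replaces the current best only
// if its Q value, scaled by the multiplier (0.77 in the test parameters),
// still exceeds the current best's Q value.
std::size_t pick_optimal(const std::vector<Candidate>& c, double multiplier) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < c.size(); ++i) {
        if (c[i].q * multiplier > c[best].q) best = i;
    }
    return best;
}
```

Because the multiplier is below 1, the rule favours shorter window sizes unless a longer one correlates markedly better.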

  10. Estimate Pitch (Candidate) • Candidate score 1 → Correlation P(x,y) • No candidate → silence • Single candidate → compute P(y,z) • Score stays at 1 → hold • Score 2 → estimated pitch • Multiple candidates → compute P(y,z) • Candidate score 2 → Correlation P(y,z) • No candidate → compute Q(n,m) over the score-1 candidates • Single candidate → estimated pitch • Multiple candidates → compute Q(n,m) • Optimal pitch → Correlation Q(n,m)
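
One way to read the decision flow on this slide is as a function of how many candidates reached score 1 and score 2. The sketch below encodes that reading; the grouping of the bullets into branches is an interpretation of the flattened slide, and the names `Decision` and `decide` are hypothetical.

```cpp
#include <cstddef>

// Possible outcomes for one frame under this reading of the slide.
enum class Decision {
    Silence,        // no score-1 candidate
    Hold,           // single score-1 candidate that did not reach score 2
    Pitch,          // exactly one score-2 candidate: use it directly
    RankScore1ByQ,  // several score-1 candidates, none at score 2
    RankScore2ByQ   // several score-2 candidates
};

// n_score1: candidates passing P(x,y); n_score2: those also passing P(y,z).
// Assumes n_score2 <= n_score1, since P(y,z) is only evaluated for
// score-1 window sizes.
Decision decide(std::size_t n_score1, std::size_t n_score2) {
    if (n_score1 == 0) return Decision::Silence;
    if (n_score2 == 1) return Decision::Pitch;
    if (n_score2 > 1) return Decision::RankScore2ByQ;
    // No candidate reached score 2.
    return n_score1 == 1 ? Decision::Hold : Decision::RankScore1ByQ;
}
```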

  11. Estimate Pitch (Optimal Candidate) • A single candidate with score 2 • From Q(n,m) of • the score-2 candidates • the score-1 candidates • A frame on hold, when the next frame's estimated pitch is neither silence nor on hold

  12. Buffer Delay • Delays returning the estimated pitch • Needed to limit how long a frame can stay on hold

  13. Bias • Conditions: • The two previous frames are not silent • The previous frame is not on hold • The previous frame's pitch is between 5/8 and 7/4 of the pitch of the frame before it

  14. Bias • P(x,y) is doubled
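
The bias conditions can be sketched as a predicate over the two previous frame results. The types and names here (`FrameState`, `FrameResult`, `bias_applies`) are hypothetical, the ratio bounds are treated as inclusive, and a frame without a positive pitch value is assumed to disqualify the bias; none of these details are stated on the slides. When the predicate holds, the slides say P(x,y) is doubled.

```cpp
// Hypothetical per-frame result: a state plus a pitch in Hz, which is
// meaningful only when the state is Voiced.
enum class FrameState { Silent, OnHold, Voiced };

struct FrameResult {
    FrameState state;
    float pitch_hz;
};

// prev is the previous frame; prev2 is the frame before it.
bool bias_applies(const FrameResult& prev, const FrameResult& prev2,
                  float min_ratio = 5.0f / 8.0f,
                  float max_ratio = 7.0f / 4.0f) {
    // The two previous frames must not be silent.
    if (prev.state == FrameState::Silent || prev2.state == FrameState::Silent)
        return false;
    // The previous frame must not be on hold.
    if (prev.state == FrameState::OnHold) return false;
    // Assumption: without a usable earlier pitch there is nothing to compare.
    if (prev2.pitch_hz <= 0.0f) return false;
    const float ratio = prev.pitch_hz / prev2.pitch_hz;
    return ratio >= min_ratio && ratio <= max_ratio;
}
```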

  15. Test Parameter • correlation_threshold_silent(0.88) • Qnm_optimal_multiplier(0.77) • sample_rate(20000.0F) • max_pitch(400) • min_pitch(50) • pitch_buffer_size(20) • bias_max_frequency(7/4) • bias_min_frequency(5/8) • silent_threshold(50.0F)
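
The test parameters could be collected in a single struct, as in the sketch below. The field names follow the slide; the struct name `PitchParams`, the field types, and the two helper functions are assumptions. The helpers also assume `min_pitch` and `max_pitch` are frequencies in Hz, which would make the searched window sizes run from 50 to 400 samples at a 20 kHz sample rate.

```cpp
// Hypothetical container for the test parameters listed on the slide.
struct PitchParams {
    float correlation_threshold_silent = 0.88f;
    float Qnm_optimal_multiplier = 0.77f;
    float sample_rate = 20000.0f;
    int max_pitch = 400;  // assumed to be in Hz
    int min_pitch = 50;   // assumed to be in Hz
    int pitch_buffer_size = 20;
    float bias_max_frequency = 7.0f / 4.0f;
    float bias_min_frequency = 5.0f / 8.0f;
    float silent_threshold = 50.0f;
};

// Shortest window size in samples: one period of the highest pitch.
int min_lag_samples(const PitchParams& p) {
    return static_cast<int>(p.sample_rate / static_cast<float>(p.max_pitch));
}

// Longest window size in samples: one period of the lowest pitch.
int max_lag_samples(const PitchParams& p) {
    return static_cast<int>(p.sample_rate / static_cast<float>(p.min_pitch));
}
```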

  16. Test and Result

  17. Test and Result

  18. Test and Result

  19. Conclusion • Several improvements could increase the performance of the pitch estimate: • Reducing the search space • Adding the first-order derivative of the pitch • Filtering outliers / noise • The current algorithm might not be fast enough to run in real time

  20. Reference • Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).
