Pitch Estimation

Speech Recognition
Raymond Sastraputera

Outline
• Introduction
• Frame/Buffer
• Algorithm
• Silent Detector
• Estimate Pitch
• Correlation and Candidate
• Optimal Candidate
• Buffer Delay
• Added Bias
• Test and Result
• Conclusion

Introduction
• Estimates the pitch of a speech signal
• Written in C++

Frame/Buffer
• Frame segments are shifted with no overlap
[Figure: frame segments within the buffer]
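Here is a minimal C++ sketch of non-overlapping framing. C++ is the language of the original implementation. The sketch assumes a fixed, hypothetical `frameSize` and drops a trailing partial frame; the slides do not specify how the tail is handled. `splitIntoFrames` is an illustrative name, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split a signal into consecutive frames of frameSize samples.
// Each frame starts exactly one frame length after the previous one,
// so frames never overlap; a leftover tail shorter than frameSize is dropped.
std::vector<std::vector<float>> splitIntoFrames(const std::vector<float>& signal,
                                                std::size_t frameSize) {
    assert(frameSize > 0);
    std::vector<std::vector<float>> frames;
    for (std::size_t start = 0; start + frameSize <= signal.size(); start += frameSize) {
        frames.emplace_back(signal.begin() + static_cast<std::ptrdiff_t>(start),
                            signal.begin() + static_cast<std::ptrdiff_t>(start + frameSize));
    }
    return frames;
}
```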

Estimate Pitch (Correlation)
• Correlation of two vectors
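The slides do not spell out the formula. The usual normalized form can be written as below, assuming s is the signal, t is a hypothetical analysis position, and x and y are the two adjacent n_m-sample segments starting at t:

```latex
P_{n_m}(x, y) = \frac{\sum_{j=0}^{n_m-1} x_j \, y_j}
                     {\sqrt{\sum_{j=0}^{n_m-1} x_j^2 \; \sum_{j=0}^{n_m-1} y_j^2}},
\qquad x_j = s[t + j], \quad y_j = s[t + n_m + j]
```

By the Cauchy-Schwarz inequality the value lies in [-1, 1]. It reaches 1 when y is a positive multiple of x. For a perfectly periodic signal, that happens when n_m equals the period or a multiple of it.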

Estimate Pitch (Correlation)
• Correlation P(x,y)
• Calculate for different window sizes (nm)
• The window size nm is the pitch period (in samples)
• Window sizes whose correlation is above the threshold become candidates with score 1
[Figure: segments X, Y, Z of the signal; vector x and vector y, each nm samples long]
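A sketch of the score-1 search follows. It rests on three assumptions: the usual normalized cross-correlation is used; x and y are the adjacent nm-sample segments starting at a hypothetical analysis position `start`; and the caller supplies the threshold, because the slides do not say which parameter is used here. Following the slide literally, every nm above the threshold is kept, not only local maxima. `correlationPxy` and `scoreOneCandidates` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// P(x,y): normalized cross-correlation of the adjacent n-sample segments
// x = s[start, start+n) and y = s[start+n, start+2n). Result lies in [-1, 1].
double correlationPxy(const std::vector<double>& s, std::size_t start, std::size_t n) {
    assert(n > 0 && start + 2 * n <= s.size());
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        const double x = s[start + j];
        const double y = s[start + n + j];
        xy += x * y;
        xx += x * x;
        yy += y * y;
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;  // an all-zero segment correlates as 0
}

// Every window size nm in [nMin, nMax] whose P(x,y) exceeds the threshold
// becomes a candidate with score 1.
std::vector<std::size_t> scoreOneCandidates(const std::vector<double>& s, std::size_t start,
                                            std::size_t nMin, std::size_t nMax,
                                            double threshold) {
    std::vector<std::size_t> candidates;
    for (std::size_t n = nMin; n <= nMax; ++n) {
        if (correlationPxy(s, start, n) > threshold) candidates.push_back(n);
    }
    return candidates;
}
```

Because every value above the threshold is kept, window sizes next to the true period will usually pass as well. The later P(y,z) and Q(n,m) stages are what narrow the list down.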

Estimate Pitch (Correlation)
• Correlation P(y,z)
• Calculate for different nm
• Only for window sizes that are score-1 candidates
• Candidates whose correlation is above the threshold are promoted to score 2
[Figure: segments X, Y, Z of the signal; vector y and vector z, each nm samples long]
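The promotion step can be sketched as follows. It assumes that, with x starting at a hypothetical position `start`, y and z are the next two nm-sample segments, and that the caller supplies the threshold. `normalizedCorrelation` and `scoreTwoCandidates` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between s[a, a+n) and s[b, b+n).
double normalizedCorrelation(const std::vector<double>& s, std::size_t a, std::size_t b,
                             std::size_t n) {
    assert(n > 0 && a + n <= s.size() && b + n <= s.size());
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        ab += s[a + j] * s[b + j];
        aa += s[a + j] * s[a + j];
        bb += s[b + j] * s[b + j];
    }
    const double denom = std::sqrt(aa * bb);
    return denom > 0.0 ? ab / denom : 0.0;
}

// P(y,z) is evaluated only for the score-1 candidates. With x starting at `start`,
// y = s[start+n, start+2n) and z = s[start+2n, start+3n); a candidate whose
// P(y,z) exceeds the threshold is promoted to score 2.
std::vector<std::size_t> scoreTwoCandidates(const std::vector<double>& s, std::size_t start,
                                            const std::vector<std::size_t>& scoreOne,
                                            double threshold) {
    std::vector<std::size_t> scoreTwo;
    for (std::size_t n : scoreOne) {
        if (normalizedCorrelation(s, start + n, start + 2 * n, n) > threshold) {
            scoreTwo.push_back(n);
        }
    }
    return scoreTwo;
}
```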

Estimate Pitch (Correlation)
• Correlation Q(n,m)
• Calculate for different nm
• nMAX is the largest nm among the candidates
• Optimal candidate: the current candidate is preferred if 0.77 × its Qnm is higher than the preceding candidate's Qnm
[Figure: segments X, Y, Z of the signal; vector x and vector z, labelled nMAX, nm, nMAX]
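Below is a hedged sketch of the Q(n,m) selection. The window layout is only our reading of the figure: x and z are both nMAX samples long, and z starts nm samples after x. "Preceding candidate" is read as the best candidate found so far among the shorter window sizes. The multiplier defaults to 0.77, matching Qnm_optimal_multiplier. `correlationQ` and `optimalCandidate` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between s[a, a+n) and s[b, b+n).
double normalizedCorrelation(const std::vector<double>& s, std::size_t a, std::size_t b,
                             std::size_t n) {
    assert(n > 0 && a + n <= s.size() && b + n <= s.size());
    double ab = 0.0, aa = 0.0, bb = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        ab += s[a + j] * s[b + j];
        aa += s[a + j] * s[a + j];
        bb += s[b + j] * s[b + j];
    }
    const double denom = std::sqrt(aa * bb);
    return denom > 0.0 ? ab / denom : 0.0;
}

// Assumed layout: x = s[start, start+nMax), z = s[start+n, start+n+nMax).
double correlationQ(const std::vector<double>& s, std::size_t start, std::size_t n,
                    std::size_t nMax) {
    return normalizedCorrelation(s, start, start + n, nMax);
}

// Visit candidates in order of increasing nm. A longer candidate replaces the
// current choice only if multiplier * Q(longer) > Q(current choice), which
// penalizes picking a multiple of the true period.
std::size_t optimalCandidate(const std::vector<double>& s, std::size_t start,
                             std::vector<std::size_t> candidates, double multiplier = 0.77) {
    assert(!candidates.empty());
    std::sort(candidates.begin(), candidates.end());
    const std::size_t nMax = candidates.back();  // nMAX: largest nm among the candidates
    std::size_t best = candidates.front();
    double bestQ = correlationQ(s, start, best, nMax);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        const double q = correlationQ(s, start, candidates[i], nMax);
        if (multiplier * q > bestQ) {
            best = candidates[i];
            bestQ = q;
        }
    }
    return best;
}
```

Since a normalized Q never exceeds 1, a candidate at twice the period cannot overtake a true period whose Q is close to 1.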

Estimate Pitch (Candidate)
• Score-1 candidates, from correlation P(x,y):
  • No candidate → silence
  • Single candidate → compute P(y,z)
    • Score stays at 1 → hold
    • Score reaches 2 → estimated pitch
  • Multiple candidates → compute P(y,z)
• Score-2 candidates, from correlation P(y,z):
  • No candidate → compute Q(n,m) over the score-1 candidates
  • Single candidate → estimated pitch
  • Multiple candidates → compute Q(n,m) over the score-2 candidates
• Optimal pitch, from correlation Q(n,m)
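The decision tree above can be sketched as a pure function over the candidate lists. It assumes `scoreTwo` is the subset of `scoreOne` that passed P(y,z). `classifyFrame`, `FrameState`, `FrameResult` and the `pickOptimal` callback, which stands in for the Q(n,m) selection, are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class FrameState { Silent, OnHold, Voiced };

struct FrameResult {
    FrameState state;
    std::size_t period;  // nm in samples; 0 when the frame is silent
};

using PickOptimal = std::function<std::size_t(const std::vector<std::size_t>&)>;

// scoreOne: nm values that passed P(x,y); scoreTwo: those that also passed P(y,z).
FrameResult classifyFrame(const std::vector<std::size_t>& scoreOne,
                          const std::vector<std::size_t>& scoreTwo,
                          const PickOptimal& pickOptimal) {
    assert(scoreTwo.size() <= scoreOne.size());
    if (scoreOne.empty()) return {FrameState::Silent, 0};  // no candidate: silence
    if (scoreOne.size() == 1) {
        // Single candidate: estimated pitch if it reached score 2, otherwise on hold.
        return scoreTwo.empty() ? FrameResult{FrameState::OnHold, scoreOne.front()}
                                : FrameResult{FrameState::Voiced, scoreOne.front()};
    }
    // Multiple score-1 candidates: branch on how many reached score 2.
    if (scoreTwo.empty()) return {FrameState::Voiced, pickOptimal(scoreOne)};
    if (scoreTwo.size() == 1) return {FrameState::Voiced, scoreTwo.front()};
    return {FrameState::Voiced, pickOptimal(scoreTwo)};
}
```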

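Estimate Pitch (Optimal Candidate)
• The optimal candidate is one of:
  • The single candidate with score 2
  • The best candidate by Q(n,m) among the score-2 candidates
  • The best candidate by Q(n,m) among the score-1 candidates
  • A candidate on hold, once the next frame's estimated pitch is neither silence nor on hold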

Buffer Delay
• Delays the returned estimated pitch
• Needed to limit how long a frame can stay on hold
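Here is one possible realization of the delay, with its assumptions labelled:

- The delay length would be pitch_buffer_size frames (20 in the test parameters).
- A run of held frames is accepted once the next non-held frame has an estimated pitch, as in the optimal-candidate rule. If that frame is silent, the run becomes silence.
- A hold still unresolved when it leaves the buffer is treated as silence.

The last two choices are our assumptions. `PitchDelayBuffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

enum class FrameState { Silent, OnHold, Voiced };

struct FrameResult {
    FrameState state;
    std::size_t period;  // tentative nm while on hold, final nm when voiced, 0 when silent
};

// Hypothetical delay line: each frame result is returned `delay` frames late,
// so a frame on hold waits at most `delay` frames for the frame that resolves it.
class PitchDelayBuffer {
public:
    explicit PitchDelayBuffer(std::size_t delay) : delay_(delay) { assert(delay > 0); }

    // Adds the newest frame. Returns true and fills `out` with the oldest frame
    // once more than `delay` frames are buffered; returns false otherwise.
    bool push(const FrameResult& frame, FrameResult& out) {
        if (frame.state != FrameState::OnHold) resolvePendingHolds(frame.state);
        frames_.push_back(frame);
        if (frames_.size() <= delay_) return false;
        out = frames_.front();
        frames_.pop_front();
        // Assumption: a hold still unresolved when it leaves the buffer becomes silence.
        if (out.state == FrameState::OnHold) out = {FrameState::Silent, 0};
        return true;
    }

private:
    // The trailing run of held frames takes the verdict of the frame after it:
    // accepted if that frame has an estimated pitch, silent if it is silent.
    void resolvePendingHolds(FrameState next) {
        const FrameState verdict =
            (next == FrameState::Voiced) ? FrameState::Voiced : FrameState::Silent;
        for (auto it = frames_.rbegin(); it != frames_.rend() && it->state == FrameState::OnHold;
             ++it) {
            it->state = verdict;
            if (verdict == FrameState::Silent) it->period = 0;
        }
    }

    std::size_t delay_;
    std::deque<FrameResult> frames_;
};
```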

Bias
• Conditions:
  • The two previous frames are not silent
  • The previous frame is not on hold
  • The previous frame's pitch is between 5/8 and 7/4 of the pitch of the frame before it

Bias
• P(x,y) is doubled
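Here is a sketch of the bias check. It measures pitch in Hz, to match the parameter names bias_min_frequency and bias_max_frequency. With periods instead of frequencies, the ratio bounds would be inverted. The slides do not say which window sizes receive the doubled P(x,y), so that choice is left to the caller. `FrameInfo`, `biasConditionHolds` and `biasedCorrelation` are hypothetical names.

```cpp
#include <cassert>

enum class FrameState { Silent, OnHold, Voiced };

struct FrameInfo {
    FrameState state;
    double pitchHz;  // estimated pitch of the frame in Hz, 0 when silent
};

// Bias conditions: neither of the two previous frames is silent, the previous
// frame is not on hold, and its pitch lies between minRatio and maxRatio times
// the pitch of the frame before it.
bool biasConditionHolds(const FrameInfo& twoBack, const FrameInfo& oneBack,
                        double minRatio = 5.0 / 8.0, double maxRatio = 7.0 / 4.0) {
    assert(minRatio <= maxRatio);
    if (twoBack.state == FrameState::Silent || oneBack.state == FrameState::Silent) return false;
    if (oneBack.state == FrameState::OnHold) return false;
    if (twoBack.pitchHz <= 0.0) return false;
    const double ratio = oneBack.pitchHz / twoBack.pitchHz;
    return ratio >= minRatio && ratio <= maxRatio;
}

// When the bias is active, the P(x,y) value is doubled.
double biasedCorrelation(double pxy, bool biasActive) {
    return biasActive ? 2.0 * pxy : pxy;
}
```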

Test Parameters
• correlation_threshold_silent(0.88)
• Qnm_optimal_multiplier(0.77)
• sample_rate(20000.0F)
• max_pitch(400)
• min_pitch(50)
• pitch_buffer_size(20)
• bias_max_frequency(7/4)
• bias_min_frequency(5/8)
• silent_threshold(50.0F)
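The test parameters can be gathered into one C++ struct that uses the slide's names and values. Several points are assumptions:

- max_pitch and min_pitch are in Hz. Because the window size nm is the pitch period in samples, they then map to a window range of 50 to 400 samples at 20 kHz.
- pitch_buffer_size is a delay in frames.
- The exact roles of correlation_threshold_silent and silent_threshold are not described on the slides.

The helper functions are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Test parameters from the slide, gathered into one struct.
struct PitchParams {
    float correlation_threshold_silent = 0.88F;
    float Qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;        // samples per second
    int max_pitch = 400;                 // assumed Hz
    int min_pitch = 50;                  // assumed Hz
    std::size_t pitch_buffer_size = 20;  // assumed: frames of delay
    float bias_max_frequency = 7.0F / 4.0F;
    float bias_min_frequency = 5.0F / 8.0F;
    float silent_threshold = 50.0F;      // role not described on the slides
};

// The window size nm is the pitch period in samples, so the highest pitch
// gives the shortest window and the lowest pitch gives the longest one.
std::size_t minWindowSize(const PitchParams& p) {
    return static_cast<std::size_t>(std::lround(p.sample_rate / static_cast<float>(p.max_pitch)));
}

std::size_t maxWindowSize(const PitchParams& p) {
    return static_cast<std::size_t>(std::lround(p.sample_rate / static_cast<float>(p.min_pitch)));
}

// Convert a window size (period in samples) back to a pitch in Hz.
float pitchFromWindow(const PitchParams& p, std::size_t nm) {
    assert(nm > 0);
    return p.sample_rate / static_cast<float>(nm);
}
```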

Test and Result

Conclusion
• Several improvements could increase the performance of the pitch estimator:
  • Reduce the search space
  • Add the first-order derivative of the pitch
  • Filter outliers / noise
• The current algorithm might not be fast enough to run in real time

Reference
• Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).
