This paper presents a novel pitch-tracking algorithm that utilizes spectral and temporal processing to enhance fundamental frequency estimation. The method combines insights from previous algorithms, incorporating nonlinear processing techniques to restore missing fundamentals and improve robustness against noise. Experimental evaluations demonstrate the algorithm's superior performance in pitch tracking for both high-quality and telephone speech signals compared to existing methods. Key contributions include an improved autocorrelation function and effective candidate insertion strategies to mitigate pitch doubling/halving issues.
A Spectral-Temporal Method for Pitch Tracking Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu* Department of Electrical and Computer Engineering Old Dominion University, Norfolk, VA 23529, USA. * Currently at Binghamton University 09/17/2006
Outline • Introduction • Algorithm • Algorithm overview • The use of nonlinear processing • Pitch tracking from the spectrum • Experimental evaluation • Conclusion
Introduction • Pitch (the fundamental frequency) applications • Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc. • Pitch detection algorithms • “Robust and accurate fundamental frequency estimation based on dominant harmonic components,” Nakatani et al. => High accuracy for noisy speech reported using the harmonic dominance spectrum • “Yet another algorithm for pitch tracking (YAAPT),” Zahorian et al. => Hybrid spectral-temporal processing for pitch tracking
The Use of Nonlinear Processing • Restoration of the missing fundamental in telephone speech • A periodic sound is characterized by the spectrum of its harmonics • A signal whose fundamental is missing can be approximated as a sum of its higher harmonics • After squaring and applying trigonometric identities, a component at the fundamental frequency reappears • Figure labels: fundamental, 1st harmonic, 2nd harmonic, “The fundamental reappears”
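The slide's own equations did not survive extraction. As a worked illustration only, assume the band-limited signal keeps just two harmonics of the fundamental angular frequency \omega_0, with hypothetical amplitudes A_2 and A_3 and zero phase. Squaring and applying the product-to-sum identity then gives a term at \omega_0:

```latex
\begin{align*}
s(t)   &= A_2\cos(2\omega_0 t) + A_3\cos(3\omega_0 t) \\
s^2(t) &= A_2^2\cos^2(2\omega_0 t) + A_3^2\cos^2(3\omega_0 t)
          + 2A_2A_3\cos(2\omega_0 t)\cos(3\omega_0 t) \\
       &= \tfrac{1}{2}\left(A_2^2 + A_3^2\right)
          + \tfrac{1}{2}A_2^2\cos(4\omega_0 t)
          + \tfrac{1}{2}A_3^2\cos(6\omega_0 t)
          + A_2A_3\left[\cos(\omega_0 t) + \cos(5\omega_0 t)\right]
\end{align*}
```

The cross term A_2 A_3 \cos(\omega_0 t) is the restored fundamental: it comes from the difference frequency of the two adjacent harmonics.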
Illustration of Nonlinear Processing • The telephone speech signal (top panel) and squared telephone signal (bottom panel) for one frame
Illustration of Nonlinear Processing • The magnitude spectrum for the telephone signal (top panel) and the nonlinearly processed signal (bottom panel)
Spectral Effects from Nonlinear Processing • The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)
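The effect can be reproduced on a synthetic frame. The sketch below is not the authors' code: the sampling rate, the 100 Hz fundamental, the harmonic amplitudes, and the helper name `magnitude_at` are all assumptions chosen for illustration. It builds a frame containing only the 2nd and 3rd multiples of the fundamental, squares it, and compares the spectral magnitude at the fundamental before and after.

```python
import numpy as np

def magnitude_at(signal, fs, freq_hz):
    """Magnitude of the Hann-windowed FFT at the bin nearest to freq_hz."""
    mags = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return mags[np.argmin(np.abs(freqs - freq_hz))]

fs = 8000                      # assumed telephone-band sampling rate (Hz)
f0 = 100.0                     # assumed fundamental frequency (Hz)
t = np.arange(800) / fs        # one 100 ms frame

# Synthetic "telephone" frame: the fundamental is absent, only 2*f0 and 3*f0 remain.
frame = np.cos(2 * np.pi * 2 * f0 * t) + 0.8 * np.cos(2 * np.pi * 3 * f0 * t)
squared = frame ** 2           # the nonlinear processing step

before = magnitude_at(frame, fs, f0)
after = magnitude_at(squared, fs, f0)
print(f"|X(f0)| before squaring: {before:.3f}, after squaring: {after:.3f}")
```

On real telephone speech the harmonics are neither stationary nor noise-free, so the restored component is less clean than in this idealized frame.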
Pitch Tracking From the Spectrum • The pitch track from the spectrum refines the pitch candidates estimated from the temporal method • To achieve a noise robust pitch track from the spectrum, an autocorrelation type of function is proposed
Autocorrelation Type of Function • The function takes into account multiple harmonics • Equation symbols: k: frequency index; X: the spectrum; number of harmonics: 3; WL: window length (20 Hz) • Figure labels: k, 2k, 3k, 4k, WL
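The equation itself is missing from the extracted slide, so the following is only a sketch of one function that fits the surviving definitions, not necessarily the authors' formula. It assumes that, for each frequency index k, the spectrum values at k, 2k, 3k and 4k (the fundamental plus three harmonics, matching the figure labels) are multiplied together, and that these products are summed over a short window of bin offsets. The name `harmonic_correlation` and the `half_window` parameter (in bins, not Hz) are hypothetical.

```python
import numpy as np

def harmonic_correlation(mag, n_harmonics=3, half_window=2):
    """Assumed form: sum over offsets of prod_r mag[r*k + offset], r = 1..n_harmonics+1."""
    mag = np.asarray(mag, dtype=float)
    multiples = np.arange(1, n_harmonics + 2)          # 1, 2, ..., n_harmonics + 1
    # Largest k for which the highest multiple plus the offset stays in range.
    k_max = (len(mag) - 1 - half_window) // multiples[-1]
    out = np.zeros(k_max + 1)
    for k in range(half_window + 1, k_max + 1):
        for offset in range(-half_window, half_window + 1):
            out[k] += np.prod(mag[multiples * k + offset])
    return out

# Synthetic magnitude spectrum: harmonics of bin 20 with amplitudes decaying as 1/h.
mag = np.full(257, 0.01)
for h in range(1, 13):
    mag[20 * h] = 1.0 / h

shc = harmonic_correlation(mag)
print("peak at bin", int(np.argmax(shc)))
```

Because every candidate k needs energy at several of its multiples at once, a single spurious spectral peak contributes little, which is one way to read the claim that the function is robust to noise.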
Peaks in Autocorrelation Type of Function • A very prominent peak is observed in the proposed function
Candidate Insertion to Reduce Pitch Doubling/Halving • If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate: P2(Hz) = P1(Hz)/2 • Similar logic is used to reduce pitch halving
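The doubling rule can be sketched in a few lines. This is an illustration under assumptions: the candidate list is taken to be ordered best-first, and the function name `insert_half_candidate` is hypothetical. The slide does not give the threshold or rule for the halving case, so it is not implemented here.

```python
def insert_half_candidate(candidates_hz, threshold_hz=150.0):
    """Append P1/2 when every candidate exceeds the threshold.

    candidates_hz is assumed to be ordered from highest to lowest rank,
    so candidates_hz[0] is the highest-ranking candidate P1.
    """
    candidates = list(candidates_hz)
    if candidates and all(c > threshold_hz for c in candidates):
        candidates.append(candidates[0] / 2.0)   # P2 = P1 / 2
    return candidates

print(insert_half_candidate([220.0, 440.0]))   # all above 150 Hz: 110.0 is added
print(insert_half_candidate([220.0, 120.0]))   # one below 150 Hz: unchanged
```

The inserted candidate only gives the later tracking stage the option of choosing the lower octave; it does not force that choice.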
Experimental Evaluation • Database • Keele pitch extraction database • 5 male and 5 female speakers, about 35 seconds per speaker • High quality speech and telephone speech • Additive Gaussian noise • Controls (reference pitch) • Control C1: supplied in Keele database • Control C2: computed from the laryngograph signal with the proposed algorithm
Definition of Error Measures • Gross error • The percentage of frames in which the tracker's pitch estimate deviates from the reference pitch (control) by more than a threshold (typically 20%) • Only evaluated in the voiced sections of the reference
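A minimal sketch of this measure follows. It assumes that both tracks are sampled on the same frames and that unvoiced reference frames are marked with 0 Hz; the slide does not state that convention, and the name `gross_error_rate` is hypothetical.

```python
import numpy as np

def gross_error_rate(estimate_hz, reference_hz, threshold=0.20):
    """Percentage of voiced reference frames with relative deviation above threshold."""
    estimate = np.asarray(estimate_hz, dtype=float)
    reference = np.asarray(reference_hz, dtype=float)
    voiced = reference > 0                     # assumption: 0 Hz marks unvoiced frames
    if not np.any(voiced):
        return 0.0
    deviation = np.abs(estimate[voiced] - reference[voiced]) / reference[voiced]
    return 100.0 * float(np.mean(deviation > threshold))

reference = [0.0, 100.0, 100.0, 200.0, 200.0]
estimate = [50.0, 105.0, 200.0, 190.0, 0.0]
rate = gross_error_rate(estimate, reference)
print(rate)
```

In this example the first frame is ignored because the reference is unvoiced, and two of the four remaining frames (a doubled estimate and a missed voiced frame) count as gross errors.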
Experiment 1 Results • Individual performance of the proposed algorithm • YAAPT*: using control C1 for the spectral pitch track • NCCF: normalized cross-correlation function, used as the temporal method in YAAPT
Experiment 2 Results • The results of the new method with various error thresholds
Comparisons • DASH, REPS, YIN: the results are those reported in “Robust and accurate fundamental frequency estimation ... ,” Nakatani et al. • *: SRAEN filter simulated telephone speech
Conclusion • A new pitch-tracking algorithm has been developed that combines multiple information sources to enable accurate, robust F0 tracking • An analysis of errors indicates better performance for both high quality and telephone speech than previously reported for pitch tracking • Acknowledgements • This work was partially supported by JWFC 900