
Pitch Tracking ( 音高追蹤 )

Jyh-Shing Roger Jang (張智星), MIR Lab, Dept. of CSIE, National Taiwan University. jang@mirlab.org, http://mirlab.org/jang


Presentation Transcript


  1. Pitch Tracking (音高追蹤) Jyh-Shing Roger Jang (張智星) MIR Lab, Dept of CSIE National Taiwan University jang@mirlab.org http://mirlab.org/jang

  2. Pitch (音高) • Definition of pitch • Fundamental frequency (FF, in Hz): Reciprocal of the fundamental period in a quasi-periodic waveform • Pitch (in semitone): Obtained from the fundamental frequency through a log-based transformation (to be detailed later) • Characteristics of pitch • Noise and unvoiced sounds do not have pitch.

  3. Pitch Tracking (音高追蹤) • Pitch tracking (PT): The process of computing the pitch vector of a given audio segment (對整段音訊求取音高) • Sample applications • Query by singing/humming (哼唱選歌) • Tone recognition for Mandarin (華語的音調辨識) • Intonation scoring for English (英語的音調評分) • Prosody analysis for speech synthesis (語音合成中的韻律分析) • Pitch scaling and duration modification (音高調節與長度改變)

  4. Typical Steps for Pitch Tracking • Pre-processing • Filtering • Excitation extraction • Main processing • Frame blocking • Periodicity detection function (PDF) computation • Pitch determination via max/min picking over the PDF • Post-processing • Unreliable pitch removal via volume/clarity thresholding • Pitch refinement via parabolic interpolation • Pitch smoothing via median filters, etc.

  5. Frame Blocking • Sample rate = 16 kHz • Frame size = 512 samples • Frame duration = 512/16000 = 0.032 s = 32 ms • Overlap = 192 samples • Hop size = frame size - overlap = 512 - 192 = 320 samples • Frame rate = 16000/320 = 50 frames/sec = pitch rate • [Figure: waveform with a zoomed-in view of one frame and the overlap between adjacent frames]
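The frame-blocking arithmetic on this slide can be checked with a minimal Python sketch. The function name `frame_blocking` and the choice to drop a trailing partial frame are assumptions made for illustration, not taken from the slides or from the SAP toolbox.

```python
def frame_blocking(signal, frame_size=512, overlap=192):
    """Split a sequence of samples into overlapping frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    hop = frame_size - overlap  # 512 - 192 = 320 samples
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


fs = 16000                          # sample rate in Hz
one_second = [0.0] * fs             # one second of silence as dummy input
frames = frame_blocking(one_second)
frame_rate = fs / (512 - 192)       # frames per second = pitch rate
frame_duration_ms = 1000 * 512 / fs
```

Each frame yields one pitch value, which is why the frame rate equals the pitch rate.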

  6. Periodicity Detection Functions • Periodicity detection functions (PDF) are used to detect the period of a waveform • Two categories of PDF • Time domain (時域) • ACF (Autocorrelation function) • NSDF (Normalized squared difference function) • AMDF (Average magnitude difference function) • Frequency domain (頻域) • Harmonic product spectrum • Cepstrum

  7. ACF: Auto-correlation Function • [Figure: original frame s(t), t = 1~128, and shifted frame s(t-τ) with τ = 30; acf(30) = inner product of the overlap part; the pitch period is marked] • To play safe, the frame size needs to cover at least two fundamental periods!

  8. ACF: Formula 1 • Assume a frame is represented by s(t), t = 0~n-1 • ACF formula: $\mathrm{acf}(\tau) = \sum_{t=\tau}^{n-1} s(t)\, s(t-\tau)$ • [Figure: s(t) and the shifted frame s(t-τ)]

  9. ACF: Formula 2 • Assume a frame is represented by s(t), t = 0~n-1 • ACF formula: $\mathrm{acf}(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\, s(t+\tau)$ • [Figure: s(t) and the shifted frame s(t+τ)]
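The two ACF forms, one written with the shifted frame s(t-τ) and the other with s(t+τ), sum the same products over the overlap. The Python sketch below illustrates this under the assumption that the frame is zero outside t = 0~n-1. The function names are hypothetical.

```python
import math


def acf_formula1(frame):
    """acf(tau) = sum over t = tau..n-1 of s(t) * s(t - tau)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - tau] for t in range(tau, n))
            for tau in range(n)]


def acf_formula2(frame):
    """acf(tau) = sum over t = 0..n-1-tau of s(t) * s(t + tau)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + tau] for t in range(n - tau))
            for tau in range(n)]


# A pure sine with a period of 100 samples, observed over 512 samples.
period = 100
frame = [math.sin(2 * math.pi * t / period) for t in range(512)]
a1 = acf_formula1(frame)
a2 = acf_formula2(frame)
energy = sum(x * x for x in frame)   # acf(0) is the frame energy
```

The ACF is large again at a lag of one period (100) and negative at half a period (50). Because the overlap shrinks as τ grows, later peaks are progressively smaller (tapering).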

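  10. Example of ACF • sunday.wav • Sample rate = 16kHz • Frame size = 512 (starting from point 9000) • Fundamental frequency • Max of ACF occurs at index 131 • FF = 16000/130 = 123.077 Hz if 131 is a one-based (MATLAB) index, i.e., a lag of 130 samples; if the index is zero-based, as the figure labels assume, the lag is 131 and FF = 16000/131 ≈ 122.137 Hz • frame2acf01.m • [Figure: ACF curve with index 0 and index 131 marked]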

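  11. Locating the Pitch Point • If the range of human FF is [40, 1000] Hz, then the index of the pitch point (PP) lies in the interval $f_s/1000 \le \mathrm{PP} \le f_s/40$, where $f_s$ is the sample rate (for example, [16, 400] when $f_s$ = 16 kHz) • frame2acfPitchPoint01.m • [Figure: ACF curve with index 0 and index PP marked]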
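Since FF = fs / PP, an FF range of [40, 1000] Hz turns into a lag range of [fs/1000, fs/40] for the pitch point. The Python sketch below restricts the ACF peak search to that range, using zero-based lags. The function names and the synthetic 125 Hz test frame are assumptions for illustration.

```python
import math


def acf(frame):
    """acf(tau) = sum of s(t) * s(t + tau) over the overlap part."""
    n = len(frame)
    return [sum(frame[t] * frame[t + tau] for t in range(n - tau))
            for tau in range(n)]


def pitch_point(acf_values, fs, f_min=40.0, f_max=1000.0):
    """Lag of the ACF maximum inside [fs/f_max, fs/f_min]."""
    lo = math.ceil(fs / f_max)                               # shortest period
    hi = min(math.floor(fs / f_min), len(acf_values) - 1)    # longest period
    return max(range(lo, hi + 1), key=lambda tau: acf_values[tau])


fs = 16000
# 125 Hz tone plus its second harmonic: fundamental period = 128 samples.
frame = [math.sin(2 * math.pi * 125 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 250 * t / fs)
         for t in range(1024)]
values = acf(frame)
unrestricted = max(range(len(values)), key=lambda tau: values[tau])
pp = pitch_point(values, fs)
ff = fs / pp
```

Without the range restriction the maximum always sits at lag 0 (the frame energy), so the lower bound is what makes the search meaningful.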

  12. Locating the Pitch Point (II) • What could go wrong? • Vitas • http://www.youtube.com/watch?v=YjO_VXHxsRw&hd=1 (local short clip) • Whistling • Low-pitch singing/humming → requires a big frame size

  13. Example of ACF Based PT • Specs • Sample rate = 11025 Hz • Frame size = 353 points ≈ 32 ms • Overlap = 0 • Frame rate = 11025/353 ≈ 31.23 f/s • Playback • Original singing • Pitch by ACF • wave2pitchByAcf01.m

  14. Example of ACF Based PT (II) • Specs • The previous script is converted into a function pitchTrackingSimple.m for easy access. • ptByAcf01.m

  15. Demo of ACF-based PT • Real-time display of ACF for pitch tracking • goPtByAcf.mdl under SAP toolbox • Real-time pitch tracking for mic input • goPtByAcf2.mdl under SAP toolbox

  16. ACF Variants to Avoid Tapering • Normalized version • frame2acf02.m • Half-frame shifting • frame2acf03.m

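  17. NSDF: ACF Variant with Normalized Range • NSDF: normalized squared difference function • Formula: $\mathrm{nsdf}(\tau) = \dfrac{2 \sum_{t} s(t)\, s(t+\tau)}{\sum_{t} s^2(t) + \sum_{t} s^2(t+\tau)}$, with all sums taken over the overlap part, t = 0~n-1-τ • A variant of ACF within the range [-1, 1], based on the inequality: $2xy \le x^2 + y^2$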
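NSDF can be sketched in Python as twice the ACF divided by the energies of the two overlapping parts, which keeps every value in [-1, 1]. This is an illustrative sketch, not the toolbox's frame2nsdf01.m. The function name and the zero-denominator guard are assumptions.

```python
import math


def nsdf(frame):
    """nsdf(tau) = 2*sum(s(t)s(t+tau)) / (sum(s(t)^2) + sum(s(t+tau)^2))."""
    n = len(frame)
    out = []
    for tau in range(n):
        overlap = range(n - tau)
        num = 2.0 * sum(frame[t] * frame[t + tau] for t in overlap)
        den = sum(frame[t] ** 2 + frame[t + tau] ** 2 for t in overlap)
        out.append(num / den if den > 0 else 0.0)  # guard: silent overlap
    return out


# Pure sine with a period of 100 samples.
frame = [math.sin(2 * math.pi * t / 100) for t in range(512)]
values = nsdf(frame)
clarity = values[100]   # height of the pitch point at a lag of one period
```

Unlike the plain ACF, the peak at one period reaches about 1 regardless of how short the overlap is, so its height can serve directly as a clarity measure.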

  18. NSDF Example • frame2nsdf01.m • Clarity: height of the pitch point

  19. AMDF: Average Magnitude Difference Function • [Figure: original frame s(i), i = 1~128, and shifted frame s(i-τ) with τ = 30; amdf(30) = sum of absolute differences of the overlap part; the pitch period is marked]
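AMDF replaces the inner product of ACF with a sum of absolute differences over the overlap part, so the pitch point is a minimum rather than a maximum. In the Python sketch below, the function names and the narrowed search range of [80, 1000] Hz are assumptions for illustration. For a perfectly periodic test signal the AMDF is near zero at every multiple of the period, so the narrowed range keeps a single candidate in view.

```python
import math


def amdf(frame):
    """amdf(tau) = sum of |s(t) - s(t + tau)| over the overlap part."""
    n = len(frame)
    return [sum(abs(frame[t] - frame[t + tau]) for t in range(n - tau))
            for tau in range(n)]


def amdf_pitch_point(values, fs, f_min, f_max):
    """Lag of the AMDF minimum inside [fs/f_max, fs/f_min]."""
    lo = math.ceil(fs / f_max)
    hi = min(math.floor(fs / f_min), len(values) - 1)
    return min(range(lo, hi + 1), key=values.__getitem__)


fs = 16000
frame = [math.sin(2 * math.pi * 125 * t / fs) for t in range(1024)]
values = amdf(frame)
# Assumed range [80, 1000] Hz: only one multiple of the 128-sample period
# falls inside the corresponding lag range [16, 200].
pp = amdf_pitch_point(values, fs, f_min=80.0, f_max=1000.0)
ff = fs / pp
```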

  20. Example of AMDF • sunday.wav • Sample rate = 16kHz • Frame size = 512 (starting from point 9000) • Fundamental frequency • Pitch point occurs at index 131, which is harder to determine • frame2amdf01.m • [Figure: AMDF curve with index 0 and index 131 marked]

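  21. Example of AMDF to Pitch • sunday.wav • Sample rate = 16kHz • Frame size = 512 (starting from point 9000) • Fundamental frequency • Pitch point occurs at index 131, which is determined correctly • FF = 16000/130 = 123.077 Hz if 131 is a one-based (MATLAB) index, i.e., a lag of 130 samples; if the index is zero-based, as the figure labels assume, the lag is 131 and FF = 16000/131 ≈ 122.137 Hz • frame2amdf4pt01.m • [Figure: AMDF curve with index 0 and index 131 marked]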

  22. Example of AMDF Based PT • Specs • Sample rate = 11025 Hz • Frame size = 353 points ≈ 32 ms • Overlap = 0 • Frame rate = 11025/353 ≈ 31.23 f/s • Playback • Original singing • Pitch by AMDF • ptByAmdf01.m

  23. AMDF: Variations to Avoid Tapering • Normalized version • frame2amdf02.m • Half-frame shifting • frame2amdf03.m

  24. Combining ACF and AMDF • [Figure panels: Frame, ACF, AMDF, ACF/AMDF]

  25. Audio Features in Time Domain • Audio features presented in the time domain • Fundamental period • Intensity • Timbre: Waveform within a fundamental period (FP)

  26. Audio Features in Frequency Domain • Energy: Sum of power spectrum • Pitch: Distance between harmonics • Timbre: Smoothed spectrum • [Figure labels: energy, pitch frequency, first formant F1, second formant F2]

  27. About DFT & FFT • Terminology • DFT: Discrete Fourier transform • FFT: Fast Fourier transform, which is an efficient method for computing DFT • More about DFT

  28. Harmonic Product Spectrum (HPS) • Procedure • Compute the power spectrum of a frame • Eliminate its trend obtained from 20-order polynomial fitting → Formants are removed • Apply exponential weighting to suppress high-frequency harmonics • Downsample and add to enhance the harmonics at the fundamental frequency • Find the max as the pitch point
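The "downsample and add" step can be sketched in Python as follows. This simplified sketch skips the polynomial detrending and exponential weighting steps, works on the log power spectrum (an assumption, under which adding corresponds to multiplying the original spectra), and uses a naive DFT so that it needs no external libraries. All function names and the synthetic frame are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum via a naive O(n^2) DFT (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * ((k * t) % n) / n)
                for t in range(n))
        spectrum.append(abs(x) ** 2)
    return spectrum


def hps_pitch_bin(power, num_harmonics=4, eps=1e-12):
    """Add the log spectrum downsampled by 1, 2, ..., num_harmonics; return the peak bin."""
    log_power = [math.log(p + eps) for p in power]   # eps avoids log(0)
    max_bin = (len(power) - 1) // num_harmonics

    def score(k):
        return sum(log_power[r * k] for r in range(1, num_harmonics + 1))

    return max(range(1, max_bin + 1), key=score)


fs, n = 8000, 256
# Harmonics at bins 8, 16, 24, 32; the second harmonic is the strongest.
amps = [0.5, 1.0, 0.7, 0.4]
frame = [sum(a * math.cos(2 * math.pi * 8 * (h + 1) * t / n)
             for h, a in enumerate(amps))
         for t in range(n)]
power = power_spectrum(frame)
strongest_bin = max(range(1, len(power)), key=power.__getitem__)
pitch_bin = hps_pitch_bin(power)
pitch_hz = pitch_bin * fs / n
```

Picking the strongest spectral peak would report the second harmonic (bin 16). Adding the downsampled spectra lines all harmonics up at the fundamental (bin 8, 250 Hz).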

  29. “Down Sample and Add” in HPS

  30. Example of HPS • frame2hps01.m

  31. Example of PT by HPS • ptByHps01.m

  32. PT by Cepstrum • Formula for cepstrum: $\mathrm{cepstrum} = \mathrm{IFFT}\left(\log\left|\mathrm{FFT}(\mathrm{frame})\right|\right)$ • Procedure for PT by cepstrum • Compute the power spectrum of a frame. • Eliminate the trend of the power spectrum if necessary. • Take the inverse FFT on the (symmetric) power spectrum. (The result is real, why?) • Find the position of the max to compute the pitch.
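A minimal Python sketch of pitch tracking by cepstrum follows, assuming the log power spectrum is used and no trend removal is applied. It uses a naive DFT in place of an FFT, an idealized pulse train as input, and an assumed search range of [150, 400] Hz so that only one cepstral peak falls inside it. All names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT or inverse DFT (a stand-in for FFT/IFFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def cepstrum(frame, eps=1e-12):
    """Inverse DFT of the log power spectrum; eps avoids log(0)."""
    log_power = [math.log(abs(v) ** 2 + eps) for v in dft(frame)]
    return dft(log_power, inverse=True)


fs, n, period = 8000, 256, 32
# Idealized excitation: one pulse every 32 samples (250 Hz at fs = 8 kHz).
frame = [1.0 if t % period == 0 else 0.0 for t in range(n)]
ceps = cepstrum(frame)

# Search only quefrencies that correspond to the assumed FF range.
q_lo, q_hi = math.ceil(fs / 400), math.floor(fs / 150)
pitch_q = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q].real)
pitch_hz = fs / pitch_q
max_imag = max(abs(c.imag) for c in ceps)
```

The imaginary parts vanish up to rounding because the log power spectrum of a real frame is real and symmetric, which answers the "why" on the slide.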

  33. PT by Cepstrum: How It Works? • [Figure annotations: "Close to sinusoids!" and "This should be a single pulse only!"]

  34. Example of Cepstrum • frame2ceps01.m

  35. Example of PT by Cepstrum • ptByCeps01.m

  36. Preprocessing for Pitch Tracking • Some commonly used preprocessing for the audio signals before pitch tracking • Pre-filtering the signals • Clipping the signals • SIFT method for the signals

  37. Preprocessing: Pre-filtering • Observation • Range of human pitch: [40, 1000] Hz • Idea • Low-pass filter the signals with a cutoff frequency between 800 and 1000 Hz • Characteristics • The effect is yet to be verified

  38. Preprocessing: Clipping • Observation • Small signal values near zero are likely to cause pitch tracking errors • Idea • Clip the signals • Characteristics • Saves computation on embedded systems • Overall effect is yet to be verified
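The slide does not say which clipping rule is meant. One common reading is center clipping, which sets samples near zero to zero and shifts the rest toward zero by the threshold. The Python sketch below follows that reading. The function name and the 30% threshold ratio are assumptions, not values from the slides.

```python
def center_clip(frame, ratio=0.3):
    """Zero out samples within +/- ratio * max|x|; shift the rest toward zero."""
    if not frame:
        return []
    limit = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > limit:
            clipped.append(x - limit)
        elif x < -limit:
            clipped.append(x + limit)
        else:
            clipped.append(0.0)   # small values near zero are removed
    return clipped


clipped = center_clip([0.1, -0.2, 1.0, -1.0, 0.5])
```

The many exact zeros in the output are what can save multiplications when the ACF is computed afterwards.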

  39. Preprocessing: SIFT • Observation • Channel effects are likely to cause pitch tracking errors • Idea of SIFT (simplified inverse filter tracking) • Identify the excitation via LPC • Use the excitation for PDF • Characteristics • Overall effect is yet to be verified

  40. Example of SIFT • siftAcf01.m

  41. Example of PT based on SIFT & ACF • ptBySiftAcf01.m

  42. Postprocessing for Pitch Tracking • Some commonly used postprocessing for pitch tracking • Smoothing to remove abruptly changing pitch • Interpolation to increase pitch precision

  43. Postprocessing: Smoothing • Smoothing by a median filter • ptWithMedianFilter01.m
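Median smoothing of a pitch vector can be sketched in Python as below. The function name, the filter order of 5, and the choice to shrink the window at both ends are assumptions of this sketch, not details from ptWithMedianFilter01.m.

```python
import statistics


def median_smooth(pitch, order=5):
    """Replace each pitch value by the median of its neighborhood.

    The window shrinks near both ends instead of padding (an assumption).
    """
    if order % 2 == 0:
        raise ValueError("order should be odd")
    half = order // 2
    smoothed = []
    for i in range(len(pitch)):
        window = pitch[max(0, i - half): i + half + 1]
        smoothed.append(statistics.median(window))
    return smoothed


# Pitch in semitones with a one-frame octave jump (60 -> 72).
pitch = [60, 60, 60, 72, 60, 60, 60]
smoothed = median_smooth(pitch)
```

A median filter removes an isolated outlier completely, whereas a moving average would only smear it over neighboring frames.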

  44. Postprocessing: Interpolation • Idea • Using the pitch point and its neighbors to identify the max position • ptWithParabolicFit01.m
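The idea of fitting a parabola through the pitch point and its two neighbors can be sketched in Python as follows. The function name and the synthetic curve are assumptions for illustration. The vertex formula itself is the standard three-point parabolic fit.

```python
def parabolic_refine(values, i):
    """Refine a peak at integer index i using its two neighbors.

    Returns (fractional index, interpolated peak height).
    """
    if i <= 0 or i >= len(values) - 1:
        return float(i), values[i]          # no neighbors on one side
    left, mid, right = values[i - 1], values[i], values[i + 1]
    denom = left - 2 * mid + right
    if denom == 0:
        return float(i), mid                # flat: nothing to refine
    delta = 0.5 * (left - right) / denom    # vertex offset in (-1, 1)
    return i + delta, mid - 0.25 * (left - right) * delta


# Synthetic PDF-like curve whose true maximum lies between samples.
true_peak = 130.3
curve = [5.0 - (x - true_peak) ** 2 for x in range(200)]
coarse = max(range(len(curve)), key=curve.__getitem__)
fine, height = parabolic_refine(curve, coarse)
```

At fs = 16 kHz, moving the pitch point from 130 to 130.3 changes the FF from about 123.08 Hz to about 122.79 Hz, which shows why sub-sample precision matters at low lags.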

  45. UPDUDP (1/4) • UPDUDP: Unbroken Pitch Determination Using DP • Goal: To take pitch smoothness into consideration • Cost of a path: $\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} |p_i - p_{i+1}|^m$ • $\mathbf{p} = [p_1, \ldots, p_n]$: a given path in the AMDF matrix • $n$: Number of frames • $\theta$: Transition penalty • $m$: Exponent of the transition difference • Reference: Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, No. 10, Vol. 7, Aug 2008.

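  46. UPDUDP (2/4) • Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j) • Recurrence formula: $D(i, j) = \mathrm{amdf}_i(j) + \min_k \{ D(i-1, k) + \theta\, |k - j|^m \}$, where $\theta$ is the transition penalty and $m$ is the exponent of the transition difference • Initial conditions: $D(1, j) = \mathrm{amdf}_1(j)$ • Optimum cost: $\min_j D(n, j)$, where $n$ is the number of frames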
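The dynamic-programming recurrence can be sketched in Python as below: D(i, j) is the local AMDF value plus the cheapest way to arrive from any lag k in the previous frame, with a transition penalty of θ·|k - j|^m. The function name, the zero-based frame index, and the tiny 3×4 matrix are assumptions for illustration. A real AMDF matrix would have one row per frame and one column per candidate lag.

```python
def updudp(amdf_matrix, theta, m=2):
    """Return (path, cost) minimizing sum(amdf) + theta * sum(|jump|^m)."""
    n = len(amdf_matrix)          # number of frames
    k = len(amdf_matrix[0])       # number of candidate lags per frame
    D = [list(amdf_matrix[0])]    # initial condition: D(first frame, j) = amdf(j)
    back = [[0] * k]              # back-pointers for path recovery
    for i in range(1, n):
        row, ptr = [], []
        for j in range(k):
            best = min(range(k),
                       key=lambda q: D[i - 1][q] + theta * abs(q - j) ** m)
            row.append(amdf_matrix[i][j]
                       + D[i - 1][best] + theta * abs(best - j) ** m)
            ptr.append(best)
        D.append(row)
        back.append(ptr)
    j = min(range(k), key=lambda q: D[n - 1][q])   # optimum cost position
    cost = D[n - 1][j]
    path = [j]
    for i in range(n - 1, 0, -1):                  # backtrack to the first frame
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, cost


# Frame 1 has a spurious minimum at lag index 0.
amdf_matrix = [
    [5, 1, 4, 6],
    [0, 2, 5, 6],
    [6, 1, 3, 7],
]
greedy_path, greedy_cost = updudp(amdf_matrix, theta=0)
smooth_path, smooth_cost = updudp(amdf_matrix, theta=2, m=2)
```

With θ = 0 the result is the frame-by-frame minimum, including the spurious jump. A positive θ makes the jump more expensive than staying on the slightly worse but consistent lag.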

  47. Example of UPDUDP • A typical example (via AMDF)

  48. Robustness of UPDUDP • Insensitivity in

  49. Another Example of UPDUDP • Example of MATLAB code using UPDUDP (via ACF):
      waveFile='arina_short.wav';
      wObj=waveFile2obj(waveFile);
      ptOpt=ptOptSet(wObj.fs, wObj.nbits, 1);
      pitch=pitchTracking(wObj, ptOpt, 1);
    • Result [Figure]

  50. Frequency to Semitone Conversion • Semitone: A music scale based on A440: $\mathrm{semitone} = 69 + 12 \log_2(\mathrm{freq}/440)$ • Reasonable pitch range: • E2 - C6 • 82 Hz - 1047 Hz (semitone 40 - 84)
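The log-based conversion can be checked with a short Python sketch that uses the MIDI convention in which A440 maps to semitone 69. The function names are hypothetical.

```python
import math


def freq_to_semitone(freq_hz):
    """Semitone number on the A440-based scale (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping, from semitone number back to Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


a4 = freq_to_semitone(440.0)       # reference pitch
a5 = freq_to_semitone(880.0)       # one octave up = +12 semitones
e2 = freq_to_semitone(82.41)       # lower end of the reasonable range
c6 = freq_to_semitone(1046.5)      # upper end of the reasonable range
```

Doubling the frequency always adds 12 semitones, so equal musical intervals become equal numeric differences. This is why pitch vectors are usually compared in semitones rather than in Hz.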
