This document explores fundamental audio features essential for the processing and recognition of audio signals. It covers common features such as volume, pitch, zero-crossing rate, and timbre, each analyzed both qualitatively and quantitatively. The content explains audio features in the time and frequency domains, outlining methods for feature extraction, including frame blocking and endpoint detection. Practical examples, including sinusoidal signals and computations of intensity and pitch, are also provided to aid understanding and application in real-world audio analysis.
Basic Features of Audio Signals • Jyh-Shing Roger Jang (張智星) • MIR Lab, CS Dept., Tsing Hua Univ., Hsinchu, Taiwan • http://www.cs.nthu.edu.tw/~jang
Audio Features • Four commonly used audio features • Volume • Pitch • Zero-crossing rate • Timbre • Our goal • These features can be perceived subjectively. • But we need to compute them quantitatively for further processing and recognition.
Audio Features in Time Domain • Audio features presented in the time domain • Intensity: Amplitude of the waveform • Pitch: Reciprocal of the fundamental period (FP) • Timbre: Waveform within an FP
Audio Features in Frequency Domain • Volume: Magnitude of spectrum • Pitch: Distance between harmonics • Timbre: Smoothed spectrum (Figure labels: intensity, pitch frequency, first formant F1, second formant F2)
Demo: Real-time Spectrogram • Try “dspstfft_audio” under MATLAB (Figures: spectrum and spectrogram)
Steps for Audio Feature Extraction • Frame blocking • Frame duration of about 20 ms • Feature extraction • Volume, zero-crossing rate, pitch, MFCC, etc. • Endpoint detection • Usually based on volume and zero-crossing rate
Frame Blocking • Sample rate = 11025 Hz • Frame size = 256 samples • Overlap = 84 samples (hop size = 256 - 84 = 172 samples) • Frame rate = 11025/(256 - 84) ≈ 64 frames/sec
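The frame-blocking arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the slides: the function name `frame_blocking` and the choice to drop a trailing partial frame are assumptions, and the input is a stand-in list of sample indices, not real audio.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    A trailing partial frame is dropped (an assumption; padding is another option).
    """
    hop = frame_size - overlap  # 172 samples with the slide's values
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frames.append(signal[start:start + frame_size])
    return frames


fs = 11025                      # sample rate in Hz, as on the slide
signal = list(range(fs))        # one second of stand-in "samples" (their own indices)
frames = frame_blocking(signal)
frame_rate = fs / (256 - 84)    # frames per second

print(len(frames), "frames; frame rate about", round(frame_rate, 1), "frames/sec")
```

Because the stand-in samples are their own indices, the first sample of each frame shows directly where that frame starts (0, 172, 344, and so on).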
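Intensity (I) • Intensity • Visual cue: Amplitude of vibration • Computation (for a frame of $n$ samples $s_1, \dots, s_n$): • Volume: $\text{vol} = \sum_{i=1}^{n} |s_i|$ • Log energy (in decibels): $\text{energy} = 10 \log_{10} \sum_{i=1}^{n} s_i^2$ • Characteristics • Influenced by • Microphone types • Microphone setups • Perceived volume is also influenced by frequency and timbre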
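Intensity (II) • To avoid DC drifting • DC drifting: The waveform does not vibrate around zero • Computation (for a frame of $n$ samples $s_1, \dots, s_n$): • Volume: $\text{vol} = \sum_{i=1}^{n} |s_i - \operatorname{median}(s)|$ • Log energy (in decibels): $\text{energy} = 10 \log_{10} \sum_{i=1}^{n} (s_i - \operatorname{mean}(s))^2$ • Theoretical background (How to prove?): the median is the constant $c$ that minimizes $\sum_{i=1}^{n} |s_i - c|$, and the mean is the constant $c$ that minimizes $\sum_{i=1}^{n} (s_i - c)^2$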
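A minimal Python sketch of the two intensity measures with DC removal. It assumes the common definitions: volume as the sum of absolute values after subtracting the frame median, and log energy as 10 log10 of the sum of squares after subtracting the frame mean. The function names, the `floor` guard against log(0), and the test frame are hypothetical.

```python
import math
from statistics import mean, median


def volume_abs_sum(frame):
    """Volume: sum of absolute sample values after removing the median (DC offset)."""
    med = median(frame)
    return sum(abs(s - med) for s in frame)


def log_energy_db(frame, floor=1e-12):
    """Log energy in decibels after removing the mean (DC offset).

    `floor` is a hypothetical guard that avoids log10(0) on an all-constant frame.
    """
    mu = mean(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    return 10.0 * math.log10(max(energy, floor))


# Hypothetical test frame: a 440 Hz sine with a DC offset of 0.3
fs = 16000
frame = [0.3 + 0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(256)]
shifted = [s + 1.0 for s in frame]   # same waveform with a larger DC offset

print(volume_abs_sum(frame), volume_abs_sum(shifted))
print(log_energy_db(frame), log_energy_db(shifted))
```

Both measures give (numerically) the same value for `frame` and `shifted`, which is the point of removing the DC component before computing intensity.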
Intensity (III) • Examples • Please refer to the online tutorial
Pitch • Definition • Pitch is known as the fundamental frequency, which equals the number of fundamental periods per second. The unit used here is hertz (Hz). • More commonly, pitch is expressed in semitones, which can be converted from pitch in hertz: $\text{semitone} = 69 + 12 \log_2(f / 440)$, where $f$ is the frequency in Hz
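The hertz-to-semitone conversion can be sketched as follows. This assumes the MIDI convention in which A4 = 440 Hz corresponds to semitone 69; the function names are hypothetical.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone number (MIDI convention, A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion: semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


for f in (220.0, 440.0, 880.0):
    print(f, "Hz ->", hz_to_semitone(f), "semitones")
```

Doubling the frequency raises the pitch by exactly 12 semitones (one octave), which is why the semitone scale matches perceived pitch intervals better than the hertz scale.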
Pitch Computation (I) • Pitch of tuning forks
Pitch Computation (II) • Pitch of speech
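The text of these slides does not say how the pitch was computed. As one possible approach, here is a minimal Python sketch that estimates the fundamental period of a frame from the peak of its autocorrelation. Autocorrelation is a swapped-in technique chosen for illustration, and the function name, the search range `fmin`/`fmax`, and the synthetic test tone are all assumptions.

```python
import math


def estimate_pitch_acf(frame, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch in Hz as fs divided by the lag that maximizes the autocorrelation.

    Only lags corresponding to frequencies between fmin and fmax are searched.
    """
    mu = sum(frame) / len(frame)
    x = [s - mu for s in frame]                  # remove the DC offset first
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(x) - 1)
    best_lag, best_val = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        acf = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if acf > best_val:
            best_lag, best_val = lag, acf
    return fs / best_lag


# Synthetic stand-in for a tuning fork: a pure 440 Hz tone
fs = 16000
frame = [0.8 * math.sin(2 * math.pi * 440 * n / fs) for n in range(512)]
print("Estimated pitch (Hz):", round(estimate_pitch_acf(frame, fs), 1))
```

Because the lag is an integer number of samples, the estimate is quantized: at 16 kHz a 440 Hz tone has a period of about 36.4 samples, so the result is only approximately 440 Hz. Real speech also needs voicing decisions and smoothing, which this sketch omits.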
Statistics of Mandarin Chinese • 5401 characters, each associated with at least one base syllable and a tone • 411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables • Tone is characterized by the pitch curve: • Tone 1: high-high • Tone 2: low-high • Tone 3: high-low-high • Tone 4: high-low • Some examples of tones: • 1242: 清華大學 (Tsing Hua University) • 1234: 三民主義 (Three Principles of the People), 優柔寡斷 (indecisive), 搭達打大 (the syllable "da" in tones 1 to 4), 依宜以易 ("yi" in tones 1 to 4), 夫福府負 ("fu" in tones 1 to 4) • ?????: 美麗大教堂 (beautiful cathedral), 滷蛋有夠鹹 (the braised egg is really salty; Taiwanese)
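Sinusoidal Signals • How to generate a stream of sinusoidal signals (MATLAB):
fs = 16000;                 % sample rate in Hz
duration = 3;               % duration in seconds
f = 440;                    % frequency in Hz
t = (1:fs*duration)/fs;     % time vector in seconds
y = 0.8*sin(2*pi*f*t);      % sine wave with amplitude 0.8
plot(t, y);
axis([0.6, 0.65, -1, 1]);   % zoom in on a 50 ms segment
sound(y, fs);               % play the signal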
Zero Crossing Rate • Zero crossing rate (ZCR) • The number of zero crossings in a frame. • Characteristics: • Noise and unvoiced sounds have high ZCR. • ZCR is commonly used in endpoint detection, especially for detecting the start and end of unvoiced sounds. • To distinguish noise/silence from unvoiced sounds, we usually add a bias before computing ZCR.
ZCR Computations • Two types of ZCR definition • If a sample with zero value is counted as a zero crossing, the ZCR is higher. Otherwise it is lower. • The choice affects the ZCR, especially when the sample rate is low. • Other considerations • Zero-justification (subtracting the mean so that the waveform is centered at zero) is required. • ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
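The two ZCR definitions and the shifted variant can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name, the `count_zeros` flag, and the shift value of 0.05 are hypothetical, and the slides leave open how the shift amount should be chosen.

```python
def zero_crossing_rate(frame, count_zeros=False, shift=0.0):
    """Count zero crossings in a frame.

    The frame is zero-justified (mean removed) first, then `shift` is added.
    count_zeros=False: only strict sign changes count.
    count_zeros=True:  a pair containing an exactly-zero sample also counts.
    """
    mean = sum(frame) / len(frame)
    x = [s - mean + shift for s in frame]
    count = 0
    for a, b in zip(x, x[1:]):
        product = a * b
        if product < 0 or (count_zeros and product == 0):
            count += 1
    return count


# A frame with exactly-zero samples: the two definitions disagree
frame = [1, 0, -1, 0]
print(zero_crossing_rate(frame), zero_crossing_rate(frame, count_zeros=True))

# Low-amplitude noise: a shift larger than the noise level suppresses its crossings
quiet = [0.01, -0.01, 0.01, -0.01]
print(zero_crossing_rate(quiet), zero_crossing_rate(quiet, shift=0.05))
```

With integer-valued or low-sample-rate signals, exactly-zero samples are common, so the two definitions can differ noticeably. The shift moves low-amplitude noise away from zero while louder unvoiced sounds still cross it.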
ZCR • Examples • Please refer to the online tutorial.