Sound Processing

Sound Processing CSC361/661 Digital Media Spring 2002

How Sound Is Produced • Air vibration • Molecules in air are disturbed, one bumping against another • An area of high pressure moves through the air in a wave • Thus a wave representing the changing air pressure can be used to represent sound

How Sound Is Perceived • The cochlea, an organ in our inner ears, detects sound. • The cochlea is joined to the eardrum by three tiny bones. • It consists of a spiral of tissue filled with liquid and thousands of tiny hairs. • The hairs get smaller as you move down into the cochlea. • Each hair is connected to a nerve which feeds into the auditory nerve bundle going to the brain. • The longer hairs resonate with lower frequency sounds, and the shorter hairs with higher frequencies. • Thus the cochlea serves to transform the air pressure signal experienced by the eardrum into frequency information which can be interpreted by the brain as sound.

Pulse Code Modulation • PCM is the most common type of digital audio recording. • A microphone converts a varying air pressure (sound waves) into a varying voltage. • Then an analog-to-digital converter samples the voltage at regular intervals. • Each sampled voltage gets converted into an integer of a fixed number of bits.
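The PCM steps above can be sketched in a few lines of Python. This is only an illustration, not a real converter: the function name `pcm_encode`, the 440 Hz test tone, and the normalized range of -1 to +1 standing in for the microphone voltage are all assumptions made for the example.

```python
import math

def pcm_encode(signal, sample_rate, n_samples, bits=16):
    """Sample signal(t), assumed to lie in [-1, 1], and return integer codes."""
    max_code = 2 ** (bits - 1) - 1               # 32767 for 16-bit samples
    codes = []
    for i in range(n_samples):
        t = i / sample_rate                      # regular sampling instants
        v = max(-1.0, min(1.0, signal(t)))       # clip to the converter's range
        codes.append(round(v * max_code))        # "voltage" -> fixed-width integer
    return codes

def tone(t):
    """Hypothetical input: a 440 Hz sine wave standing in for the voltage."""
    return math.sin(2 * math.pi * 440 * t)

samples = pcm_encode(tone, sample_rate=8000, n_samples=80)
print(samples[:5])
```

Each entry of `samples` is one fixed-width integer, which is all a PCM recording stores.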

Digitization of Sound • Sampling • Most humans can’t hear anything over 20 kHz. • The sampling rate must be more than twice the highest frequency component of the sound (Nyquist Theorem). • CD quality is sampled at 44.1 kHz. • Frequencies over 22.05 kHz are filtered out before sampling is done. • Quantization • Telephone quality sound uses 8-bit samples. • CD quality sound uses 16-bit samples (65,536 quantization levels) on two channels for stereo.
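As a quick check of the numbers on this slide, the following sketch computes the raw data rate of CD-quality PCM and tests the Nyquist condition. The helper names `pcm_bit_rate` and `satisfies_nyquist` are made up for this example.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def satisfies_nyquist(sample_rate_hz, max_freq_hz):
    """True if the rate is more than twice the highest frequency present."""
    return sample_rate_hz > 2 * max_freq_hz

cd_rate = pcm_bit_rate(44_100, 16, 2)      # stereo, 16-bit, 44.1 kHz
levels = 2 ** 16                           # quantization levels per sample

print(cd_rate, "bits per second")
print(cd_rate // 8 * 60, "bytes per minute")
print(satisfies_nyquist(44_100, 20_000))   # 20 kHz is the limit of hearing
```

At 1,411,200 bits per second, one minute of uncompressed CD audio takes roughly 10 MB.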

Encoder Design A – B. Apply bandlimiting filter to remove high frequency components. C. Sample at regular time intervals. D. Quantize each sample.

Sampling Error (Undersampling) • If you undersample, one frequency will alias as another. • For CD quality, frequencies above 22.05 kHz are filtered out, and then the sound is sampled at 44.1 kHz. • This is depicted on the next slide. Figure from Multimedia Communications by Fred Halsall, Addison-Wesley, 2001.
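Aliasing can be demonstrated numerically. The sketch below assumes a pure cosine input and no bandlimiting filter; the 30 kHz tone and the function names are chosen only for illustration. It computes the frequency that an undersampled tone appears as, then confirms that the two tones produce the same samples.

```python
import math

def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a real sinusoid of f_hz after sampling."""
    return abs(f_hz - sample_rate_hz * round(f_hz / sample_rate_hz))

def sample_cosine(f_hz, sample_rate_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = i / sample_rate."""
    return [math.cos(2 * math.pi * f_hz * i / sample_rate_hz)
            for i in range(n_samples)]

fs = 44_100
tone = 30_000                      # above fs / 2, so it cannot be represented
alias = alias_frequency(tone, fs)

original = sample_cosine(tone, fs, 50)
aliased = sample_cosine(alias, fs, 50)
print(alias)
print(max(abs(a - b) for a, b in zip(original, aliased)))
```

Once sampled, the 30 kHz tone cannot be told apart from a 14.1 kHz tone, which is why the filter has to be applied before sampling.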

Quantization Interval • If Vmax is the maximum positive and negative signal amplitude and n is the number of binary bits used, then the magnitude of the quantization interval, q, is defined as follows: $q = \frac{2V_{max}}{2^n}$ • For example, what if we have 8 bits and the values range from –1000 to +1000?
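The example on this slide can be worked through in code. The sketch assumes the usual definition of the interval as the full signal range, 2·Vmax, divided by the number of levels, 2^n; the function name is hypothetical.

```python
def quantization_interval(v_max, bits):
    """Size of one quantization interval: full range divided by level count."""
    return 2 * v_max / 2 ** bits

q = quantization_interval(v_max=1000, bits=8)
print(q)   # 2000 / 256
```

With 8 bits and a range of –1000 to +1000, each of the 256 intervals is 7.8125 units wide.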

Quantization Error (Noise) • Any values within a quantization interval will be represented by the same binary value. • Each code word corresponds to a nominal amplitude value that is at the center of the corresponding quantization interval. • The actual signal may differ from the code word by up to plus or minus q/2, where q is the size of the quantization interval.
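A minimal linear quantizer makes the ±q/2 bound concrete. This is a sketch under the same assumptions as the earlier example (8 bits, range –1000 to +1000, q = 2·Vmax / 2^n); the function `quantize` and its return format are invented for illustration.

```python
def quantize(value, v_max, bits):
    """Return (code word index, nominal amplitude at the interval's center)."""
    levels = 2 ** bits
    q = 2 * v_max / levels
    index = int((value + v_max) // q)        # which interval the value falls in
    index = max(0, min(levels - 1, index))   # keep +v_max inside the top interval
    nominal = -v_max + (index + 0.5) * q     # center of that interval
    return index, nominal

q = 2 * 1000 / 2 ** 8
worst = max(abs(v - quantize(v, 1000, 8)[1]) for v in range(-1000, 1001))
print(quantize(0, 1000, 8))
print(worst, "<=", q / 2)
```

Every input in the range is reproduced to within half an interval, and no closer than that can be guaranteed.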

Quantization Intervals and Resulting Error

Results of Insufficient Quantization Levels • Insufficient quantization levels result from not using enough bits to represent each sample. • Insufficient quantization levels force you to represent more than one sound with the same value. This introduces quantization noise. • Dithering can improve the quality of a digital file with a small sample size (relatively few quantization levels).
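One simple form of dithering adds a small amount of random noise before quantizing. The slide does not say which kind of dither is meant, so the uniform noise of width q used below is an assumption, as are the function names. The sketch shows that a level lying between two quantization steps is lost by plain rounding but survives, on average, when dither is added.

```python
import random

def quantize_round(x, q):
    """Round x to the nearest multiple of the step size q."""
    return q * round(x / q)

def quantize_dithered(x, q, rng):
    """Add uniform noise in [-q/2, q/2) before rounding (assumed dither type)."""
    noise = (rng.random() - 0.5) * q
    return quantize_round(x + noise, q)

rng = random.Random(0)         # fixed seed so the run is repeatable
level, q = 0.3, 1.0            # a constant signal between two steps

plain = [quantize_round(level, q) for _ in range(10_000)]
dithered = [quantize_dithered(level, q, rng) for _ in range(10_000)]

plain_mean = sum(plain) / len(plain)
dithered_mean = sum(dithered) / len(dithered)
print(plain_mean, dithered_mean)
```

The price is added random noise, which is generally less objectionable than the distortion it replaces.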

Linear Vs. Non-Linear Quantization • In linear quantization, each code word represents a quantization interval of equal length. • In non-linear quantization, the intervals are not all the same length: more code words are used to represent samples at some levels, and fewer for samples at other levels. • For sound, it is more important to have a finer-grained representation (i.e., smaller quantization intervals) for low amplitude signals than for high because low amplitude signals are more sensitive to noise. Thus, non-linear quantization is used.
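The slide does not name a particular non-linear scheme. As one well-known example, the sketch below uses μ-law companding with μ = 255, which is an assumption introduced here: the signal is compressed, quantized linearly, then expanded again. All function names are hypothetical and the signal is assumed to lie in the range -1 to +1.

```python
import math

MU = 255  # assumed companding parameter

def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1], stretching the low amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_uniform(x, bits):
    """Linear quantizer on [-1, 1] returning the interval's center value."""
    levels = 2 ** bits
    q = 2 / levels
    index = max(0, min(levels - 1, int((x + 1) // q)))
    return -1 + (index + 0.5) * q

def quantize_mu_law(x, bits=8):
    """Non-linear quantization: compress, quantize linearly, expand."""
    return mu_expand(quantize_uniform(mu_compress(x), bits))

quiet = [i / 10_000 for i in range(1, 100)]          # low amplitude samples
linear_err = max(abs(x - quantize_uniform(x, 8)) for x in quiet)
mu_law_err = max(abs(x - quantize_mu_law(x)) for x in quiet)
print(linear_err, mu_law_err)
```

With the same 8 bits, the quiet samples come out with a much smaller error, at the cost of coarser intervals for loud samples.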

Sound Editing • See Tutorial for • Choosing sampling rate and bit depth • Recording sound • See Studio Plugin Overview for information about multi-track recording • See Noise Reduction Overview for information about noise reduction

Fourier Analysis

Fourier Transform • It is possible to take any periodic function of time x(t) and resolve it into an equivalent infinite summation of sine waves and cosine waves with frequencies that start at 0 and increase in integer multiples of a base frequency f = 1/T, where T is the period of x(t). • Mathematically, we can say the same thing with this equation: $x(t) = a_0 + \sum_{k=1}^{\infty} \left[ a_k \cos(2\pi k f t) + b_k \sin(2\pi k f t) \right]$ • This equation does NOT tell how to compute the Fourier transform, that is, how we get the coefficients $a_1 \ldots a_\infty$ and $b_1 \ldots b_\infty$.
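To make the summation concrete, here is a sketch that rebuilds a square wave from sine terms alone. It assumes the series has the form x(t) = a_0 + Σ [a_k cos(2πkt/T) + b_k sin(2πkt/T)] and uses the known coefficients of a ±1 square wave (b_k = 4/(πk) for odd k, all others zero). The function name is made up for the example.

```python
import math

def square_wave_partial_sum(t, period, n_harmonics):
    """Sum the series for a +/-1 square wave up to harmonic n_harmonics."""
    total = 0.0
    for k in range(1, n_harmonics + 1, 2):       # only odd harmonics are non-zero
        b_k = 4 / (math.pi * k)                  # known square-wave coefficient
        total += b_k * math.sin(2 * math.pi * k * t / period)
    return total

T = 1.0
for n in (1, 9, 99, 1999):
    print(n, square_wave_partial_sum(T / 4, T, n))
```

The more terms are included, the closer the sum at t = T/4 gets to the square wave's value of 1.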

Discrete Fourier Transform • We can’t do an infinite summation on a computer. • For digitally sampled input we can do the summation using the same number of frequency samples as there are time input samples. • We can pretend that x(t) is periodic and that the period is the same length as the recording (or sound segment). • The base frequency will be 1/length of recording (or sound segment).
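A direct implementation of the discrete Fourier transform is short. The sketch below uses the standard complex-exponential form, with n frequency samples for n time samples; the 8-sample cosine input is chosen only for illustration.

```python
import cmath
import math

def dft(samples):
    """Direct O(n^2) discrete Fourier transform of a list of numbers."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# One full cycle of a cosine across the whole segment, so the segment length
# is the assumed period and bin 1 is the base frequency.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)
print([round(abs(c), 6) for c in spectrum])
```

Bin k corresponds to a frequency of k divided by the length of the recording. All the energy lands in bin 1 and its mirror image, bin 7.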

Difference Between Discrete Fourier Transform and Discrete Cosine Transform • The discrete cosine transform uses real numbers. This is all you need for image representation. • The Fourier Transform uses complex numbers, which have a real and an imaginary part.

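Recall the definition of the Discrete Cosine Transform. For an N × N pixel image $p_{xy}$, the DCT is an array of coefficients $DCT_{uv}$, where $DCT_{uv} = \frac{2}{N} C_u C_v \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p_{xy} \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}$ and where $C_u, C_v = 1/\sqrt{2}$ for $u, v = 0$ and $C_u, C_v = 1$ otherwise (one common normalization). This tells how to compute the Discrete Cosine Transform.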
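The DCT can be computed directly from its definition. The sketch below assumes one common normalization, DCT_uv = (2/N) C_u C_v Σ_x Σ_y p_xy cos((2x+1)uπ/2N) cos((2y+1)vπ/2N), with C = 1/√2 for index 0 and 1 otherwise. Other texts scale the coefficients differently, and the function name `dct2` is hypothetical.

```python
import math

def dct2(image):
    """Direct 2-D DCT of an N x N list of lists (assumed normalization)."""
    n = len(image)

    def c(i):
        return 1 / math.sqrt(2) if i == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (image[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = (2 / n) * c(u) * c(v) * total
    return coeffs

flat = [[1.0] * 4 for _ in range(4)]     # a uniform 4 x 4 image
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))
```

Only real arithmetic is involved. A uniform image has a single non-zero coefficient, the one at u = v = 0.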

Versions of the Fourier Transform • Fourier Transform -- infinite summation • Discrete Fourier Transform -- a sum of n waves derived from n samples; O(n^2) complexity • Fast Fourier Transform -- a fast algorithm for computing the discrete Fourier transform, O(n log2 n) complexity; a disadvantage is that it requires a windowing function • See http://www.dataq.com/applicat/articles/an11.htm and http://www.chipcenter.com/eexpert/bmasta/bmasta001.html
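The speedup of the FFT comes from splitting the input into even and odd samples and reusing the two half-size results. The recursive radix-2 version below is a teaching sketch, not an optimized implementation, and it assumes the input length is a power of two.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])                  # transform of even-indexed samples
    odd = fft(x[1::2])                   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

print([round(abs(c), 6) for c in fft([1, 1, 1, 1, 0, 0, 0, 0])])
```

The result is the same as that of the direct summation. Only the number of operations changes.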

Windowing Functions • Minimize the effect of phase discontinuities at the borders of segments. • Hanning, Hamming, Blackman, and Blackman-Harris are often used.
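As an illustration, the Hanning (Hann) window can be written in a few lines. The sketch assumes the symmetric form w[i] = 0.5 − 0.5·cos(2πi/(N−1)); the function names and the sample segment are made up. Multiplying a segment by the window tapers both ends toward zero, so the segment no longer starts and stops abruptly.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (assumed form)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(segment, window):
    """Multiply each sample by the matching window value."""
    return [s * w for s, w in zip(segment, window)]

segment = [1.0, 1.0, 1.0, 1.0, 1.0]      # a segment that starts and ends abruptly
windowed = apply_window(segment, hann(len(segment)))
print([round(v, 6) for v in windowed])
```

The windowed segment would then be passed to the transform in place of the raw samples.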

Fourier Analysis in CoolEdit • Can be used to filter certain frequencies. • The window size and function are adjustable. • Go to Transform/Filters/FFT to filter frequencies. • Go to Analyze/Frequency Analysis to see an analysis of the frequency content.
