Understanding Computer Sounds: Basics, Processing, and Quality

Lab #8 Follow-Up: Sounds and Signals* * Figures from Kaplan, D. (2003) Introduction to Scientific Computation and Programming CLI Engineering.

Intro to Sounds & Signals • Recall the transducer concept: convert an input signal into numbers. • Signal: a quantity that changes over time • Body temperature • Air pressure (sound) • Electrical potential on skin (electrocardiogram) • Seismological disturbances • Stock prices

Intro to Sounds & Signals • We will study audio signals (sounds), but the same issues apply across a broad range of signal types. • Two different approaches to doing the same thing: • Commercial GUI program (Audacity, Pro Tools) • Programmatic (Python)

13.1 Basics of Computer Sound
>>> x, fs, bits = wavread("fh.wav")
>>> len(x)
41777
>>> min(x), max(x)
('\x00', '\xff')
>>> fs
11025
>>> bits
8

Basics of Computer Sound

Basics of Computer Sound • x contains the sound waveform (signal) – essentially, voltage levels representing transduced air pressure on the microphone. • fs is the sampling frequency (rate) – how many times per second (in Hertz, Hz) did we measure the voltage? • bits is the number of bits used to represent each sample.
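The wavread function in the session above is not part of standard Python, so as a rough sketch, here is one way to recover the same three quantities with the standard-library wave module. The names read_wav and write_test_tone are hypothetical helpers, and the tone file is synthetic test data standing in for fh.wav, which is not available here. Only mono 8- and 16-bit PCM files are handled.

```python
import math
import os
import struct
import tempfile
import wave


def read_wav(path):
    """Return (samples, fs, bits) for a mono PCM WAV file."""
    with wave.open(path, "rb") as w:
        fs = w.getframerate()            # sampling frequency in Hz
        bits = 8 * w.getsampwidth()      # bits per sample
        raw = w.readframes(w.getnframes())
    if bits == 8:
        samples = list(raw)              # unsigned bytes, 0..255
    elif bits == 16:
        samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    else:
        raise ValueError("only 8- and 16-bit files are handled in this sketch")
    return samples, fs, bits


def write_test_tone(path, fs=11025, f=440.0, n=1000):
    """Write n samples of an 8-bit mono sine tone (synthetic test data)."""
    data = bytes(int(round(127.5 + 127.5 * math.sin(2 * math.pi * f * k / fs)))
                 for k in range(n))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)
        w.setframerate(fs)
        w.writeframes(data)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tone.wav")
    write_test_tone(path)
    x, fs, bits = read_wav(path)

print(len(x), min(x), max(x), fs, bits)
```

In Python 3 the 8-bit samples come back as integers from 0 to 255 rather than the one-character strings '\x00' to '\xff' shown in the session.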

Questions • Why does the sound waveform range from hexadecimal 00 to FF, whereas we plot it as -1 to +1? • These values are essentially arbitrary. One nice feature of a ±x representation is that zero means silence. But the audio player likes values between 0 and 255. • What role does the sampling frequency play in the quality of the sound? • The more samples per second, the closer the sound is to a “perfect” recording.
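The two representations in the first answer differ only by a shift and a scale. A minimal sketch of the conversion, assuming unsigned 8-bit samples and a hypothetical pair of helper names:

```python
def bytes_to_unit(samples):
    """Map unsigned 8-bit samples (0..255) onto the range -1..+1."""
    return [(s - 127.5) / 127.5 for s in samples]


def unit_to_bytes(values):
    """Map values in -1..+1 back onto the integers 0..255."""
    return [int(round(v * 127.5 + 127.5)) for v in values]


ends = bytes_to_unit([0, 255])
round_trip = unit_to_bytes(bytes_to_unit(range(256)))
print(ends)
```

With this particular mapping, silence (zero) falls halfway between byte values 127 and 128; other conventions center on 128 instead.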

Questions • What happens if we double (or halve) the sampling frequency at playback, and why? • What is it about the waveform that determines the sound we're hearing (which vowel), and the speaker's voice?
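For the first question, the arithmetic can be sketched as follows, assuming the player simply emits the stored samples at the new rate without resampling. The function name playback and the 440 Hz test frequency are illustrative only; the sample count and recording rate are the ones from the fh.wav session.

```python
def playback(n_samples, fs_recorded, fs_playback, f_recorded):
    """Return (duration in seconds, frequency heard in Hz) at a given playback rate."""
    duration = n_samples / fs_playback
    f_heard = f_recorded * fs_playback / fs_recorded
    return duration, f_heard


normal = playback(41777, 11025, 11025, 440.0)
doubled = playback(41777, 11025, 22050, 440.0)
halved = playback(41777, 11025, 5512.5, 440.0)
print(normal, doubled, halved)
```

Doubling the rate halves the duration and doubles every frequency (one octave up); halving the rate does the opposite.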

Questions • What is it about the waveform that determines the sound we're hearing (which vowel), and the speaker's voice? • Most of this information is encoded in the frequencies that make up the waveform – roughly, the differences between locations of successive peaks – and not in the actual waveform values themselves. • We can do some useful processing on the “raw” waveform, however – e.g., count syllables:

Syllable Counting by Smoothing and Peak-Picking
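The slide's figure is not reproduced here, so the following is only one plausible sketch of the idea, not necessarily the method used in the lab: rectify the waveform, smooth it with a moving average to get a loudness envelope, then pick peaks that exceed a threshold and are separated by a minimum distance. The synthetic three-burst signal, the 50 ms window, the 0.2 threshold, and the 150 ms separation are all assumptions chosen for illustration.

```python
import math


def smooth(values, width):
    """Moving average over a centered window of about `width` samples."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    half = width // 2
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def pick_peaks(envelope, threshold, min_distance):
    """Greedy peak-picking: take the largest value, blank its neighborhood, repeat."""
    remaining = list(envelope)
    peaks = []
    while remaining:
        i = max(range(len(remaining)), key=remaining.__getitem__)
        if remaining[i] < threshold:
            break
        peaks.append(i)
        lo = max(0, i - min_distance)
        hi = min(len(remaining), i + min_distance + 1)
        for j in range(lo, hi):
            remaining[j] = float("-inf")
    return sorted(peaks)


def synthetic_word(fs, n_syllables=3):
    """Bursts of a 200 Hz tone (200 ms each) separated by 100 ms of silence."""
    burst, gap = fs * 200 // 1000, fs * 100 // 1000
    x = []
    for _ in range(n_syllables):
        for k in range(burst):
            loudness = 0.5 * (1 - math.cos(2 * math.pi * k / burst))
            x.append(loudness * math.sin(2 * math.pi * 200 * k / fs))
        x.extend([0.0] * gap)
    return x


fs = 8000
x = synthetic_word(fs)
envelope = smooth([abs(v) for v in x], width=fs * 50 // 1000)
peaks = pick_peaks(envelope, threshold=0.2, min_distance=fs * 150 // 1000)
print(len(peaks), "syllables, peaks at samples", peaks)
```

On real speech the window width and threshold would need tuning, since syllables differ in loudness and are not cleanly separated by silence.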

Perception and Generation of Sound • Sound is the perception of small, rapid vibrations in air pressure on the ear. • Simplest model of sound is a function P(t) expressing pressure P at time t: P(t) = A sin(2πft + φ) where A = amplitude (roughly, loudness) f = frequency (cycles per second) φ = phase (roughly, starting point) • This is the equation for a pure musical tone (just one pitch)
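As a small illustration of the formula, the tone can be sampled at a rate fs to produce a list of numbers. The function name pure_tone and the particular values (440 Hz, 11025 Hz sampling) are assumptions for the example.

```python
import math


def pure_tone(A, f, phi, fs, seconds):
    """Sample P(t) = A sin(2*pi*f*t + phi) at times t = k/fs."""
    n = int(round(fs * seconds))
    return [A * math.sin(2 * math.pi * f * k / fs + phi) for k in range(n)]


tone = pure_tone(A=1.0, f=440.0, phi=0.0, fs=11025, seconds=1.0)
shifted = pure_tone(A=0.5, f=440.0, phi=math.pi / 2, fs=11025, seconds=1.0)
print(len(tone), tone[0], shifted[0])
```

Changing A scales every sample, changing f changes how quickly the values oscillate, and changing φ only moves the starting point: with φ = π/2 the first sample is already at the peak value A.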

Perception and Generation of Sound • Inverse of frequency is period (time between successive peaks): T = 1/f

Perception and Generation of Sound • E.g., whistling a musical scale:

Transducing and Recording Sound • Convert sound pressure to voltage, then digitize voltage into N discrete values in the interval [xmin, xmax], by sampling at frequency Fs. • This is done by an analog/digital (A/D) converter. • Another device must pre-amplify the sound to match the input expectations of the A/D converter. • N is typically a power of 2, so we can use bits to express sampling precision (minimum 8 for decent quality). This is called quantization. • Various things can go wrong if we don't choose these values wisely....

Transducing and Recording Sound A segment of the sound “OH” transduced to voltage. Top: The preamplifier has been set appropriately so that the analog voltage signal takes up a large fraction of the A/D voltage range. The digitized signal closely resembles the analog signal even though the A/D conversion is set to 8 bits. Bottom: The preamplifier has been set too low. Consequently, there is effectively only about 3 bits of resolution in the digitized signal; most of the range is unused.
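The effect in the caption can be sketched numerically. The code below quantizes a voltage into N = 2^bits levels and estimates how many bits a signal actually exercises from the span of levels it touches. The −5 to 5 V range is taken from the Figure 13.6 caption; the sine test signals, their amplitudes (4.9 V and 0.15 V), and the span-based measure of effective resolution are assumptions for illustration.

```python
import math


def quantize(v, bits, xmin=-5.0, xmax=5.0):
    """Map a voltage to one of N = 2**bits integer levels, clamping at the ends."""
    n_levels = 2 ** bits
    step = (xmax - xmin) / n_levels
    level = math.floor((v - xmin) / step)
    return min(max(level, 0), n_levels - 1)


def effective_bits(signal, bits, xmin=-5.0, xmax=5.0):
    """Rough resolution actually used: log2 of the span of levels touched."""
    levels = [quantize(v, bits, xmin, xmax) for v in signal]
    return math.log2(max(levels) - min(levels) + 1)


def sine(amplitude, n=1000, samples_per_cycle=100):
    """A test sine wave with the given peak voltage."""
    return [amplitude * math.sin(2 * math.pi * k / samples_per_cycle)
            for k in range(n)]


well_set = effective_bits(sine(4.9), bits=8)    # preamplifier set appropriately
too_low = effective_bits(sine(0.15), bits=8)    # preamplifier set too low
print(round(well_set, 2), round(too_low, 2))
```

The quiet signal touches only 8 of the 256 available levels, which is 3 bits, while the well-amplified one uses nearly all 8.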

Transducing and Recording Sound Figure 13.6. Clipping of a signal (right) when the preamplifier has been set too high, so that the signal is outside of the −5 to 5 V range of the A/D converter.
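Clipping is easy to imitate in code: any voltage outside the converter's range is recorded as the nearest limit. In this sketch the 8 V sine amplitude is an arbitrary assumption standing in for an over-amplified signal; the ±5 V limits are the ones in the caption.

```python
import math


def clip(v, vmin=-5.0, vmax=5.0):
    """Limit a voltage to the range the A/D converter accepts."""
    return max(vmin, min(vmax, v))


signal = [8.0 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
clipped = [clip(v) for v in signal]
fraction_clipped = sum(1 for a, b in zip(signal, clipped) if a != b) / len(signal)
print(min(clipped), max(clipped), fraction_clipped)
```

The flattened tops and bottoms cannot be recovered afterwards, because every sample beyond the limit is stored as the same value.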

Aliasing and the Sampling Frequency • Someone has an alias when they use more than one name (representation) • In the world of signals, this means having more than one representation of an analog signal, because of inadequate sampling frequency • Familiar visual aliasing from the movies (when 24 frames per second is too slow) • Wagon wheel / propeller going backwards • Scan lines appearing on computer screen • Inadequate Fs can result in aliasing for sounds too....

Aliasing and the Sampling Frequency

Aliasing and the Sampling Frequency Aliasing. A set of samples marked as circles. The three sine waves plotted are of different frequencies, but all pass through the same samples. The aliased frequencies are F + m/∆T, where m is any integer and ∆T is the sampling interval. The sine waves shown are m = 0, m = 1, and m = 2.
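The claim in the caption can be checked directly. This sketch samples sine waves at F, F + 1/∆T, and F + 2/∆T and compares the resulting numbers; the values F = 1 Hz and ∆T = 0.1 s are assumptions chosen for the example.

```python
import math


def sample_sine(freq, dt, n):
    """Sample sin(2*pi*freq*t) at times t = 0, dt, 2*dt, ..."""
    return [math.sin(2 * math.pi * freq * k * dt) for k in range(n)]


F, dt, n = 1.0, 0.1, 20
base = sample_sine(F, dt, n)                                  # m = 0
aliases = {m: sample_sine(F + m / dt, dt, n) for m in (1, 2)}

worst = max(abs(a - b) for s in aliases.values() for a, b in zip(s, base))
print("largest difference between sample sets:", worst)
```

The 1 Hz, 11 Hz, and 21 Hz waves produce the same samples up to floating-point rounding, so nothing in the sampled data can tell them apart.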

Aliasing and the Sampling Frequency • Nyquist's Theorem tells us that Fs should be at least twice the maximum frequency Fmax we wish to reproduce. • Intuitively, we need two values to represent a single cycle: one for peak, one for valley:
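A small sketch of the rule and of what happens when it is violated. The helper names are hypothetical; apparent_frequency applies the F + m/∆T alias relation with m chosen so the result lands between 0 and Fs/2.

```python
def min_sampling_rate(f_max):
    """Smallest Fs that satisfies Fs >= 2 * Fmax."""
    return 2 * f_max


def apparent_frequency(f, fs):
    """Frequency in [0, fs/2] that a sine at f is indistinguishable from after sampling."""
    return abs(f - fs * round(f / fs))


print(min_sampling_rate(4000))
print(apparent_frequency(3000, 11025))   # below Fs/2: reproduced as is
print(apparent_frequency(7000, 11025))   # above Fs/2: shows up at a lower frequency
```

At Fs = 11025 Hz, as in the fh.wav example, only frequencies up to 5512.5 Hz are represented faithfully; a 7000 Hz component would be heard at 4025 Hz.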

Aliasing in the Time Domain
