
MPEG-4

MPEG-4. John Lazzaro John Wawrzynek June 18, 2001 Modified by Francois Thibault January 20, 2003 Further modified by Ichiro Fujinaga January 20, 2005. CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw. MPEG 4 Standard.


Presentation Transcript


  1. MPEG-4 John Lazzaro John Wawrzynek June 18, 2001 Modified by Francois Thibault January 20, 2003 Further modified by Ichiro Fujinaga January 20, 2005 CS Division University of California at Berkeley www.cs.berkeley.edu/~johnw

  2. MPEG 4 Standard • Finalized its standardization process in 1999 (Vancouver) • Designed to integrate visual and audio content • Includes "natural" (recorded) and "synthetic" (synthesized) coding of audio and video

  3. MPEG 4 Scope • Provides a set of technologies to satisfy the needs of • authors • network service providers • end users • Enables the production of content that has far greater reusability in • digital television • animated graphics • web pages

  4. MPEG 4 Features MPEG-4 provides standardized ways to: • represent units of aural, visual or audiovisual content, called “media objects” • Natural origin: recorded with a camera or microphone • Synthetic origin: generated with a computer • describe the composition of these objects to create compound media objects that form audiovisual scenes • multiplex and synchronize the data associated with media objects, so that they can be transported over networks providing a QoS (Quality of Service) • interact with the audiovisual scene generated at the receiver’s end

  5. MPEG 4 Standard (audio) • [Figure: MPEG 4 is divided into video, audio, and system. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).] • ISO/IEC 14496-3 sec5

  6. MPEG 4 Audio: Natural (recorded) • AAC: Advanced Audio Coding • Originally created as an extension to MPEG-2 • Provides better quality at 64 kbit/sec/channel than MP3 does at 128 kbit/sec/channel • CELP: A codebook-excited linear prediction scheme optimized for telephone-quality transmission of speech in the range 8-32 kbit/sec • Parametric: • A novel "harmonic vector + noise" method that allows lossy but extremely low-bitrate coding of wideband sounds down to 2 kbit/sec/channel
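To put the per-channel bitrates above in perspective, the following Python sketch compares them with one channel of uncompressed PCM audio. The reference format (44.1 kHz, 16 bits per sample, i.e. CD quality) is an assumption brought in for comparison; it does not come from the slides.

```python
# Assumption (not from the slides): the uncompressed reference is CD-quality PCM.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16

# Per-channel rates quoted on the slide, in kbit/sec.
SLIDE_RATES = {"MP3": 128, "AAC": 64, "Parametric (lowest)": 2}


def pcm_kbit_per_sec(sample_rate_hz=CD_SAMPLE_RATE_HZ, bits=CD_BITS_PER_SAMPLE):
    """Bitrate of one uncompressed PCM channel, in kbit/sec."""
    return sample_rate_hz * bits / 1000


def ratio_vs_pcm(coded_kbit_per_sec):
    """How many times smaller a coded channel is than the PCM reference."""
    return pcm_kbit_per_sec() / coded_kbit_per_sec


if __name__ == "__main__":
    for name, rate in SLIDE_RATES.items():
        print(f"{name}: {rate} kbit/sec/channel, "
              f"about {ratio_vs_pcm(rate):.1f}x smaller than the PCM reference")
```

The comparison says nothing about perceived quality; it only shows how far each coder sits from the uncompressed rate.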

  7. MPEG 4 Audio: Synthetic (synthesized) • Structured Audio: • A downloadable synthesis method that allows producers to describe new synthesis methods as part of the bitstream • the receiver implements a reconfigurable synthesis engine and synthesizes the sound on-the-fly as the instructions are received • Text-to-Speech: • An interface to standalone TTS systems is provided, so that synthetic speech can be synchronized in multimedia presentations • No "method" of creating synthetic speech is standardized by MPEG

  8. MPEG 4 Standard - Structured Audio • [Figure: MPEG 4 is divided into video, audio, and system. Audio divides into natural coding (AAC, T/F, CELP, Parametric) and synthetic coding (SA, TTS).] • Structured Audio: One “component” in the MPEG audio standard. • ISO/IEC 14496-3 sec5

  9. Audio Compression Basics • Traditional technique for music • [Figure: encoder/decoder block diagram with an amplitude-versus-time input signal. Encoder blocks: Filter into Critical Bands, Compute Masking, Allocate Bits, Format Bit-stream.]

  10. The Kolmogorov alternative: • Write a computer program that generates the desired audio stream. • Transmit the computer program. • To decode, execute the program. Similar to PostScript! • MPEG-4 Structured Audio (MP4-SA) uses this approach. • Eric Scheirer, Editor (MIT Media Lab). • http://sound.media.mit.edu/~eds/mpeg4/
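The three steps on this slide (write a program, transmit it, execute it to decode) can be illustrated with a toy Python sketch. This is not MP4-SA: the "bitstream" here is plain Python source text, and the function name `render`, its parameters, and the 16-bit PCM size used for comparison are all assumptions made for illustration.

```python
# Hypothetical "bitstream": the source text of a program that generates audio.
# The name render() and its default parameters are illustrative assumptions.
PROGRAM = '''
import math
def render(sample_rate=44100, seconds=2.0, freq=440.0):
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
'''


def decode(program_text):
    """Decode by executing the transmitted program and calling its render()."""
    namespace = {}
    exec(program_text, namespace)  # acceptable for a demo; never exec untrusted input
    return namespace["render"]()


def compression_ratio(program_text, samples, bytes_per_sample=2):
    """Size of the rendered audio as 16-bit PCM divided by the program size."""
    pcm_bytes = len(samples) * bytes_per_sample
    return pcm_bytes / len(program_text.encode("utf-8"))


if __name__ == "__main__":
    audio = decode(PROGRAM)
    print(len(audio), "samples; ratio:", round(compression_ratio(PROGRAM, audio)))
```

The ratio grows with the length of the rendered sound, because the program stays the same size however long it plays.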

  11. MP4-SA Decoders • are interpreters or compilers. MP4-SA Encoding • may be a creative act: writing a program, • directly (emacs), or • indirectly (GUI, webpage) • In this case, MP4-SA is a lossless compressor. • may be automatic: given a sound, an encoder writes a program that generates the sound. • Automatic encoding is hard in the general case.

  12. Key Application: Music Production • MP4-SA maps to modern music production • “The Program”: synthesis algorithms, effects “boxes”, mixers • “The Decoder”: sound rendering • [Figure labels: Network; premium on low bandwidth; musical performance and mix-down control information] • Modern music production is computer-based. • Musicians enter performances into computers as control information, not audio waveforms. • Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control.

  13. Key Application: Music Production • MP4-SA maps to modern music production • “The Program”: synthesis algorithms, effects “boxes”, mixers • “The Decoder”: sound rendering • [Figure labels: Standard Framework; File System; musical performance and mix-down control information] • Modern music production is computer-based. • Musicians enter performances into computers as control information, not audio waveforms. • Digital synthesizers, effects, and mixers create the final audio, under engineer/producer control. • Ideal for collaborative productions, remixes, and ...

  14. Key Application: Music Performance • MP4-SA enables networked music performance • [Figure: two decoders (“The Decoder”: sound rendering) connected through a network; premium on low bandwidth] • Music performance requires dynamic control. • True interactivity requires parameterized sounds. • Musicians control instruments and effects with interactive controllers. • Control could be indirect and remote (ex: games).

  15. MPEG 4 Structured Audio: • A binary file format that encodes: • The programming language SAOL (pronounced: sail). • The musical score language SASL. • Legacy support for MIDI. • Audio sample data. • Result is normative: an MP4-SA file will sound identical on all compliant decoders. • Different from MIDI files.

  16. Why SAOL and MP4-SA? Why not Java? • Musical performances have temporal structure that changes over several timescales: • Sample-by-sample: 10s of µsec • Amplitude & timbre envelopes: 10s of msec • Note-by-note: 100s of msec • Writing sound generation code in a conventional language results in code dominated by time-scale management. • Hard to maintain, hard to optimize.

  17. Time management is built into SAOL. • A SAOL program executes by moving a simulated clock forward in time, performing calculations along the way in a synchronous fashion. • Work is scheduled to happen: • at the a-rate (the audio sample rate) • at the k-rate (envelope control rate) • at the i-rate (rate for new notes) • Language variables are typed as a/k/i-rate. • A language statement is scheduled based on the rate of the variables it contains.
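The scheduling described above can be sketched in Python as a simulated clock that runs i-rate work once per note, k-rate work once per control period, and a-rate work once per sample. The rates used (44100 Hz audio, 1050 Hz control) are illustrative assumptions, and `statement_rate` is a deliberate simplification of SAOL's rate rules, not the normative definition.

```python
RATE_ORDER = {"i": 0, "k": 1, "a": 2}  # slowest to fastest


def statement_rate(variable_rates):
    """Simplified rule: a statement runs at the fastest rate among its variables."""
    return max(variable_rates, key=RATE_ORDER.__getitem__)


def run_instance(duration_s, s_rate=44100, k_rate=1050):
    """Count how often i-, k-, and a-rate work runs for one note.

    s_rate and k_rate are assumed values; the slides do not fix them.
    """
    if s_rate % k_rate != 0:
        raise ValueError("sketch assumes a whole number of samples per k-cycle")
    samples_per_k = s_rate // k_rate
    counts = {"i": 0, "k": 0, "a": 0}

    counts["i"] += 1                       # i-rate: once, when the note starts
    for _ in range(round(duration_s * k_rate)):
        counts["k"] += 1                   # k-rate: once per control period
        for _ in range(samples_per_k):
            counts["a"] += 1               # a-rate: once per audio sample
    return counts


if __name__ == "__main__":
    print(run_instance(1.0))
    print(statement_rate(["i", "k"]))
```

The nested loops are exactly the time-scale bookkeeping that a programmer would otherwise write by hand in a conventional language.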

  18. SAOL, SASL, and Scheduling: • Sound creation in MP4-SA can be compared to a musician playing notes on an instrument. • A SAOL subprogram (called an instr or instrument) serves as the instrument. • SASL commands (called score lines) act to play notes on SAOL instruments. • Many instances of a SAOL instr can be active at one time, making sounds corresponding to notes launched by different score lines in a SASL file.

  19. An example: • This SASL file plays a melody on the instrument tone:
      0.5 tone 0.75 52 0.25
      1.5 tone 0.75 64 0.25
      2.5 tone 0.5 63 0.25
      3 tone 0.25 59 0.2
      3.25 tone 0.25 61 0.225
      3.5 tone 0.5 63 0.225
      4 tone 0.5 64 0.25
      5 end
      • First field: when the instance is launched • Third field: how long the instrument runs • Remaining fields: instance parameters (note number, loudness) • The SAOL instrument tone plays a gated sine wave. (SAOL code in next slide.)
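As a rough illustration of how score lines map onto instrument instances, the Python sketch below parses the score from this slide into launch time, instrument name, duration, and parameters. It handles only the two line shapes that appear here (instrument lines and `end`); the `Note` class and `parse_score` function are hypothetical names, not part of any SASL tool.

```python
from dataclasses import dataclass

SCORE = """\
0.5 tone 0.75 52 0.25
1.5 tone 0.75 64 0.25
2.5 tone 0.5 63 0.25
3 tone 0.25 59 0.2
3.25 tone 0.25 61 0.225
3.5 tone 0.5 63 0.225
4 tone 0.5 64 0.25
5 end
"""


@dataclass
class Note:
    start: float    # when the instance is launched
    instr: str      # name of the SAOL instrument to play
    dur: float      # how long the instrument runs
    params: tuple   # instance parameters (here: note number, loudness)


def parse_score(text):
    """Return (notes, end_time) for the instrument and end lines of a score."""
    notes, end_time = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 2 and fields[1] == "end":
            end_time = float(fields[0])
        else:
            start, instr, dur, *params = fields
            notes.append(Note(float(start), instr, float(dur),
                              tuple(float(p) for p in params)))
    return notes, end_time


if __name__ == "__main__":
    notes, end_time = parse_score(SCORE)
    for note in notes:
        print(note)
    print("score ends at", end_time)
```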

  20. SAOL code for tone
      instr tone (note, loudness)
      {
        ivar a;          // sets osc f
        ksig env;        // env output
        asig x, y;       // osc state
        asig init;
        a = 2*sin(3.141593*cpsmidi(note)/s_rate);
        env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);
        if (init == 0)   // first a-pass only
        {
          x = loudness;
          init = 1;
        }
        x = x - a*y;     // the FLOPS happen in
        y = y + a*x;     // these 3 statements
        output(y*env);   // creates audio output
      }                  // end of instr tone
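For readers without a SAOL decoder at hand, here is a rough Python rendering of the same instrument. It is a sketch, not a conforming implementation: the sample rate, the control rate, and the helper names `cpsmidi` and `envelope` (standing in for the SAOL opcodes `cpsmidi` and `kline`) are assumptions, and the envelope is only sensible for durations of at least 0.2 seconds.

```python
import math


def cpsmidi(note):
    """MIDI note number to frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def envelope(t, dur):
    """Piecewise-linear shape of kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0)."""
    if t < 0.1:
        return 0.5 * t / 0.1                     # 0.1 s attack, 0 up to 0.5
    if t < dur - 0.1:
        return 0.5                               # hold at 0.5
    return max(0.0, 0.5 * (dur - t) / 0.1)       # 0.1 s release back to 0


def tone(note, loudness, dur, s_rate=44100, k_rate=1050):
    """Render one note; s_rate and k_rate are assumed, not given on the slide."""
    a = 2.0 * math.sin(math.pi * cpsmidi(note) / s_rate)  # i-rate: once per note
    x, y = loudness, 0.0                                   # oscillator state
    samples_per_k = s_rate // k_rate
    env = 0.0
    out = []
    for n in range(round(dur * s_rate)):
        if n % samples_per_k == 0:       # k-rate: refresh the envelope value
            env = envelope(n / s_rate, dur)
        x = x - a * y                    # a-rate: coupled-form sine oscillator,
        y = y + a * x                    # two multiply-adds per sample
        out.append(y * env)              # gated output
    return out


if __name__ == "__main__":
    samples = tone(note=52, loudness=0.25, dur=0.75)
    print(len(samples), "samples, peak", max(abs(s) for s in samples))
```

The two-line recurrence produces a sine wave without calling `sin` per sample, which is why the slide's comment places the floating-point work in those statements.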

  21. SAOL Features • Rate semantics: • i/k/a-rate execution • Vector arithmetic: • ex: A = B + C is equivalent to: for i = 1..n, A[i] = B[i] + C[i] • All floating-point arithmetic. • Extensive built-in audio function library: • signal generators, table operators, pitch converters, filters, fft, sample rate conversion, effects, ...

  22. Sfront - a SAOL-to-C translator • [Figure labels: SAOL, SASL, MIDI, uncompressed samples; foo.mp4 → sfront → sa.c] • Handles SAOL, SASL, MIDI, uncompressed samples. • Converts MP4-SA files to an ANSI C program that, when executed, produces audio. • Runs on UNIX, Windows, MacOS. • Under Linux, supports real-time MIDI input, real-time audio input and output, and MIDI over RTP (Real-time Transport Protocol). • www.cs.berkeley.edu/~lazzaro/sa

  23. Generator Techniques • Much of the SA standard describes a library • 104 core opcodes (ex: pow(), allpass(), reverb() ) • 16 wave table generators (ex: harm, spline, random) • Sfront optimizes the code produced for each library element instance based on the invocation attributes • rate, width, size, constancy, integral nature of the parameters, number of parameters

  24. Conclusions • MP4-SA puts emphasis on sound synthesis methods that can be described in a small amount of space. • Physical Modeling good • Sampling Natural Instruments bad • If models are chosen carefully, compression ratios of 100 to 10,000 are possible. • MP4-SA specifies that a decoder produces audio that “sounds identical” to computing the program accurately.
