EE 5359 Multimedia Processing Project Multiplexing of AVS part 2 video with AAC audio bit stream

Submitted by Swaminathan Sridhar, MS EE, UTA, Swaminathan.sridhar@mavs.uta.edu

Project Proposal
• The objective of this project is to multiplex AVS Part 2 video (Jizhun profile, a.k.a. Base profile) with an AAC (Advanced Audio Coding) audio bit stream, and to de-multiplex the resulting stream.

An overview of the AVS standard [1]
• AVS (Audio Video coding Standard) China is the digital video codec standard developed in China to reduce the royalty fees that Chinese users pay for other international video coding standards such as MPEG-2, MPEG-4 and MPEG-4 Part 10 (a.k.a. H.264).

Different parts of AVS China [1]

Profile levels on AVS [3], [4]

Applications of Different Profiles [3], [4]
• Jizhun Profile: Defined in AVS Part 2 and targeted mainly at digital video applications such as commercial broadcasting and storage media. It has moderate computational complexity.
• Jiben Profile: Defined in AVS Part 7 for mobile applications.
• Shenzhan and Jiaqiang Profiles: Defined in AVS Part 2 for video surveillance and multimedia entertainment respectively.

AVS Part 1 systems [5]
• AVS Part 1, a.k.a. Systems, comprises a set of standards for converting single- or multi-channel audio and video bit streams into a single multiplexed stream for transmission and storage. It also defines the encoding syntax needed for synchronous de-multiplexing of the audio and video bit streams.
• An AVS system defines two kinds of data stream, the Program stream and the Transport stream, each with its own applications.
• AVS Part 1 accepts AVS Part 2 or AVS Part 7 video, AVS Part 3 audio, and AAC audio as its elementary bit streams.

Basic process of Multiplexing in AVS Part 1 [5]

Introduction to AAC bit stream [6], [7]
• Sample frequencies from 8 kHz to 96 kHz (MP3: 16 kHz to 48 kHz) and support for up to 48 channels.
• Higher efficiency and a simpler filter bank (MDCT, Modified Discrete Cosine Transform).
• Much better handling of frequencies above 16 kHz.
• Superior performance at bit rates above 64 kbps, with bit rates reaching as low as 16 kbps.
• AAC meets the requirement for stereo-quality sound at 128 kbps and for 5.1 audio at 320 kbps.

Basic Profiles in AAC codec [6], [7]
• Main Profile: Uses all tools except the gain control module. Provides the highest quality for applications where the amount of random access memory (RAM) needed is not constrained.
• Low-complexity Profile: Drops the prediction tool and reduces the complexity of the temporal noise-shaping tool. This is the most widely used profile.
• Scalable Sampling Rate (SSR) Profile: Adds the gain control tool to the low-complexity profile. Allows the least complex decoder.

NAL Unit syntax [12]

NAL unit type [13]

ADTS format [13]
• ADIF (audio data interchange format) has just one header at the beginning of the AAC file; the rest of the file consists of consecutive raw data blocks. This format is meant for simple local storage, where the audio data does not need to be split up.
• ADTS (audio data transport stream) has one header for each frame, followed by the raw data. An ADTS header precedes each AAC raw data block (or each group of 2 to 4 raw data blocks in a frame), which gives better error robustness in streaming environments. Hence the ADTS bit stream format is adopted in this project.

ADTS header [13]

ADTS header continued [13]

ADTS Profile bits [13]
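As a rough illustration of how the ADTS header fields are read, the Matlab sketch below decodes a single 7-byte header (the form used when no CRC follows the header). The header bytes are a hypothetical example built for this sketch, not data from the project's test stream, and all variable names are illustrative.

```matlab
% Hypothetical 7-byte ADTS header (assumption: AAC-LC, 44.1 kHz, stereo).
h = uint16([255 241 80 128 46 127 252]);

% 12-bit sync word: all of byte 1 plus the top 4 bits of byte 2.
sync = bitor(bitshift(h(1), 4), bitshift(h(2), -4));

% Profile: top 2 bits of byte 3 (0 = Main, 1 = Low complexity, 2 = SSR).
profile = bitshift(h(3), -6);

% Sampling frequency index: the next 4 bits of byte 3.
sfIndex = bitand(bitshift(h(3), -2), 15);
rates = [96000 88200 64000 48000 44100 32000 24000 22050 ...
         16000 12000 11025 8000 7350];
sampleRate = rates(sfIndex + 1);

% Channel configuration: last bit of byte 3 and top 2 bits of byte 4.
channels = bitor(bitshift(bitand(h(3), 1), 2), bitshift(h(4), -6));

% 13-bit frame length (header + payload, in bytes):
% low 2 bits of byte 4, all of byte 5, top 3 bits of byte 6.
frameLength = bitor(bitor(bitshift(bitand(h(4), 3), 11), ...
                          bitshift(h(5), 3)), bitshift(h(6), -5));

fprintf('sync=0x%X profile=%d rate=%d Hz channels=%d frame=%d bytes\n', ...
        sync, profile, sampleRate, channels, frameLength);
```

The frame length is the field the multiplexer needs most: starting from one sync word, it gives the offset of the next ADTS frame.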

PES encapsulation [14]
• Packetized elementary stream (PES) packets are obtained by encapsulating coded video, coded audio, and data elementary streams. This forms the first layer of packetization.
• Encapsulation is done by sequentially separating each elementary stream into access units. For audio and video elementary streams, the access units are audio frames and video frames respectively.
• Each PES packet contains data from one and only one elementary stream.
• PES packets may have a variable length, since the frame size in both the audio and video bit streams is variable.
• A PES packet consists of the PES packet header followed by the PES packet payload. The header distinguishes the different elementary streams and carries synchronization information in the form of timestamps, along with other useful information.

PES encapsulation format [14]

PES header [14]

PES Payload
• The PES payload is either an audio frame or a video frame. For an audio PES packet, the AAC bit stream is searched for the 12-bit sync word of the audio data transport stream (ADTS) format. The frame length is then read from the ADTS header, and that block of data is encapsulated with the audio stream ID to form the audio PES packet. The audio frame number is counted from the beginning of the stream and coded as the 2-byte timestamp.
• For a video PES packet, the encoded AVS bit stream is searched for the NAL unit prefix byte sequence 0x00000001, which marks the beginning of a video data set. The five least significant bits of the following byte are then examined to determine whether the NAL unit contains a frame, a picture parameter set, or a sequence parameter set.
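The start-code search described above can be sketched in Matlab as follows. The byte values are a hypothetical fragment, and the type codes 5, 7 and 8 (IDR slice, sequence parameter set, picture parameter set) follow the NAL unit type numbering of reference [12]; they are assumed here rather than taken from the project code.

```matlab
% Hypothetical byte stream containing three NAL units.
bytes  = [0 0 0 1 103 66   0 0 0 1 104 206   0 0 0 1 101 136];
prefix = [0 0 0 1];                       % NAL prefix 0x00000001

% Locate every occurrence of the prefix (strfind works on char vectors).
starts = strfind(char(bytes), char(prefix));

% The byte after each prefix carries the NAL unit type in its 5 LSBs.
% (A prefix at the very end of the data would need a bounds check.)
nalTypes = bitand(bytes(starts + 4), 31);

for k = 1:numel(starts)
    switch nalTypes(k)
        case 7
            kind = 'sequence parameter set';
        case 8
            kind = 'picture parameter set';
        case 5
            kind = 'IDR slice';
        otherwise
            kind = 'other slice data';
    end
    fprintf('NAL unit at byte %d: type %d (%s)\n', starts(k), nalTypes(k), kind);
end
```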

PES Payload continued
• The picture parameter set and the sequence parameter set carry information that the AVS decoder requires. To allow decoding to start from any IDR frame, these two NAL units need to be transmitted at regular intervals. Therefore, when they are found in the bit stream, they are combined into a separate PES packet with frame number zero.
• If instead the NAL unit contains IDR, P or B slice data, the frame number is counted from the beginning of the stream and encapsulated as the timestamp, along with the video stream ID and the frame length.
• A PES packet has an 8-byte header and a variable-size payload. On an error-prone transmission channel, fixed-size packets are preferable because errors are easier to detect and correct. The data is therefore put through one more layer of packetization to obtain fixed-size packets for transmission.
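A minimal Matlab sketch of the PES encapsulation described above. The slides state only that the header is 8 bytes long and carries the stream ID, the frame length and a 2-byte frame-number timestamp. The field order used here (3-byte start-code prefix 0x000001, 1-byte stream ID, 2-byte length, 2-byte frame number), the stream ID value and the payload bytes are assumptions made for illustration.

```matlab
% Hypothetical inputs (assumptions, not values from the project).
payload     = uint8(11:20);   % stand-in for one audio frame
streamId    = 192;            % 0xC0, assumed ID for the audio stream
frameNumber = 300;            % frame count from the start of the stream

% Build the assumed 8-byte header; 16-bit fields are stored high byte first.
len = numel(payload);
header = uint8([0 0 1, streamId, ...
                floor(len / 256), mod(len, 256), ...
                floor(frameNumber / 256), mod(frameNumber, 256)]);
pes = [header, payload];

% De-multiplexer side: read the fields back from the packet.
parsedId      = double(pes(4));
parsedLen     = double(pes(5)) * 256 + double(pes(6));
parsedFrame   = double(pes(7)) * 256 + double(pes(8));
parsedPayload = pes(9 : 8 + parsedLen);

fprintf('stream 0x%X, frame %d, %d payload bytes\n', ...
        parsedId, parsedFrame, parsedLen);
```

With a 2-byte frame number, the counter in this sketch wraps after 65535 frames, so a longer stream would need a wider field or wrap-around handling.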

Transport stream [14]
• The second layer of packetization forms a series of packets called the transport stream (TS). These are fixed-length subdivisions of the PES packets with additional header information. The packets are multiplexed together to form a transport stream carrying more than one elementary stream.
• A TS packet is 188 bytes long and always begins with the synchronization byte 0x47. This packet structure was originally chosen for compatibility with ATM systems [21]. In some applications, extra bytes are appended to carry error-correction data such as Reed-Solomon or CRC check data.
• A few constraints must be met when forming the transport packets:
  • The total packet size is fixed (188 bytes).
  • Each packet can carry data from only one PES packet.
  • A PES header must start at the first byte of a transport packet payload.
  • The PES packet is split, or stuffing bytes are added, when the above constraints would otherwise not be met.
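The constraints above can be illustrated with a simplified Matlab sketch that splits one PES packet into 188-byte transport packets. It assumes a 4-byte TS header (sync byte, payload-unit-start flag, 13-bit PID, continuity counter) and pads the last packet with 0xFF bytes at the end of the payload. Standard MPEG-2 transport streams place stuffing in an adaptation field instead, so this is a teaching simplification. The PID value and the PES contents are hypothetical.

```matlab
% Hypothetical 400-byte PES packet and PID (assumptions for illustration).
pes = uint8(mod(0:399, 256));
pid = 257;                      % 0x101
payloadSize = 184;              % 188-byte packet minus 4-byte header

nPackets = ceil(numel(pes) / payloadSize);
ts = zeros(nPackets, 188, 'uint8');     % one TS packet per row

for k = 1:nPackets
    first = (k - 1) * payloadSize + 1;
    last  = min(k * payloadSize, numel(pes));
    chunk = pes(first:last);

    % The start flag (0x40) marks the packet that begins with the PES header.
    pusi = (k == 1);
    header = uint8([71, ...                          % sync byte 0x47
                    pusi * 64 + floor(pid / 256), ...  % flag + PID high bits
                    mod(pid, 256), ...                 % PID low byte
                    16 + mod(k - 1, 16)]);             % payload only + counter

    % Pad the final, partly filled packet up to the fixed size.
    stuffing = repmat(uint8(255), 1, payloadSize - numel(chunk));
    ts(k, :) = [header, chunk, stuffing];
end

fprintf('%d PES bytes -> %d TS packets of %d bytes\n', ...
        numel(pes), nPackets, size(ts, 2));
```

Because every row is filled only from `pes`, no transport packet mixes data from two PES packets, and the PES header always lands at the start of the first packet's payload.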

TS format

Conclusion
• The elementary video stream is obtained from AVS Part 2 video and the elementary audio stream from the AAC bit stream. Both are packetized using PES and multiplexed to form a single stream, which is then de-multiplexed. The basic code was implemented in Matlab.

Simulation Results for AVS part 2 video [8]

Simulation results for AAC encoder [9]

Matlab code for Multiplexing
function stream = Multiplexer()
% Interleave AVS video and AAC audio bytes into one stream of '0'/'1' characters.
fid = fopen('C:\results\container_enc.avs');
video1 = fread(fid);
fclose(fid);
fid = fopen('C:\Users\sanju\Desktop\sanju.aac');
audio1 = fread(fid);
fclose(fid);
lvideo = length(video1);
laudio = length(audio1);
video = dec2bin(video1, 8);          % one row of 8 bits per byte
audio = dec2bin(audio1, 8);
v_header = '111111111';              % 9-bit marker before each group of video bytes
a_header = '111111101';              % 9-bit marker before each group of audio bytes
v = 1;                               % index of the next video byte
a = 1;                               % index of the next audio byte
stream = '';
% Only the first 1/1000 of each file is multiplexed.
while (v <= ceil(lvideo/1000) || a <= ceil(laudio/1000))
    stream = strcat(stream, v_header);
    for i = 1:4                      % up to 4 video bytes, each followed by a '0' bit
        if (v <= ceil(lvideo/1000))
            stream = strcat(stream, video(v,:), '0');
            v = v + 1;
        else
            stream = strcat(stream, dec2bin(0,8), '0');   % pad once when video runs out
            break;
        end
    end
    stream = strcat(stream, a_header);
    for i = 1:3                      % up to 3 audio bytes, each followed by a '0' bit
        if (a <= ceil(laudio/1000))
            stream = strcat(stream, audio(a,:), '0');
            a = a + 1;
        else
            stream = strcat(stream, dec2bin(0,8), '0');   % pad once when audio runs out
            break;
        end
    end
end
fid = fopen('C:\Users\Sanju\Desktop\Multiplex.txt', 'w');
fwrite(fid, stream);
fclose(fid);

Multiplexed output

Matlab code for De-Multiplexing
function [video, audio] = Demultiplexer()
% Split the multiplexed '0'/'1' stream back into video and audio bit strings.
fid = fopen('C:\Users\Sanju\Desktop\Multiplex.txt');
stream = fread(fid);
fclose(fid);
stream = char(stream)';              % row vector of '0'/'1' characters
lstream = length(stream);
v_header = '111111111';
a_header = '111111101';
n = 1;
video = '';
audio = '';
while (n + 8 <= lstream)
    if strcmp(stream(n:n+8), v_header)
        n = n + 9;
        for k = 1:200
            if (n + 8 > lstream) || strcmp(stream(n:n+8), a_header)
                break
            end
            video = strcat(video, stream(n:n+7));   % keep 8 data bits, drop the '0' bit
            n = n + 9;
        end
    elseif strcmp(stream(n:n+8), a_header)
        n = n + 9;
        for m = 1:200
            if (n + 8 > lstream) || strcmp(stream(n:n+8), v_header)
                break
            end
            audio = strcat(audio, stream(n:n+7));
            n = n + 9;
        end
    else
        n = n + 9;                   % not a header: skip one unit so the loop always advances
    end
end
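The framing used by the two functions can be checked without any files. The sketch below is an in-memory round trip under the same scheme: 9-bit headers, and every data byte followed by a '0' bit so that no data unit can equal a header. The sample byte values and helper names are hypothetical.

```matlab
vHeader = '111111111';
aHeader = '111111101';
videoBytes = [0 0 1 182];      % hypothetical video bytes
audioBytes = [255 241 80];     % hypothetical audio bytes

% Turn a byte vector into concatenated 9-bit units (8 data bits + '0').
toUnits = @(b) reshape([dec2bin(b, 8), repmat('0', numel(b), 1)]', 1, []);

% Multiplex: header, then data units, for each stream.
stream = [vHeader, toUnits(videoBytes), aHeader, toUnits(audioBytes)];

% De-multiplex: walk the stream one 9-bit unit at a time.
units = reshape(stream, 9, [])';       % one unit per row
video = [];
audio = [];
mode = '';
for k = 1:size(units, 1)
    u = units(k, :);
    if strcmp(u, vHeader)
        mode = 'v';
    elseif strcmp(u, aHeader)
        mode = 'a';
    elseif strcmp(mode, 'v')
        video(end + 1) = bin2dec(u(1:8));   % drop the trailing '0' bit
    elseif strcmp(mode, 'a')
        audio(end + 1) = bin2dec(u(1:8));
    end
end

fprintf('recovered %d video and %d audio bytes\n', numel(video), numel(audio));
```

Even the data byte 255 stays distinguishable from the video header, because its unit is '111111110' while both headers end in '1'.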

De-Multiplexed output

References
• [1] W. Gao et al., “AVS - The Chinese Next-Generation Video Coding Standard”, NAB, Las Vegas, 2004.
• [2] L. Yu et al., “An Overview of AVS-Video: tools, performance and complexity”, Visual Communications and Image Processing 2005, Proc. of SPIE, vol. 5960, pp. 596021, July 31, 2006.
• [3] W. Gao, “AVS standard - Audio Video Coding Standard Workgroup of China”, International Conference on Wireless and Optical Communications, 14th Annual WOCC 2005, pp. 54, 22-23 April 2005.
• [4] L. Yu et al., “Overview of AVS-video coding standards”, Special issues on AVS standards, vol. 24, issue 4, pp. 247-262, April 2009.
• [5] GB/T 20090.1, “Information technology - Advanced coding of audio and video – Part 1: System”, Chinese AVS standard.
• [6] X. Hu et al., “An efficient Low Complexity encoder for MPEG Advanced Coding”, ICACT 2006, pp. 1501-1505, Feb. 20-22, 2006.
• [7] M. Watson et al., “Design and Implementation of AAC Decoders”, IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 819-834, Aug. 2000.
• [12] MPEG-4: ISO/IEC JTC1/SC29 14496-10, “Information technology – Coding of audio-visual objects - Part 10: Advanced Video Coding”, ISO/IEC, 2005.
• [13] B. Lechner et al., “The ATSC Transport Layer, Including program and system information protocol (PSIP)”, Proc. of the IEEE, vol. 94, no. 1, pp. 77-101, January 2006.
• [14] H. Kalva et al., “Implementing Multiplexing, Streaming and Server Interaction for MPEG-4”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1299-1311, December 1999.
• [15] The standards of People’s Republic of China, GB/T 20090.2-2006, “Information technology: Advanced coding of audio and video, part 2 video”.

References
• Reference software used:
• [8] ftp://159.226.42.57/public/avs_doc/avs_software - For AVS part 2 video
• [9] PsyTEL software - For AAC audio
• Reference web sites:
• [10] Audio coding website: www.audiocoding.com
• [11] AAC software: http://www.psytel-research.co.yu
