MULTIPLEXING OF AVS CHINA PART 2 VIDEO WITH AAC BIT STREAMS AND DE-MULTIPLEXING WITH LIP SYNC DURING PLAYBACK Swaminathan Sridhar Multimedia Processing Lab University of Texas at Arlington
Thesis outline: • What is multiplexing? • Applications of multiplexing. • The need for choosing AVS video and AAC audio codecs. • Video & audio elementary stream formats. • Multiplexing process. • De-multiplexing process. • Lip synchronization during playback. • Results and conclusions. • Future work. • References.
What is multiplexing? • A multimedia program is a combination of multiple elementary streams such as video and audio. • Multiplexing is the process of converting multiple elementary streams, such as video and audio streams, into a single transport stream for transmission. • It conserves the usage of transmission channels.
Applications of multiplexing Multiplexing is used in areas of applications such as • ATSC • DVB-T • DVB-S • DVB-H • IPTV
The digital transmission/reception process adopted in the ATSC standard [22]
The need for video and audio compression • With the advent of high definition television transmission schemes high quality video and audio data are transmitted which occupy a lot of bandwidth over a transmission channel. • To address this issue the video and audio data are compressed using efficient compression schemes such as AVS China video codec and AAC audio codec.
Why AVS China video? • AVS (audio video coding standard) China is the latest digital video coding standard developed by the AVS work group of China. • The AVS video codec employs the latest video coding tools and primarily targets standard definition (SD) and high definition (HD) video compression. • Compared to previous video coding standards such as MPEG-2 and MPEG-4, AVS achieves the same video quality at significantly lower bit rates, or higher quality at the same bit rate.
Coding tools of AVS part 2 video codec • Intra prediction : 8x8 block based intra prediction. 5 modes for the luminance component namely the DC, horizontal, vertical, down left and down right and 4 modes for the chrominance component namely the DC, horizontal, vertical and plane mode are specified. • Motion compensation : 16x16, 16x8, 8x16 and 8x8 block sizes. • Motion vector resolution: ¼ pixel accuracy with 4-tap interpolation filter. • Transform: 8x8 integer cosine transform. • Quantization and scaling with scaling only in the encoder. • Entropy coding: context based 2D-VLC • De-blocking filter: performed around the 8x8 boundaries
AVS video encoded bit stream format • Start code: It consists of start code prefix and start code value. • Start code prefix: A string of 23 zero bits followed by a single bit with a value of ‘1’ i.e. ‘0x000001’ which are all byte aligned. This is followed by start code value. • Start code value: It is an 8 bit integer that identifies the start code type.
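The start code structure described above can be illustrated with a minimal Python sketch that scans a byte string for the prefix '0x000001' and reads the start code value that follows it. The function name `find_start_codes` and the sample bytes are hypothetical; only the prefix and the 8-bit value field come from the slide.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # 23 zero bits followed by a single '1' bit


def find_start_codes(stream: bytes):
    """Return (offset, start_code_value) for every start code in the stream."""
    found = []
    pos = stream.find(START_CODE_PREFIX)
    # A start code needs one more byte after the 3-byte prefix for its value.
    while pos != -1 and pos + 3 < len(stream):
        found.append((pos, stream[pos + 3]))
        # Resume the search after the start code value byte.
        pos = stream.find(START_CODE_PREFIX, pos + 4)
    return found


# Illustrative data: two start codes with value 0xB6 and some payload bytes.
sample = b"\x00\x00\x01\xB6\xAA\xBB" + b"\x00\x00\x01\xB6\xCC"
for offset, value in find_start_codes(sample):
    print(f"start code at byte {offset}, value 0x{value:02X}")
```

The sketch assumes the stream is byte aligned, as stated in the slide, so a plain byte search is enough.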
Start code types & start code values used in the AVS-video bit stream [8]
Picture coding type used in AVS-video bit stream • Pb_picture_start_code : The bit string format is ‘0x000001B6’ which indicates the start code of P or B picture. • Picture_coding_type: It is a 2-bit unsigned integer which specifies the coding type of a picture as shown in Table 1. Table 1 Coding type of a picture [8]
NAL unit • NAL unit stands for network abstraction layer unit, a type of packetization that prefixes certain headers to the encoded video bit stream. • It was designed to provide a network friendly environment for transmission of video data. • It mainly addresses video related applications such as video telephony, video storage, broadcast and streaming applications, IPTV etc. • The syntax for the NAL unit is defined in the H.264 standard, but the AVS part 2 standard does not define any syntax for the NAL unit.
NAL unit mapping with the encoded AVS video stream • The basic syntax for the NAL unit is shown in figure 1. • Figure 1 NAL unit syntax [13]. • A NAL unit consists of an 8-bit header followed by the payload. • The procedure for mapping the AVS video stream to NAL units is to map the data between every pair of start code prefixes, i.e. ‘0x000001’, in the AVS video stream into a NAL unit (which includes the start code value but not the start code prefix) and then add a 1-byte header before the start code value.
NAL unit header description • It is an 8-bit header consisting of the following parameters. • Forbidden_zero_bit: a 1-bit value that is always ‘0’. • Nal_ref_idc: a 2-bit unsigned integer value. It indicates the priority of the type of data carried in the NAL unit based upon the start code type. This value should not be zero for I frames. • Nal_unit_type: a 5-bit unsigned integer value, so 32 types of NAL units are allowed. This value indicates the type of data carried in the NAL payload.
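The mapping and the header layout described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `classify` callback, which chooses Nal_ref_idc and Nal_unit_type from the start code value, is hypothetical, because the slides do not list the actual mapping table.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def nal_header(nal_ref_idc: int, nal_unit_type: int) -> int:
    """Pack forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), nal_unit_type (5 bits)."""
    if not 0 <= nal_ref_idc <= 3:
        raise ValueError("nal_ref_idc is a 2-bit value")
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("nal_unit_type is a 5-bit value")
    return (0 << 7) | (nal_ref_idc << 5) | nal_unit_type


def to_nal_units(stream: bytes, classify):
    """Map the data between start code prefixes into NAL units.

    classify(start_code_value) -> (nal_ref_idc, nal_unit_type) is a
    hypothetical callback; the real mapping is not given in the slides.
    """
    units = []
    # Everything before the first prefix is dropped; each remaining chunk
    # begins with the start code value, and the prefix itself is not kept.
    for chunk in stream.split(START_CODE_PREFIX)[1:]:
        if not chunk:
            continue
        ref_idc, unit_type = classify(chunk[0])
        units.append(bytes([nal_header(ref_idc, unit_type)]) + chunk)
    return units


# Placeholder classification: every unit gets priority 1 and type 2.
units = to_nal_units(b"\x00\x00\x01\xB6\x11\x22\x00\x00\x01\xB6\x33",
                     lambda value: (1, 2))
print([u.hex() for u in units])
```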
Why AAC audio? • The AAC codec shows superior performance at both low and high bit rates compared to MP3 and AC3. • It supports up to 48 audio channels with a wide variety of sampling frequencies from 8 kHz to 96 kHz. • It was the first codec to achieve ITU-R broadcast quality at a bit rate of 128 kb/s for stereo. • Its encoding efficiency is nearly 30% higher than that of MP3 (MPEG-1/2 audio layer 3).
AAC audio • Advanced audio coding is a standardized lossy compression scheme for coding the digital audio. • It has been standardized under the ISO/IEC as part 7 of the MPEG-2 standard and part 3 of the MPEG-4 standard. AAC profiles: • Main profile: Provides the highest audio quality and is the most complex. • Low-complexity profile: Achieves nearly the same audio quality as the main profile but with significant savings on the memory and process requirements. • Scalable sampling rate profile: It provides flexibility for scalable and low-complexity applications. It is more appropriate in applications where bandwidth is a constraint.
AAC audio stream format • ADIF- Audio Data Interchange Format This format uses only one header in the beginning of the file followed by the raw audio data blocks. It is generally used for storage applications. • ADTS- Audio Data Transport Stream This format uses separate header for each frame enabling decoding from any frame. This format is mainly used for transport applications.
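Because every ADTS frame carries its own header, a de-multiplexer can locate frame boundaries by reading that header. The sketch below parses a few fields of the 7-byte ADTS header; the bit positions follow the ADTS layout in ISO/IEC 13818-7 [17] rather than anything shown in the slides, and the function name and the sample header bytes are illustrative.

```python
# Sampling frequencies selected by the 4-bit sampling_frequency_index.
SAMPLING_FREQUENCIES = [96000, 88200, 64000, 48000, 44100, 32000,
                        24000, 22050, 16000, 12000, 11025, 8000]


def parse_adts_header(header: bytes) -> dict:
    """Extract selected fields from the first 7 bytes of an ADTS frame."""
    if len(header) < 7:
        raise ValueError("an ADTS header is at least 7 bytes long")
    # The header starts with a 12-bit syncword of all ones.
    if header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        raise ValueError("ADTS syncword 0xFFF not found")
    index = (header[2] >> 2) & 0x0F
    if index >= len(SAMPLING_FREQUENCIES):
        raise ValueError("unsupported sampling frequency index")
    return {
        "profile": header[2] >> 6,  # 0 = main, 1 = low complexity, 2 = SSR
        "sampling_frequency": SAMPLING_FREQUENCIES[index],
        "channel_configuration": ((header[2] & 0x01) << 2) | (header[3] >> 6),
        # 13-bit frame length in bytes, header included.
        "frame_length": ((header[3] & 0x03) << 11) | (header[4] << 3) | (header[5] >> 5),
    }


# Hand-built header: low complexity, 44.1 kHz, stereo, 371-byte frame.
print(parse_adts_header(bytes([0xFF, 0xF9, 0x50, 0x80, 0x2E, 0x7F, 0xFC])))
```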
Factors to be considered for multiplexing and transmission • The audio and video coded bit streams are split into smaller data packets. • The frame-wise arrangement of the coded video and audio streams helps in forming small data packets. • While multiplexing, equal priority is given to all the elementary streams. • Additional information, in the form of time stamps that help synchronize the audio and video at the de-multiplexer, is embedded in the packet header.
[Block diagram: video source → AVS China part 2 encoder → packetizer; audio source → AAC encoder → packetizer; data source → packetizer; the packetizers feed the audio and video elementary stream multiplexer, which outputs the MPEG-2 transport stream.]
Packetization • Two layers of packetization that conform to the MPEG-2 systems standard are adopted for multiplexing: • PES: Packetized Elementary Stream layer • TS: Transport Stream layer
Packetized elementary streams (PES) • Elementary streams (ES) are composed of: the encoded video (AVS) stream, the encoded audio (AAC) stream and an optional data stream. • A PES contains the access units (frames), which are sequentially separated and packetized. • PES headers differentiate the various ES and contain time stamp information used for synchronizing the video and audio streams at the de-multiplexer. • PES packet sizes vary with the size of each access unit. • Each PES can carry data from only one ES.
PES header description • 3 bytes of start code – ‘0x000001’. • 1 byte of stream ID (unique for each ES). • 2 bytes of packet length. • 2 bytes of time stamp (frame number)
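The 8-byte PES header listed above can be sketched with Python's `struct` module. This is a minimal illustration, not the thesis implementation: big-endian byte order, a packet length that counts only the payload bytes, and the stream ID value 0xE0 in the usage lines are all assumptions.

```python
import struct

PES_START_CODE = b"\x00\x00\x01"


def build_pes_packet(stream_id: int, frame_number: int, payload: bytes) -> bytes:
    """Prefix one access unit with start code, stream ID, length and frame number."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit the 2-byte packet length field")
    if not 0 <= frame_number <= 0xFFFF:
        raise ValueError("frame number does not fit the 2-byte time stamp field")
    # ">BHH": big-endian, 1-byte stream ID, 2-byte length, 2-byte frame number.
    header = PES_START_CODE + struct.pack(">BHH", stream_id, len(payload), frame_number)
    return header + payload


def parse_pes_header(packet: bytes):
    """Return (stream_id, packet_length, frame_number) from a PES packet."""
    if packet[:3] != PES_START_CODE:
        raise ValueError("PES start code not found")
    return struct.unpack(">BHH", packet[3:8])


packet = build_pes_packet(0xE0, 300, b"abc")   # 0xE0 is an illustrative stream ID
print(parse_pes_header(packet))
```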
Frame number as time stamp • For video PES: Since the video frame rate is constant (i.e. either 25 or 30 frames per second), the playback time of a particular frame can be calculated from the frame number as: Playback time = frame number / fps • For audio PES: Since the input sampling frequency is constant (i.e. a fixed value between 8 and 96 kHz) and the number of samples per AAC frame is 1024, the playback time of a particular audio frame can be calculated from the frame number as: Playback time = 1024 * frame number * (1 / sampling frequency)
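The two playback time formulas above translate directly into code. The function names are illustrative; the frame rate and sampling frequency in the usage lines are the ones listed in the test conditions.

```python
SAMPLES_PER_AAC_FRAME = 1024


def video_playback_time(frame_number: int, fps: float) -> float:
    """Playback time in seconds of a video frame at a constant frame rate."""
    return frame_number / fps


def audio_playback_time(frame_number: int, sampling_frequency: float) -> float:
    """Playback time in seconds of an AAC frame of 1024 samples."""
    return SAMPLES_PER_AAC_FRAME * frame_number / sampling_frequency


print(video_playback_time(50, 25))        # video frame 50 at 25 fps
print(audio_playback_time(43, 44100))     # audio frame 43 at 44.1 kHz
```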
Method adopted in the MPEG-2 systems standard for time stamps • Audio-video synchronization is achieved using the presentation time stamp (PTS). • The encoder attaches a PTS to each video and audio frame; it is a 33-bit value in cycles of a 90 kHz system time clock (STC). • Additional information known as the program clock reference (PCR), which is the value of the STC at the encoder, is periodically transmitted to achieve exact synchronization.
Advantages of using frame number as time stamp over the existing method that uses clock samples as time stamp • It is less complex and is suitable for software implementation. • There is no synchronization problem due to clock jitter. • There is no propagation of delay between the audio and video streams. • It saves the extra header overhead otherwise used for sending the PCR bytes.
Transport stream packetization • PES packets formed from the various elementary streams are broken into smaller packets known as transport stream (TS) packets. • TS packets have a fixed length of 188 bytes. • One of the reasons for choosing this TS packet size is interoperability with ATM: each MPEG-2 TS packet can be broken down into 4 ATM cells. Constraints: • Each TS packet can carry data from only one PES. • The PES header must start at the first byte of a TS payload. • Stuffing bytes are added where needed to meet these constraints.
TS packet header description • Sync byte: A TS packet always starts with a sync byte of 0x47. • Payload unit start indicator: This bit is set to indicate that the first byte of the PES packet is present in the payload of the current TS packet. • Adaptation field control (AFC): This bit is set if the TS packet payload carries data other than PES data, for example a stretch of stuffing bytes when the length of the PES data is less than 185 bytes. • Packet identifier (PID): This is a 10 bit packet identifier value used to uniquely identify the video and audio ES. Some PID values are pre-defined; for example, a PID value of ‘0x1FFF’ indicates a null TS packet, which is sent at regular intervals to maintain an overall constant bit rate. • Continuity counter: This is a 4 bit counter which is incremented by one every time data from the same PES is encapsulated into a TS packet. • Payload byte offset: If AFC is set to ‘1’, the byte offset of the start of the payload is given here.
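The packetization rules and header fields above can be sketched as follows. The exact bit layout is an assumption made for illustration: the sync byte is followed by a 16-bit word holding the payload unit start indicator (1 bit), AFC (1 bit), PID (10 bits) and continuity counter (4 bits), which leaves 185 payload bytes per packet. The meaning of the offset byte (index of the first payload byte within the packet) and the stuffing value 0xFF are also assumptions.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
STUFFING_BYTE = 0xFF          # assumed stuffing value
HEADER_SIZE = 3               # sync byte + 16-bit field word (assumed layout)


def ts_header(pusi: int, afc: int, pid: int, counter: int) -> bytes:
    """Pack the sync byte, PUSI, AFC, 10-bit PID and 4-bit continuity counter."""
    if not 0 <= pid < 1 << 10:
        raise ValueError("PID is a 10-bit value in this sketch")
    word = (pusi << 15) | (afc << 14) | (pid << 4) | (counter & 0x0F)
    return bytes([SYNC_BYTE, word >> 8, word & 0xFF])


def packetize(pes: bytes, pid: int, counter: int = 0):
    """Split one PES packet into 188-byte TS packets; return (packets, next counter)."""
    packets = []
    full_payload = TS_PACKET_SIZE - HEADER_SIZE   # 185 bytes
    for start in range(0, len(pes), full_payload):
        chunk = pes[start:start + full_payload]
        pusi = 1 if start == 0 else 0             # PES header begins in this packet
        if len(chunk) == full_payload:
            packet = ts_header(pusi, 0, pid, counter) + chunk
        else:
            # Short chunk: set AFC, store the payload offset, then pad with stuffing.
            offset = TS_PACKET_SIZE - len(chunk)
            stuffing = bytes([STUFFING_BYTE]) * (offset - HEADER_SIZE - 1)
            packet = ts_header(pusi, 1, pid, counter) + bytes([offset]) + stuffing + chunk
        packets.append(packet)
        counter = (counter + 1) & 0x0F            # 4-bit counter wraps around
    return packets, counter


packets, next_counter = packetize(bytes(range(200)), pid=0x100)
print(len(packets), [len(p) for p in packets], next_counter)
```

Because a TS packet only ever receives bytes from the PES packet passed in, the constraint that each TS packet carries data from a single PES holds by construction.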
Adopted multiplexing method • Multiplexing method plays an important role in avoiding the buffer overflow or underflow at the de-multiplexing end. • Video and audio timing counters are used to ensure effective multiplexing of the TS packets. • Timing counters are incremented according to the playback time of each TS packet. • A packet with the least timing counter value is always given preference during packet allocation.
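The packet allocation rule above can be sketched as a small scheduler. This is a hedged illustration: it assumes that each TS packet of a frame advances its stream's timing counter by the frame duration divided by the number of TS packets in that frame, and that ties go to video. Neither detail is stated in the slides, and the function names are hypothetical.

```python
def packet_durations(packets_per_frame, frame_duration):
    """Playback time carried by each TS packet, frame by frame (assumed model)."""
    durations = []
    for count in packets_per_frame:
        durations.extend([frame_duration / count] * count)
    return durations


def multiplex_order(video_packets_per_frame, audio_packets_per_frame,
                    fps, sampling_frequency):
    """Return the stream ('V' or 'A') chosen for each output TS packet."""
    queues = {
        "V": packet_durations(video_packets_per_frame, 1 / fps),
        "A": packet_durations(audio_packets_per_frame, 1024 / sampling_frequency),
    }
    counters = {"V": 0.0, "A": 0.0}
    sent = {"V": 0, "A": 0}
    order = []
    while True:
        pending = [s for s in ("V", "A") if sent[s] < len(queues[s])]
        if not pending:
            break
        # The stream with the least timing counter gets the next packet.
        stream = min(pending, key=lambda s: counters[s])
        counters[stream] += queues[stream][sent[stream]]
        sent[stream] += 1
        order.append(stream)
    return order


# Two video frames of 6 and 3 TS packets, three audio frames of 2 TS packets each.
print("".join(multiplex_order([6, 3], [2, 2, 2], fps=25, sampling_frequency=44100)))
```

Keeping the two counters close to each other means the de-multiplexer receives audio and video covering roughly the same playback interval, which is what limits buffer overflow and underflow.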
Buffer fullness at the de-multiplexer using the adopted method
Synchronization and playback • The data is loaded from the buffer during playback. • The video buffer is searched from its start for an IDR frame. • The frame number of the IDR frame is extracted. • The playback time of the current IDR frame is calculated as: Video playback time = IDR frame number / fps • The corresponding audio frame number is calculated as: Audio frame number = (Video playback time * sampling frequency) / 1024
Synchronization and playback • If the result is not an integer, the audio frame number is rounded off and the corresponding audio frame is searched for in the audio buffer. • The audio and video contents from the corresponding frame numbers are decoded and played back. • The audio and video buffers are then refreshed, a new set of data is loaded into the buffers, and the process continues. • If the corresponding audio frame is not found in the buffer, the next IDR frame is searched for and the same process is repeated.
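The synchronization search described on the two slides above can be sketched as follows. The function names and the way the buffers are represented (a list of IDR frame numbers and a set of audio frame numbers) are assumptions for illustration; rounding to the nearest integer is one reading of "rounded off".

```python
import math


def matching_audio_frame(idr_frame_number: int, fps: float,
                         sampling_frequency: float) -> int:
    """Audio frame number whose playback time matches the given IDR frame."""
    video_playback_time = idr_frame_number / fps
    exact = video_playback_time * sampling_frequency / 1024
    return math.floor(exact + 0.5)   # round to the nearest frame number


def find_sync_point(idr_frame_numbers, audio_frame_numbers,
                    fps: float, sampling_frequency: float):
    """Return (IDR frame, audio frame) for the first IDR frame with audio in the buffer."""
    for idr in idr_frame_numbers:
        audio = matching_audio_frame(idr, fps, sampling_frequency)
        if audio in audio_frame_numbers:
            return idr, audio
    return None   # no match: wait for the buffers to be refreshed


# IDR frames 12 and 24 are in the video buffer; audio frames 40 to 42 are buffered.
print(find_sync_point([12, 24], {40, 41, 42}, fps=25, sampling_frequency=44100))
```

With the test conditions of 25 frames per second and 44.1 kHz, IDR frame 12 maps to audio frame 21, which is not in the buffer, so the search moves on to IDR frame 24 and audio frame 41.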
Conclusions • Audio-video synchronization is achieved even when the de-multiplexer starts from any TS packet. • Visually there is no lag between the video and audio. • The buffer fullness at the de-multiplexer end is continuously monitored, and buffer overflow or underflow is prevented using the adopted multiplexing method.
Test conditions • Input raw video: YUV format. • Input raw audio: WAVE format. • Profiles used: AVS: Jizhun (baseline) profile. AAC: Low complexity profile with ADTS format. • GOP: IBBPBB (IDR forced). • Video frame rate: 25 frames per second. • Audio sampling frequency: 44.1 kHz. • A single program TS is generated.
Future Work • The algorithm can be extended to support additional elementary streams, for example to include subtitles during playback. • The proposed algorithm can also be modified to support elementary streams from different video and audio codecs depending on their NAL and ADTS formats respectively. • The adopted method can also be extended to support error resilient codes for transmission of multimedia programs over error prone networks.
References • [1] L. Yu et al, “Overview of AVS-Video: tools, performance and complexity”, SPIE VCIP, vol. 5960, pp. 596021-1~596021-12, Beijing, China, July 2005. • [2] W. Gao et al, “AVS- The Chinese next-generation video coding standard”, National Association of Broadcasters, Las Vegas, 2004. • [3] X. Wang et al, “Performance comparison of AVS and H.264/AVC video coding standards”, J. Computer Science & Technology, vol. 21, No. 3, pp. 310-314, May 2006. • [4] L. Yu et al, “Overview of AVS-video coding standards”, Special issue on AVS standards, Signal Processing: Image Communication, vol. 24, pp. 247-262, April 2009. • [5] AVS Work Group website, http://www.avs.org.cn/en/ • [6] R. A. Burger et al, “A survey of digital TV standards China”, IEEE Second International Conference on Communications and Networking in China, pp. 687-696, Aug. 2007. • [7] L. Fan et al, “Overview of AVS video standard”, in the proceedings of IEEE Int’l Conf. on Multimedia and Expo, ICME ’04, vol. 1, pp. 423-426, Taipei, Taiwan, June 2004.
References • [8] Information Technology – Advanced coding of audio and video – Part 2: Video, The standards of People’s Republic of China, GB/T 20090.2 – 2006. • [9] I. E. G. Richardson, “The H.264 advanced video compression standard”, II Edition Wiley, 2010. • [10] C. X. Zhang et al, “The technique of pre-scaled transform”, IEEE Int’l Symposium on Circuits and Systems, vol.1, pp. 316-319, May 2005. • [11] Q. Wang et al, “Context-based 2D-VLC entropy coder in AVS video coding standard”, J. Computer Science & Technology, vol. 21, No.3, pp. 315-322, May 2006 • [12] H. Jia et al, “An AVS HDTV video decoder architecture employing efficient HW/SW partitioning”, IEEE Trans. on Consumer Electronics, vol. 52, pp. 1447- 1453, Nov. 2006. • [13] T. Wiegand et al, “Overview of the H.264/AVC video coding standard”, IEEE Trans. on CSVT, vol. 13, pp. 560-576, July 2003. • [14] GB 20090.2 RTP Payload Format, FG IPTV- 0512, International Telecommunications Union, May 2007. • [15] Information Technology – Generic coding of moving pictures and associated audio: Systems, International Standard 13818-1, ISO/IEC JTC1/SC29/WG11 N0801, 1994.
References • [16] X. Hu et al., “An efficient low complexity encoder for MPEG advanced audio coding”, in the proceedings of IEEE ICACT conference, pp. 1501-1505, Feb. 2006. • [17] Information Technology – Generic coding of moving pictures and associated audio information- Part 7: Advanced Audio Coding (AAC), ISO/IEC 13818-7, 2006. • [18] Information Technology – Coding of audio-visual objects- Part 3: Audio, ISO/IEC 14496-3, 2009. • [19] M. A Watson et al, “Design and implementation of AAC decoders”, IEEE Trans. on consumer electronics, vol. 46, pp. 819-824, Aug. 2000. • [20] K. Takagi et al., “Conversion of MP3 to AAC in the compressed domain”, IEEE 8th Workshop on Multimedia Signal Processing, pp. 132-135, Oct. 2006. • [21] K. Brandenburg, “Low bit rate audio coding-state-of-the-art, challenges and future directions”, in the proceedings of IEEE ICSP, vol. 1, pp. 1-4, Aug 2000. • [22] B. J. Lechner et al, “The ATSC transport layer, including program and system information protocol (PSIP)”, Proceedings of the IEEE, vol. 94, no. 1, pp. 77-101, Jan. 2006. • [23] J. Stott, “Design technique for multiplexing asynchronous digital video and audio signals”, IEEE Trans. on communications, vol. 26, no. 5, pp. 601-610, May 1978.