
Video Concepts and Techniques




  1. Video Concepts and Techniques Wen-Shyang Hwang KUAS EE.

  2. Outline • Fundamental Concepts • Basic Video Compression Techniques • MPEG Video Coding I – MPEG-1 and 2 • MPEG Video Coding II – MPEG-4, 7, and Beyond

  3. Types of Video Signals • 3 types: Component Video, Composite Video, S-Video • Component Video - 3 signals • uses 3 separate video signals for the red, green, and blue image planes. • most computer systems use it. • gives the best color reproduction since there is no crosstalk between channels. • however, requires more bandwidth and good synchronization. • Composite Video - 1 signal • chrominance and luminance signals are mixed into a single carrier. • chrominance is a composition of two color components (I and Q, or U and V). • a color subcarrier puts the chrominance at the high-frequency end of the signal shared with the luminance signal. • some interference between luminance and chrominance signals. • S-Video - 2 signals • uses two wires, one for luminance and one for the composite chrominance signal. • less crosstalk between them.

  4. Analog Video • Interlaced scanning • odd-numbered lines are traced first, then even-numbered lines. • horizontal retrace: the jump from Q to R, during which the electron beam in the CRT is blanked. • vertical retrace: the jump from T to U or V to P. • NTSC (National Television System Committee) TV standard • used in North America and Japan. • 4:3 aspect ratio (ratio of picture width to height) • 525 scan lines per frame at 30 frames per second (fps); more precisely, 29.97 fps.
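The interlaced scan order and the NTSC numbers on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up here, and the timing uses the nominal 30 fps quoted on the slide.

```python
def split_fields(frame_lines):
    """Split a frame (list of scan lines, line 1 first) into its two fields."""
    odd_field = frame_lines[0::2]   # lines 1, 3, 5, ... are traced first
    even_field = frame_lines[1::2]  # lines 2, 4, 6, ... are traced second
    return odd_field, even_field


def ntsc_timing(lines_per_frame=525, frames_per_second=30):
    """Derive per-field and per-line figures from the frame parameters."""
    lines_per_field = lines_per_frame / 2
    fields_per_second = frames_per_second * 2
    lines_per_second = lines_per_frame * frames_per_second
    line_time_us = 1_000_000 / lines_per_second  # includes horizontal retrace
    return lines_per_field, fields_per_second, lines_per_second, line_time_us
```

With the slide's numbers this gives 262.5 lines per field, 60 fields per second, and 15,750 lines per second, or about 63.5 microseconds per line.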

  5. Digital Video • Advantages: • stored in memory, ready to be processed (noise removal, cut and paste) and integrated into various multimedia applications • repeated recording does not degrade image quality • ease of encryption and better tolerance to channel noise • Chroma Subsampling • humans see color with much less spatial resolution than black and white • how many pixel values should actually be sent? • scheme (4:4:4): no chroma subsampling is used: each pixel's Y, Cb and Cr values are sent. • scheme (4:2:2): horizontal subsampling of the Cb and Cr signals by a factor of 2: all Ys are sent, but only every second Cb and Cr is sent. • scheme (4:1:1): subsamples horizontally by a factor of 4. • scheme (4:2:0): subsamples in both the horizontal and vertical dimensions by a factor of 2 (used in JPEG and MPEG).
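To make the subsampling schemes concrete, the sketch below counts how many samples each scheme sends per frame and performs a simple 4:2:0 reduction of one chroma plane. The helper names are hypothetical, and averaging each 2x2 block is an assumption: the slide does not say how the retained chroma value is derived or where it is sited.

```python
# (horizontal factor, vertical factor) applied to the Cb and Cr planes
CHROMA_FACTORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:1:1": (4, 1),
    "4:2:0": (2, 2),
}


def samples_per_frame(width, height, scheme):
    """Total Y + Cb + Cr samples sent for one frame under a given scheme."""
    h_factor, v_factor = CHROMA_FACTORS[scheme]
    luma = width * height
    chroma = 2 * (width // h_factor) * (height // v_factor)
    return luma + chroma


def subsample_420(plane):
    """Reduce a chroma plane by 2 in both directions by averaging 2x2 blocks."""
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block = [plane[yy][xx]
                     for yy in range(y, min(y + 2, height))
                     for xx in range(x, min(x + 2, width))]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out
```

For a 352X240 frame, 4:4:4 sends 253,440 samples while 4:2:0 and 4:1:1 each send 126,720, which is half as many.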

  6. Video Compression • A video consists of a time-ordered sequence of frames, i.e., images. • Video Compression • (Static) predictive coding based on previous frames. • temporal redundancy: consecutive frames in a video are similar • subtract images in time order, and code the residual error. • The approach of deriving the difference image (subtracting one image from the other) is ineffective because of object motion. • Steps of video compression based on Motion Compensation (MC) • Motion Estimation (motion vector search). • MC-based Prediction. • Derivation of the prediction error, i.e., the difference.

  7. Motion Compensation • For efficiency, each image is divided into macroblocks of size N X N. • The current image frame is referred to as the Target frame. • A match is sought between the macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference frame(s)). • motion vector (MV): the displacement of the reference macroblock to the target macroblock. • Prediction error: the difference between the two corresponding macroblocks.
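The three MC steps (motion vector search, prediction, and derivation of the prediction error) can be sketched as an exhaustive block search. The slides do not name a matching criterion or a search range, so the sum of absolute differences (SAD) and the window parameter `p` below are assumptions, and all function names are hypothetical. Frames are plain lists of rows of luminance values.

```python
def sad(target, reference, tx, ty, rx, ry, n):
    """Sum of absolute differences between two n x n macroblocks."""
    return sum(abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(target, reference, tx, ty, n, p):
    """Find the best match for the target macroblock whose top-left is (tx, ty).

    Returns ((dx, dy), cost). In this sketch the motion vector is the offset
    from the target macroblock position to the matching reference macroblock.
    """
    height, width = len(reference), len(reference[0])
    best_mv = (0, 0)
    best_cost = sad(target, reference, tx, ty, tx, ty, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = tx + dx, ty + dy
            # Only consider candidates that lie fully inside the reference frame.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(target, reference, tx, ty, rx, ry, n)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def prediction_error(target, reference, tx, ty, mv, n):
    """Difference between the target macroblock and its matched reference."""
    dx, dy = mv
    return [[target[ty + j][tx + i] - reference[ty + dy + j][tx + dx + i]
             for i in range(n)] for j in range(n)]
```

When an object has simply shifted between frames, the search recovers the shift and the prediction error is all zeros, which is why this works better than subtracting frames directly.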

  8. Video Coding Evolution

  9. H.261 • An early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards. • Designed for videophone, video conferencing and other audiovisual services over ISDN. • The video codec supports bit-rates of p × 64 kbps, where p ranges from 1 to 30. • The delay of the video encoder must be less than 150 msec so that the video can be used for real-time bidirectional video conferencing. • H.261 Frame Sequence:

  10. H.261 Frame Sequence • Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames): • I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence "Intra". • P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame). • Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal. • The interval between pairs of I-frames is variable. Usually, an ordinary digital video has a couple of I-frames per second.

  11. Intra-frame (I-frame) Coding • Macroblocks are of size 16X16 pixels for the Y frame, and 8X8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8X8 blocks. • For each 8X8 block a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
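The transform and scan stages for one block can be sketched as follows. This is a direct (slow) evaluation of the standard 2-D DCT definition plus the usual JPEG-style zigzag ordering; the function names are illustrative, and quantization and entropy coding are left out.

```python
import math


def dct_2d(block):
    """2-D DCT of an N x N block (N = 8 for the blocks on this slide)."""
    n = len(block)

    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = (2 / n) * scale(u) * scale(v) * total
    return coeffs


def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes one diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                  # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(coeffs):
    """Flatten a coefficient block into a list, DC coefficient first."""
    return [coeffs[r][c] for r, c in zigzag_order(len(coeffs))]
```

A flat block (all pixels equal to 8) transforms to a single DC coefficient of 64 with every AC coefficient at zero, which illustrates why smooth image regions compress well after the scan.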

  12. Inter-frame (P-frame) Predictive Coding • H.261 P-frame coding scheme based on motion compensation: • For each macroblock in the Target frame, a motion vector is found by a search method. After the prediction, a difference macroblock is derived to measure the prediction error. • Each 8X8 block of the difference macroblock goes through the DCT, quantization, zigzag scan and entropy coding procedures. • Sometimes a good match cannot be found; the macroblock is then encoded as an intra macroblock. • The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
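Quantization with one step size for every coefficient can be sketched as below. The step value and the truncation toward zero are assumptions made for illustration; the slide only states that the step size is constant within a macroblock, and the helper names are hypothetical.

```python
def quantize(block, step):
    """Divide every coefficient by the same step, truncating toward zero."""
    return [[int(c / step) for c in row] for row in block]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the quantized levels."""
    return [[q * step for q in row] for row in levels]
```

For example, with a step of 8 the coefficients 100, -37, 5 and 0 become the levels 12, -4, 0 and 0, and are reconstructed as 96, -32, 0 and 0. Small coefficients vanish, which produces the zero runs that the later coding stages exploit.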

  13. H.261 encoder and decoder

  14. A Glance at Syntax of H.261 Video Bitstream • A hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.

  15. Syntax of H.261 • Picture layer: PSC (Picture Start Code) delineates boundaries between pictures. TR (Temporal Reference) provides a time-stamp for the picture. • GOB layer: H.261 pictures are divided into regions of 11X3 macroblocks, each of which is called a Group of Blocks (GOB). • In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB. • Macroblock layer: Each Macroblock (MB) has its own Address indicating its position within the GOB, Quantizer (MQuant), and six 8X8 image blocks (4 Y, 1 Cb, 1 Cr). • Block layer: For each 8X8 block, the bitstream starts with the DC value, followed by pairs of the length of a zero-run (Run) and the subsequent non-zero value (Level) for the ACs, and finally the End of Block (EOB) code.
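The block-layer structure (DC value, then Run/Level pairs, then EOB) can be illustrated at the symbol level. The sketch below only produces and consumes the symbols; the actual variable-length codes used in the bitstream are not modelled, and the function names are hypothetical.

```python
def run_level_encode(zigzag_coeffs):
    """Turn quantized coefficients in zigzag order into (dc, [(run, level)]).

    Trailing zeros are not emitted: they are implied by the End of Block code.
    """
    dc, acs = zigzag_coeffs[0], zigzag_coeffs[1:]
    pairs, run = [], 0
    for level in acs:
        if level == 0:
            run += 1                     # extend the current run of zeros
        else:
            pairs.append((run, level))   # zeros skipped, then the non-zero value
            run = 0
    return dc, pairs


def run_level_decode(dc, pairs, length=64):
    """Rebuild the zigzag-ordered coefficient list from its symbols."""
    coeffs = [dc]
    for run, level in pairs:
        coeffs.extend([0] * run)
        coeffs.append(level)
    coeffs.extend([0] * (length - len(coeffs)))  # zeros implied by EOB
    return coeffs
```

A block whose zigzag sequence is 12, 5, 0, 0, -3, 0, 1 followed by 57 zeros is represented by the DC value 12 and the pairs (0, 5), (2, -3), (1, 1), after which EOB ends the block.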

  16. H.263 • An improved video coding standard for video conferencing and other audiovisual services transmitted on Public Switched Telephone Networks (PSTN). • aims at low bit-rate communications at bit-rates of less than 64 kbps. • uses predictive coding for inter-frames to reduce temporal redundancy and transform coding for the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction). • Unlike in H.261, GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture.

  17. Optional H.263 Coding Modes • H.263 specifies many negotiable coding options. • Unrestricted motion vector mode • Syntax-based arithmetic coding mode • Advanced prediction mode • PB-frames mode • Introduction of a B-frame (predicted bidirectionally) • Improves the quality of prediction. • The PB-frames mode yields satisfactory results for videos with moderate motion. • Under large motion, PB-frames do not compress as well as B-frames.

  18. MPEG • MPEG (Moving Picture Experts Group) was established in 1988 for the development of digital video. • MPEG-1 adopts the CCIR601 digital TV format known as SIF (Source Input Format) and supports only non-interlaced video. • Normally, the MPEG-1 picture resolution is: • 352X240 for NTSC video at 30 fps, or • 352X288 for PAL video at 25 fps • It uses 4:2:0 chroma subsampling. • The MPEG-1 standard has 5 parts: • ISO/IEC 11172-1 Systems • 11172-2 Video • 11172-3 Audio • 11172-4 Conformance • 11172-5 Software
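As a worked example, the uncompressed data rate of the two SIF resolutions follows directly from the numbers on this slide. The sketch assumes 8 bits per sample, which the slide does not state, and the function name is hypothetical.

```python
def raw_bits_per_second(width, height, fps, bits_per_sample=8):
    """Uncompressed bit rate of a 4:2:0 video stream."""
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * (height // 2)  # Cb + Cr under 4:2:0
    return (luma_samples + chroma_samples) * bits_per_sample * fps
```

Both formats come out at the same raw rate: 352X240 at 30 fps and 352X288 at 25 fps each need 30,412,800 bits per second (about 30.4 Mbps) before compression.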

  19. Motion Compensation in MPEG-1 • Motion Compensation (MC) based video encoding in H.261 works as follows: • In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best matching MB from the previously coded I or P frame - prediction. • prediction error: the difference between the MB and its matching MB, sent to the DCT and its subsequent encoding steps. • The prediction is from a previous frame - forward prediction. • Forward prediction can fail: for example, an MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.

  20. Motion Compensation in MPEG-1 (Cont'd) • MPEG introduces a third frame type: B-frames, and its accompanying bi-directional motion compensation. • Each MB from a B-frame will have up to two motion vectors (MVs) (one from the forward and one from the backward prediction). • If matching in both directions is successful, then two MVs will be sent and the two corresponding matching MBs are averaged (indicated by '%' in the figure) before comparing to the Target MB for generating the prediction error. • If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used from either the forward or backward prediction.
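The B-frame rule above (average two matches when both exist, otherwise use the single available one) can be sketched for one macroblock. The integer rounding of the average and the function name are assumptions; macroblocks are lists of rows of pixel values that have already been located by motion estimation.

```python
def b_frame_residual(target_mb, forward_mb=None, backward_mb=None):
    """Prediction error of a B-frame macroblock.

    forward_mb / backward_mb are the matched macroblocks from the previous
    and next reference frames, or None when no acceptable match was found.
    """
    if forward_mb is not None and backward_mb is not None:
        # Both matches found: average them (rounded) to form the prediction.
        prediction = [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
                      for f_row, b_row in zip(forward_mb, backward_mb)]
    elif forward_mb is not None:
        prediction = forward_mb
    elif backward_mb is not None:
        prediction = backward_mb
    else:
        raise ValueError("no match in either direction; code the block as intra")
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target_mb, prediction)]
```

When the target lies between its two references in brightness, the averaged prediction can cancel the error entirely, whereas a single-direction prediction leaves a residual.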

  21. MPEG-1 B-frame Coding Based on Bidirectional Motion Compensation. MPEG Frame Sequence

  22. Other Major Differences from H.261 • Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices. • Slices may contain variable numbers of macroblocks in a single picture. • Slices may start and end anywhere as long as they fill the whole picture. • Each slice is coded independently (flexibility in bit-rate control). • The slice concept is important for error recovery.

  23. Typical Sizes of MPEG-1 Frames • The size of compressed P-frames is significantly smaller than that of I-frames. • B-frames are smaller than P-frames (B-frames have the lowest priority).

  24. MPEG-2 • MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps. • Defines 7 profiles aimed at different applications: • Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, Multiview. • Within each profile, up to 4 levels are defined. • The DVD video specification allows only 4 display resolutions: 720X480, 704X480, 352X480, and 352X240 (a restricted form of the MPEG-2 Main profile at the Main and Low levels). • Four Levels in the Main Profile of MPEG-2

  25. Supporting Interlaced Video • MPEG-2 supports interlaced video for digital broadcast TV and HDTV. • In interlaced video, each frame (picture) consists of two fields. If each field is treated as a separate picture, it is called a Field-picture. • 5 Modes of Prediction (MPEG-2 serves a wide range of applications whose requirements for the accuracy and speed of motion compensation vary): • Frame Prediction for Frame-pictures • Field Prediction for Field-pictures • Field Prediction for Frame-pictures • 16X8 MC for Field-pictures • Dual-Prime for P-pictures

  26. MPEG-2 Scalabilities • layered coding: a base layer and one or more enhancement layers. • MPEG-2 supports the following scalabilities: • SNR Scalability - enhancement layer provides higher SNR. • Spatial Scalability - enhancement layer provides higher spatial resolution. • Temporal Scalability - enhancement layer facilitates higher frame rate. • Hybrid Scalability - combination of any two of the above three scalabilities. • Data Partitioning - quantized DCT coefficients are split into partitions.
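The idea behind layered coding with SNR scalability can be sketched on a list of DCT coefficients: the base layer quantizes coarsely, and the enhancement layer quantizes what the base layer missed with a finer step. This is a simplified illustration, not the MPEG-2 procedure itself; the step sizes, the truncating quantizer, and the function names are all assumptions.

```python
def snr_scalable_encode(coeffs, base_step, enh_step):
    """Return (base_levels, enhancement_levels) for a list of coefficients."""
    base = [int(c / base_step) for c in coeffs]
    base_recon = [q * base_step for q in base]
    # The enhancement layer codes the base layer's quantization error.
    enh = [int((c - r) / enh_step) for c, r in zip(coeffs, base_recon)]
    return base, enh


def snr_scalable_decode(base, enh, base_step, enh_step, use_enhancement=True):
    """Reconstruct from the base layer alone or from both layers."""
    recon = [q * base_step for q in base]
    if use_enhancement:
        recon = [r + e * enh_step for r, e in zip(recon, enh)]
    return recon
```

With coefficients 100, -37, 5 and steps 16 and 4, a base-only decoder recovers 96, -32, 0, while a decoder that also receives the enhancement layer recovers 100, -36, 4, which is closer to the original and therefore has a higher SNR.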

  27. MPEG-4 • MPEG-4 adopts a new object-based coding approach (not frame-based compression coding). • object-based coding offers a higher compression ratio and is well suited to digital video composition, manipulation, indexing, and retrieval. • Its 6 parts are system, video, audio, conformance, software, and DMIF (Delivery Multimedia Integration Framework). • the bit-rate range for MPEG-4 video is now between 5 kbps and 10 Mbps.

  28. Comparison of interactivities in MPEG standards • MPEG-4 is a standard for: • Composing media objects to create desirable audiovisual scenes. • Multiplexing and synchronizing the bitstreams so that they can be transmitted with guaranteed Quality of Service (QoS). • Interacting with the audiovisual scene at the receiving end (provides a toolbox of advanced coding modules and algorithms for audio and video compression). • Figures: reference models in MPEG-1 and 2 (interaction shown in dashed lines is supported only by MPEG-2); MPEG-4 reference model.

  29. Hierarchical structure of MPEG-4 visual bitstreams • Video-object Sequence (VS) - delivers the complete MPEG-4 visual scene, which may contain 2-D or 3-D natural or synthetic objects. • Video Object (VO) - a particular object in the scene, which can be of arbitrary (non-rectangular) shape corresponding to an object or background of the scene. • Video Object Layer (VOL) - facilitates a way to support (multi-layered) scalable coding. A VO can have multiple VOLs under scalable coding, or have a single VOL under non-scalable coding. • Group of Video Object Planes (GOV) - groups Video Object Planes together (optional level). • Video Object Plane (VOP) - a snapshot of a VO at a particular moment. • Each VS will have one or more VOs, each VO will have one or more VOLs, and so on.
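The containment hierarchy on this slide maps naturally onto nested data structures. The sketch below is only a model of the nesting, not of the bitstream syntax: the class fields (`time`, `coding_type`, `name`) are hypothetical, and for simplicity every VOL holds its VOPs inside GOVs even though the GOV level is optional.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    time: float        # moment at which the object snapshot was taken
    coding_type: str   # e.g. "I", "P" or "B"


@dataclass
class GOV:
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VOL:
    govs: List[GOV] = field(default_factory=list)


@dataclass
class VO:
    name: str
    vols: List[VOL] = field(default_factory=list)


@dataclass
class VS:
    vos: List[VO] = field(default_factory=list)


def count_vops(vs):
    """Walk the hierarchy and count every VOP in the scene."""
    return sum(len(gov.vops)
               for vo in vs.vos
               for vol in vo.vols
               for gov in vol.govs)
```

A scene with a background object and one foreground object would be a single VS holding two VOs, each with one VOL under non-scalable coding.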

  30. VOP-based Coding • MPEG-1 and -2 do not support the VOP concept, and hence their coding method is referred to as frame-based (block-based) coding. • MPEG-4 VOP-based coding also employs the Motion Compensation technique: • An intra-frame coded VOP is called an I-VOP. • Inter-frame coded VOPs are called P-VOPs (forward prediction) or B-VOPs (bi-directional prediction). • Figure: (a) a video sequence; (b) MPEG-1 and 2 block-based coding; (c) two potential matches in MPEG-1 and 2; (d) object-based coding in MPEG-4.

  31. ISO MPEG-4 Part 10 / ITU-T H.264 • Offers up to 50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 Advanced Simple Profile. • A leading candidate to carry High Definition TV (HDTV) video content in many potential applications. • Core features: • Entropy decoding, motion compensation (P-prediction), intra-prediction (I-prediction), transform, scan, quantization, and in-loop deblocking filters. • Baseline profile features: • Arbitrary slice order (ASO), flexible macroblock order (FMO), redundant slices. • Main profile features: • B slices, context adaptive binary arithmetic coding (CABAC), weighted prediction. • Extended profile features: • B slices, weighted prediction, slice data partitioning, SP and SI slice types.

  32. MPEG-7 • Serves the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries. • Its formal name is Multimedia Content Description Interface.

  33. MPEG-7 and Multimedia Content Description • MPEG-7 has developed Descriptors (D), Description Schemes (DS) and a Description Definition Language (DDL). The following are some of the important terms: • Feature - characteristic of the data. • Description - a set of instantiated Ds and DSs that describes the structural and conceptual information of the content, the storage and usage of the content, etc. • D - definition (syntax and semantics) of the feature. • DS - specification of the structure and relationship between Ds and between DSs. • DDL - syntactic rules to express and combine DSs and Ds. • The scope of MPEG-7 is to standardize the Ds, DSs and DDL for descriptions. The mechanism and process of producing and consuming the descriptions are beyond the scope of MPEG-7.
