
Source Coding


Presentation Transcript


  1. Source Coding Hafiz Malik Dept. of Electrical & Computer Engineering The University of Michigan-Dearborn hafiz@umich.edu http://www-perosnal-engin.umd.umich.edu/~hafiz

  2. Overview • Transmitter chain: Information Source → Source Encode → Encrypt → Channel Encode → Multiplex → Pulse Modulate → Multiple Access → Frequency Spreading → Bandpass Modulate → XMT • Source Coding – eliminate redundancy in the data; send the same information in fewer bits • Channel Coding – detect/correct errors in signaling and improve BER

  3. What is information? • Student slips on ice in winter in Southfield, MI • Professor slips on ice, breaks arm, class cancelled • Student slips on ice in July in Southfield, MI • Which statement contains the most information? • The statement with the most unpredictability conveys the most new information and surprise • Information I is related to -log(Pr{event}): Pr{event} → 1 ⇒ I → 0, and Pr{event} → 0 ⇒ I → ∞
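
A quick illustration of the -log(Pr{event}) measure: the sketch below (with made-up probabilities for the three statements) computes the self-information of each event in bits; the rarer the event, the more information it carries.

```python
import math

def self_information(p):
    """Self-information I = -log2(p), in bits; rarer events carry more information."""
    return -math.log2(p)

# Illustrative (made-up) probabilities for the three statements
events = {
    "student slips on ice in winter": 0.5,                     # common, little surprise
    "professor slips, breaks arm, class cancelled": 0.05,
    "student slips on ice in July": 0.001,                     # very unlikely, very informative
}

for event, p in events.items():
    print(f"{event}: Pr = {p}, I = {self_information(p):.2f} bits")
```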

  4. Measure of Randomness • Suppose we have two events, E1 and E2, with Pr{E1} = p and Pr{E2} = q • E1 ∪ E2 = S, so p = 1 - q • Entropy: average information per source output, H = -p log2(p) - q log2(q) [Figure: binary entropy H (bits) vs. probability p; H rises from 0 at p = 0 to a maximum of 1 bit at p = 0.5 and falls back to 0 at p = 1]
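
A minimal sketch of the binary entropy function described above; it evaluates H(p) = -p log2(p) - (1-p) log2(1-p) at a few values of p to show the maximum of 1 bit at p = 0.5.

```python
import math

def binary_entropy(p):
    """Entropy (bits) of a binary source with Pr{E1} = p and Pr{E2} = 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:4.2f}  H = {binary_entropy(p):.3f} bits")
# H peaks at 1 bit when p = 0.5 (maximum uncertainty)
```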

  5. Comments on Entropy • When the log is taken to base 2, the unit of H is bits/event; for base e, the unit is nats • Entropy is maximum for a uniformly distributed random variable • Example: see the sketch below
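
A short sketch of the standard entropy definition, H(X) = -Σ pᵢ log₂ pᵢ (bits/event for base 2, nats for base e), comparing a uniform and a non-uniform four-symbol source; the skewed probabilities are made up for illustration. The uniform source attains the maximum H = log₂ M.

```python
import math

def entropy(probs, base=2):
    """Entropy H = -sum(p * log(p)); base 2 gives bits/event, base e gives nats."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]      # uniform 4-symbol source
skewed  = [0.70, 0.15, 0.10, 0.05]      # non-uniform 4-symbol source (made up)

print(f"Uniform: H = {entropy(uniform):.3f} bits (= log2(4) = 2, the maximum)")
print(f"Skewed : H = {entropy(skewed):.3f} bits (less than the maximum)")
print(f"Uniform in nats: H = {entropy(uniform, base=math.e):.3f}")
```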

  6. Example: Average Information Content in English Language • Calculate the average information in bits/character in English, assuming each letter is equally likely
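
With 26 equally likely letters, the average information is H = log2(26); the one-liner below evaluates it (about 4.7 bits/character).

```python
import math

# 26 equally likely letters: H = log2(26)
H = math.log2(26)
print(f"H = {H:.2f} bits/character")  # ≈ 4.70 bits/character
```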

  7. Real-world English • But English characters do not appear with the same frequency in real-world English text; the probabilities of the characters are assumed to be: • Pr{a}=Pr{e}=Pr{o}=Pr{t}=0.10 • Pr{h}=Pr{i}=Pr{n}=Pr{r}=Pr{s}=0.07 • Pr{c}=Pr{d}=Pr{f}=Pr{l}=Pr{p}=Pr{u}=Pr{y}=0.02 • Pr{b}=Pr{g}=Pr{j}=Pr{k}=Pr{q}=Pr{v}=Pr{w}=Pr{x}=Pr{z}=0.01
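
Plugging these assumed probabilities into H = -Σ pᵢ log₂ pᵢ gives the average information content of English under this model; the sketch below computes it (roughly 4.1 bits/character, less than the 4.7 bits of the equally likely case). Note the listed probabilities cover 25 letters and sum to 0.98, so the result is approximate.

```python
import math

# Assumed letter probabilities from the slide (25 letters, summing to 0.98)
probs = [0.10] * 4 + [0.07] * 5 + [0.02] * 7 + [0.01] * 9

H = -sum(p * math.log2(p) for p in probs)
print(f"H ≈ {H:.2f} bits/character")  # ≈ 4.1, vs. 4.7 for equally likely letters
```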

  8. Example Binary Sequence • X, P(X=1) = P(X=0) = ½ • Pb=0.01 (1 error per 100 bits) • R=1000 binary symbol/sec

  9. Source Coding • Goal is to find an efficient description of information sources • Reduce required bandwidth • Reduce memory needed for storage • Memoryless – symbols from the source are independent; one symbol does not depend on the next • Memory – elements of a sequence depend on one another, e.g. UNIVERSIT_?; a 10-tuple with memory contains less information than 10 independent symbols because the symbols are dependent

  10. Source Coding (II) • This means that it is more efficient to code an information source with memory as groups of symbols rather than one symbol at a time

  11. Desirable Properties • Length • Fixed Length – ASCII • Variable Length – Morse Code, JPEG • Uniquely Decodable – allows the receiver to invert the mapping back to the original symbol sequence • Prefix-Free – no codeword is a prefix of any other codeword • Average Code Length: n̄ = Σi pi ni (ni is the code length of the ith symbol, pi its probability)

  12. Uniquely Decodable and Prefix-Free Codes • Uniquely decodable? • Not code 1 • If “10111” is sent, is code 3 ‘babbb’ or ‘bacb’? So not code 3 or code 6 • Prefix-Free? • Not code 4 – a prefix contains ‘1’ • Average Code Length • Code 2: n̄ = 2 • Code 5: n̄ = 1.23
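
A small sketch of the two checks above, prefix-freeness and average code length. The codewords and probabilities here are hypothetical stand-ins, not the slide's codes 1-6.

```python
def is_prefix_free(codewords):
    """A code is prefix-free if no codeword is a prefix of any other codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

def average_length(code, probs):
    """Average code length: n_bar = sum_i p_i * n_i."""
    return sum(probs[sym] * len(cw) for sym, cw in code.items())

# Hypothetical example code (not the slide's table)
code  = {"a": "0", "b": "10", "c": "110", "d": "111"}
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

print("prefix-free:", is_prefix_free(list(code.values())))   # True
print("average length:", average_length(code, probs))        # 1.75 bits/symbol
```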

  13. Huffman Code • Characteristics of Huffman Codes: • Prefix-free, variable-length code that achieves the shortest average code length for an alphabet • Most frequent symbols have the shortest codes • Procedure • List all symbols and probabilities in descending order • Merge the two branches with the lowest probabilities and combine their probabilities • Repeat until one branch is left

  14. Huffman Code Example • Symbols a–f with probabilities a = 0.4, b = 0.2, c = 0.1, d = 0.1, e = 0.1, f = 0.1 [Figure: Huffman tree built by repeatedly merging the two lowest-probability branches (0.1 + 0.1 → 0.2, 0.2 + 0.2 → 0.4, 0.2 + 0.4 → 0.6, 0.4 + 0.6 → 1.0), labeling each branch 0 or 1] • The Code: a 11, b 00, c 101, d 100, e 011, f 010 • Entropy: 2.32 bits/symbol, average code length: 2.4 bits/symbol • Compression Ratio: 3.0/2.4 = 1.25
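
A compact sketch of the procedure from the previous slide, applied to the probabilities inferred from this example (a = 0.4, b = 0.2, c–f = 0.1 each); it uses a heap to repeatedly merge the two lowest-probability branches. The exact codewords may differ from the slide's assignment (Huffman trees are not unique), but the average length comes out to the same 2.4 bits/symbol.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code by repeatedly merging the two lowest-probability branches."""
    # Heap entries: (probability, tie-breaker, {symbol: codeword-so-far})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)   # lowest-probability branch
        p2, _, code2 = heapq.heappop(heap)   # second lowest
        # Prepend one bit to every codeword in each merged branch
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.4, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1, "f": 0.1}
code = huffman_code(probs)
avg = sum(probs[s] * len(cw) for s, cw in code.items())
print(code)
print(f"average length = {avg:.1f} bits/symbol")  # 2.4, matching the slide
```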

  15. Example: • Consider a random variable X taking values {a, b, c} with the associated probabilities listed in the Table • Calculate the entropy of this symbol set • Find the Huffman Code for this symbol set • Find the compression ratio and efficiency of this code

  16. Extension Codes • Combine alphabet symbols to increase variability • Try to combine very common 2- and 3-letter combinations, e.g.: th, sh, ed, and, the, ing, ion

  17. Lempel-Ziv (ZIP) Codes • Huffman codes have some shortcomings • Symbol probability information must be known a priori • The coding tree must be known at both coder and decoder • The Lempel-Ziv algorithm uses the text itself to iteratively construct a sequence of variable-length code words • Used in gzip, UNIX compress, and LZW algorithms

  18. Lempel-Ziv Algorithm • Look through a code dictionary of already-coded segments • If the current segment matches a dictionary entry, send <dictionary address, next character> and store the segment + new character in the dictionary • If there is no match, store the new symbol in the dictionary and send <0, symbol>

  19. LZ Coding: Example • Encode [a b a a b a b b b b b b b a b b b b b a] • Code Dictionary:
      Address   Contents   Encoded Packets
      1         a          < 0 , a >
      2         b          < 0 , b >
      3         aa         < 1 , a >
      4         ba         < 2 , a >
      5         bb         < 2 , b >
      6         bbb        < 5 , b >
      7         bba        < 5 , a >
      8         bbbb       < 6 , b >
      -         -          < 4 , - >
  • Note: 9 code words, 3-bit address, 1 bit for new character
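
A minimal sketch of the dictionary coding described on the previous slide (an LZ78-style parse); run on the example sequence it reproduces the nine packets in the table, with address 0 meaning "no previous match" and "-" marking the end of the input.

```python
def lz_encode(symbols):
    """LZ78-style encoding: emit <dictionary address, next character> packets."""
    dictionary = {}          # segment -> address (1-based)
    packets = []
    segment = ""
    for ch in symbols:
        if segment + ch in dictionary:
            segment += ch                      # keep extending the matched segment
        else:
            address = dictionary.get(segment, 0)
            packets.append((address, ch))      # <address of match, new character>
            dictionary[segment + ch] = len(dictionary) + 1
            segment = ""
    if segment:                                # input ended inside a known segment
        packets.append((dictionary[segment], "-"))
    return packets

sequence = "abaababbbbbbbabbbbba"
for addr, ch in lz_encode(sequence):
    print(f"< {addr} , {ch} >")
# Produces the 9 packets from the table: <0,a> <0,b> <1,a> <2,a> <2,b> <5,b> <5,a> <6,b> <4,->
```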

  20. Notes on Lempel-Ziv Codes • Compression gains occur for long sequences • Other variations exist that encode data using packets and run length encoding: <# characters back, length, next character>
