Information Management • DIG 3563 – Lecture 11 • Compression – Lossy and Lossless • J. Michael Moshell, University of Central Florida • Original image* by Moshell et al. Imagery is from Wikimedia except where marked with *. Licensing is listed.
Before we begin: • Review: binary & hexadecimal numbers • 132 (decimal) means • 1 - $100 bill (10^2) 100 • 3 - $10 bills (10^1) 30 • 2 - $1 bills (10^0) 2 • for a total value of 132 (decimal, or base-10)
Binary numbers at the heart of computing • 1101 (binary) means • decimal value • 1 - $8 bill (2^3) 8 • 1 - $4 bill (2^2) 4 • 0 - $2 bill (2^1) 0 • 1 - $1 bill (2^0) 1 • for a total value of 13 (decimal, or base-10)
Your task: convert binary to decimal • or decimal to binary, by this method • 1101 (binary) means • decimal value • 1 - $8 bill (2^3) 8 • 1 - $4 bill (2^2) 4 • 0 - $2 bill (2^1) 0 • 1 - $1 bill (2^0) 1 • for a total value of 13 (decimal, or base-10)
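The "dollar bill" method above can be sketched in Python. This is a minimal illustration, not part of the lecture; the function name `binary_to_decimal` is made up for this example.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the place value (a power of 2) of every 1 digit."""
    digits = bits.replace(" ", "")  # allow grouped input such as "0000 1101"
    total = 0
    # Walk from the rightmost digit, whose place value is 2^0 = 1.
    for position, digit in enumerate(reversed(digits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        if digit == "1":
            total += 2 ** position  # the $1, $2, $4, $8 ... "bills"
    return total


print(binary_to_decimal("1101"))  # 13, the value worked out on the slide
```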
Practice problems (do it!) • Convert these binary numbers to decimal: • 1000 1001 • 1101 0011 • Convert these decimal numbers to binary: • 31 • 17 • 129
Whoa ... decimal to binary? An example: x = 15 decimal is ... what in binary? (call it y) You need to know the powers of 2, up to 2^10: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024. Now: is x>=16? No, so y=0. Is x>=8? Yes, so y=01 and x=x-8; now x=7. Is x>=4? Yes, so y=011, x=x-4; now x=3. Is x>=2? Yes, so y=0111, x=x-2; now x=1. Is x>=1? Yes, so y=01111, x=x-1; now x=0 and we are done: 15 decimal = 01111 binary.
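The compare-and-subtract procedure can be written as a short Python sketch. The name `decimal_to_binary` and the `width` parameter (how many bits to produce) are assumptions made for this illustration.

```python
def decimal_to_binary(x: int, width: int = 5) -> str:
    """Test x against each power of 2, largest first, subtracting when it fits."""
    if not 0 <= x < 2 ** width:
        raise ValueError("x does not fit in the requested number of bits")
    y = ""
    for exponent in range(width - 1, -1, -1):
        power = 2 ** exponent
        if x >= power:   # "Is x >= 8? Yes ..."
            y += "1"
            x -= power
        else:            # "Is x >= 16? No ..."
            y += "0"
    return y


print(decimal_to_binary(15))  # 01111
```

With `width=5` the first test is against 16, exactly as in the worked example.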
Binary is .... messy! So we 'encode' it with convenient symbols: each group of 4 bits becomes one digit, 0-9 and then A-F for 10-15. So we would represent 1000 1011 as 8B in "Hexadecimal" (base 16) and convert like this: 8 x 16 = 128 (base 10); B = 11 (base 10); total = 139 (base 10)
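A small Python sketch of both directions, using the slide's 1000 1011 = 8B example. The helper names `byte_to_hex` and `hex_to_decimal` are illustrative only.

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(bits: str) -> str:
    """Convert 8 binary digits to 2 hex digits, one per group of 4 bits."""
    digits = bits.replace(" ", "")
    if len(digits) != 8:
        raise ValueError("expected exactly 8 bits")
    high, low = digits[:4], digits[4:]
    return HEX_DIGITS[int(high, 2)] + HEX_DIGITS[int(low, 2)]


def hex_to_decimal(hex_text: str) -> int:
    """Multiply the running total by 16 and add each digit's value."""
    total = 0
    for digit in hex_text.upper():
        total = total * 16 + HEX_DIGITS.index(digit)
    return total


print(byte_to_hex("1000 1011"))  # 8B
print(hex_to_decimal("8B"))      # 139, i.e. 8 x 16 + 11
```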
Practice problems 2 • Convert these binary numbers to hexadecimal: • 1000 1001 • 1101 0011 • Convert these decimal numbers to hexadecimal: • 31 • 17 • 129
Practice problems 2 • Convert these hex numbers to decimal: • AA (base 16) • 12 (base 16) - (sometimes we write 12x for heXadecimal) • A convention: 8 bits is a byte; • represented as 2 hex digits, between 00x and FFx = 255 (base 10)
And now: compression • We begin with TV, the first BIG data • Media evolved through these steps: • 35 mm film, played by a film chain to produce a video signal • 2 inch reel-to-reel videotape www.wikipedia.org - GNU FDL
Media Asset Management in TV • (These are mostly analog media) • - cassettes of various sizes: 3/4 inch dominated • analog media: • - feasible in the 1970s • - no high RAM needs • - no high speed CPU • BUT: • Limited generation copying www.wikipedia.org - GNU FDL
Media Asset Management in TV • - The arrival of digital videotape - • Digital Video • professional: DVCAM-L • professional: DVCPRO-M • consumer: miniDV • Better than analog, but not totally lossless. (Why?) www.wikipedia.org - GNU FDL
Media Asset Management in TV • - The arrival of digital videotape- • The problem of realtime capture and delivery: • Error detecting and correcting codes require • “repair” if errors are found. • If too much data is missing from a tape, • repair may be impossible or take too long. • So: Digital Video uses “masking” & “fill-in”.
Media Asset Management in TV • - The arrival of true video with RAID • RAID: Redundant Array of Inexpensive Disks • * Designed for no-single-point-of-failure. • * Many strategies. Here's a simple one. • * Odd Parity: • 8 data bits: 1 0 1 0 1 1 0 0 • plus 1 parity bit: 1 • added so that the WHOLE ROW (1 0 1 0 1 1 0 0 1) has an odd number of 1s in it.
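The odd-parity rule is small enough to check in a few lines of Python. This is only a sketch of the rule stated on the slide; the function names are invented for the example.

```python
def odd_parity_bit(data_bits: list[int]) -> int:
    """Return the extra bit that makes the total number of 1s odd."""
    ones = sum(data_bits)
    return 0 if ones % 2 == 1 else 1


def has_odd_parity(row: list[int]) -> bool:
    """Check a full row (data bits plus parity bit)."""
    return sum(row) % 2 == 1


data = [1, 0, 1, 0, 1, 1, 0, 0]   # the 8 data bits from the slide
parity = odd_parity_bit(data)
print(parity)                            # 1
print(has_odd_parity(data + [parity]))   # True
```

The data bits hold four 1s (an even count), so the parity bit must be 1 to make the row odd.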
Media Asset Management in TV • - How error correcting codes work
Row 1:               1 0 1 0 0 1 0 1 1
Row 2:               0 0 0 0 1 0 0 0 0
Row 3:               1 1 0 0 1 1 0 0 1
Longitudinal Parity: 1 0 0 1 1 1 1 0 1
Media Asset Management in TV • - How error correcting codes work
Row 1:               1 0 1 0 0 1 0 1 1
Row 2:               0 0 0 0 0 0 0 0 0
Row 3:               1 1 0 0 1 1 0 0 1
Longitudinal Parity: 1 0 0 1 1 1 1 0 1
Now: A bit goes bad. Can you find it? Can you fix it? Yes ... we can find its row and its column: row 2 and column 5 each now hold an even number of 1s, so the bit where they cross is the bad one.
Media Asset Management in TV • - Practice: Find the bad bit
            p q r s t u v w x
a           1 1 1 0 1 0 0 0 1
b           0 1 0 0 1 0 0 0 1
c           1 1 0 0 1 1 1 1 0
d=Parity:   1 0 0 1 0 0 1 0 1
Media Asset Management in TV • - Practice: Find the bad bit
            p q r s t u v w x
a           1 1 1 0 1 0 0 0 1
b           0 1 0 0 1 0 0 0 1
c           1 1 0 0 1 1 0 1 0
d=Parity:   1 0 0 1 0 0 1 0 1
Row c and column v each held an even number of 1s, so the bad bit was the 1 at (c, v). And so we correct it to 0.
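The row-and-column search can be sketched in Python. This is an illustration under the slide's assumptions (odd parity, at most one damaged bit, last row is the longitudinal parity row); `find_bad_bit` is a hypothetical helper name.

```python
def find_bad_bit(grid):
    """Return (row, column) of a single flipped bit, or None if every check passes.

    grid: equal-length rows of bits; each data row ends with its own parity
    bit, and the last row is the longitudinal (column) parity row.
    """
    # A data row or a column is suspicious when it holds an even number of 1s.
    bad_rows = [r for r, row in enumerate(grid[:-1]) if sum(row) % 2 == 0]
    bad_cols = [c for c in range(len(grid[0]))
                if sum(row[c] for row in grid) % 2 == 0]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    if not bad_rows and len(bad_cols) == 1:
        return len(grid) - 1, bad_cols[0]  # the parity row itself was hit
    raise ValueError("more than one bit appears to be damaged")


grid = [
    [1, 1, 1, 0, 1, 0, 0, 0, 1],  # a
    [0, 1, 0, 0, 1, 0, 0, 0, 1],  # b
    [1, 1, 0, 0, 1, 1, 1, 1, 0],  # c
    [1, 0, 0, 1, 0, 0, 1, 0, 1],  # d = parity
]
row, col = find_bad_bit(grid)
print(row, col)        # 2 6, i.e. row c, column v
grid[row][col] ^= 1    # flip the bad bit back
print(find_bad_bit(grid))  # None
```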
Media Asset Management in TV • RAID Technology … is a BIT (well, a little) more complex than that. But you get the idea, I hope. • Imagine NINE disks to store EIGHT bits of info. • If we put one bit of each "word" on 8 disks and the parity checksum on #9, then if any ONE disk fails, we could reconstruct its contents.
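The nine-disk idea can be tried out with a toy Python model. It follows the slide's simplification (one bit per disk, odd parity on the ninth disk) and is not a description of how real RAID levels lay out data; `make_stripe` and `rebuild` are names invented for the sketch.

```python
def make_stripe(data_bits):
    """Put 8 data bits on disks 0-7 and an odd-parity bit on disk 8."""
    parity = 0 if sum(data_bits) % 2 == 1 else 1
    return list(data_bits) + [parity]


def rebuild(stripe, failed_disk):
    """Recover the bit on the failed disk from the eight surviving bits."""
    survivors = [bit for disk, bit in enumerate(stripe) if disk != failed_disk]
    # The full stripe must hold an odd number of 1s, so the lost bit is 1
    # exactly when the survivors hold an even number of 1s.
    return 1 if sum(survivors) % 2 == 0 else 0


stripe = make_stripe([1, 0, 1, 0, 1, 1, 0, 0])
lost = stripe[3]                               # pretend disk 3 has failed
print(rebuild(stripe, failed_disk=3) == lost)  # True
```

The same function works whichever disk fails, including the parity disk itself.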
Data Compression • Key concepts: • Lossless compression: perfectly reversible. Essential for computer programs, forensics, etc. • Lossy compression: some info is lost, but much space is saved. Used when the resulting product is "good enough" for human consumption. • And remember: the original may be archived, too.
Data Compression • Data can be compressed if it contains redundancy. • What is that? • An image contains redundancy if a given pixel's color allows you to predict the color of the adjacent ones with greater than random chance of success. • The first compression method we study is called • Run Length Encoding.
Run Length Encoding (RLE) • A run is a sequence of identical values. • 4 4 4 4 4 is a 'run' of five values, all of which = 4 • A code block is two numbers, a count and then a value, like 5 4 (five 4s). • RLE uses code blocks to represent data-with-runs. • DATA: 5 5 5 5 5 5 2 2 2 7 7 7 7 7 7 7 7 - total of 17 numbers • Coded: 6 5 3 2 8 7 - total of 6 numbers
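A minimal Python sketch of the encoder, assuming the count-then-value block order shown above. The function name `rle_encode` is illustrative.

```python
def rle_encode(values):
    """Return a flat list of code blocks: count, value, count, value, ..."""
    coded = []
    i = 0
    while i < len(values):
        # Measure how long the run starting at position i is.
        run_length = 1
        while i + run_length < len(values) and values[i + run_length] == values[i]:
            run_length += 1
        coded += [run_length, values[i]]
        i += run_length
    return coded


data = [5] * 6 + [2] * 3 + [7] * 8   # the 17 numbers from the slide
print(rle_encode(data))              # [6, 5, 3, 2, 8, 7]
```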
Run Length Encoding (RLE) You practice: write down the RLE code for this sequence: 4 4 4 21 21 21 21 21 18 18 And ... the answer is ... 3 4 5 21 2 18
Run Length Encoding (RLE) You practice: Decode this code message Message: 5 4 3 2 6 47 And ... the answer is ... 4 4 4 4 4 2 2 2 47 47 47 47 47 47
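Decoding is the reverse walk over the code blocks. A short Python sketch, with `rle_decode` as an assumed name:

```python
def rle_decode(coded):
    """Expand count, value pairs back into the original sequence."""
    if len(coded) % 2 != 0:
        raise ValueError("code blocks come in (count, value) pairs")
    values = []
    for i in range(0, len(coded), 2):
        count, value = coded[i], coded[i + 1]
        values += [value] * count
    return values


print(rle_decode([5, 4, 3, 2, 6, 47]))
# [4, 4, 4, 4, 4, 2, 2, 2, 47, 47, 47, 47, 47, 47]
```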
Run Length Encoding (RLE) The question of memory use. • If 'number' means 'byte', then 0<=n<=255. • So, one code block can represent a run of length <=255 (a longer run needs several blocks), and you need 3 bytes per color, if 24 bit color. • You can RLE the R, G and B parts either jointly or separately. • If joint, code blocks are 4 bytes long: n R G B. • If you do all the R, then all the G, then all the B, your code blocks are 2 bytes long, but there are 3 sets of 'em.
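The joint variant (4-byte blocks: n R G B) can be sketched in Python. This is an illustration only; `rle_encode_pixels` and its `max_run` parameter are assumptions, and pixels are taken to be (R, G, B) tuples of byte values.

```python
def rle_encode_pixels(pixels, max_run=255):
    """Jointly encode (R, G, B) pixels as 4-byte blocks: n, R, G, B."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        run = 1
        # Stop at max_run so the count always fits in one byte.
        while (run < max_run and i + run < len(pixels)
               and pixels[i + run] == pixels[i]):
            run += 1
        out.append(run)         # one byte for the count
        out.extend(pixels[i])   # three bytes for R, G, B
        i += run
    return bytes(out)


row = [(255, 0, 0)] * 300 + [(0, 0, 255)] * 2
coded = rle_encode_pixels(row)
print(len(row) * 3, len(coded))  # 906 12
```

The 300 red pixels need two blocks (255 + 45) because a single count byte cannot exceed 255.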
Run Length Encoding (RLE) The question of efficiency. • Are there any images that CANNOT be RLE compressed? • Sure! Those whose average run length is less than 2. • Data: 1 3 4 5 5 5 5 7 6 = nine bytes • "Compressed": 1 1 1 3 1 4 4 5 1 7 1 6 = twelve bytes • What would such a 'hard to compress' image look like? • Answers: (a) NATURAL images, full of gradients (b) NOISY or RANDOM images
Run Length Encoding (RLE) So, what would you do about such bumpy data? You COULD make up a special message, with code 0, like this: 0 6 32 43 56 77 33 22 Where '0' means "Random block coming" and '6' means "the next six numbers are as-is". So the decoded output would be 32 43 56 77 33 22. This beats straight RLE (8 numbers instead of 12), but is still not super efficient.
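One possible reading of this escape-code scheme, sketched in Python. The choices here are assumptions rather than the lecture's definition: runs of 2 or more become ordinary count, value blocks, isolated values are gathered into "0, n, values..." blocks, and byte-size limits are ignored. Both function names are invented.

```python
def encode_with_literals(values):
    """RLE where a count of 0 marks a block of as-is (literal) numbers."""
    coded, literals = [], []

    def flush():
        # Emit any pending literals as one "0, n, values..." block.
        if literals:
            coded.extend([0, len(literals)] + literals)
            literals.clear()

    i = 0
    while i < len(values):
        run = 1
        while i + run < len(values) and values[i + run] == values[i]:
            run += 1
        if run >= 2:
            flush()
            coded.extend([run, values[i]])
        else:
            literals.append(values[i])
        i += run
    flush()
    return coded


def decode_with_literals(coded):
    values, i = [], 0
    while i < len(coded):
        if coded[i] == 0:                 # "random block coming"
            n = coded[i + 1]
            values.extend(coded[i + 2:i + 2 + n])
            i += 2 + n
        else:                             # ordinary count, value block
            values.extend([coded[i + 1]] * coded[i])
            i += 2
    return values


bumpy = [32, 43, 56, 77, 33, 22]
print(encode_with_literals(bumpy))  # [0, 6, 32, 43, 56, 77, 33, 22]
```

A count of 0 never occurs in an ordinary block, which is why it is free to serve as the marker.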
Huffman Coding • Idea: most frequent messages get short code symbols. • Early example: Morse Code • Typesetter's sequence: e t a o i n s h r d l u <- High frequency letters • e .   t -   a .-   o ---   i ..   n -. • Low frequency letters: Q --.-   Y -.--   Z --..
Huffman Coding So, to Huffman code a sequence of symbols, first you compute their Frequency Histogram.
Sequence: 5 8 5 5 8 8 14 16 8 5 etc...
Symbol:            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
After 1 symbol:    0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0
After 2 symbols:   0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0
After 3 symbols:   0  0  0  0  2  0  0  1  0  0  0  0  0  0  0  0
After 4 symbols:   0  0  0  0  3  0  0  1  0  0  0  0  0  0  0  0
After 5 symbols:   0  0  0  0  3  0  0  2  0  0  0  0  0  0  0  0
After 6 symbols:   0  0  0  0  3  0  0  3  0  0  0  0  0  0  0  0
After 7 symbols:   0  0  0  0  3  0  0  3  0  0  0  0  0  1  0  0
You continue this until you know pretty well how frequent all the symbols are.
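The counting step can be sketched with Python's standard `collections.Counter`. The function name `frequency_histogram` and the assumption that symbols run from 1 to 16 (as in the table on the slide) are illustrative.

```python
from collections import Counter


def frequency_histogram(symbols, alphabet=range(1, 17)):
    """Count how often each symbol of the alphabet occurs in the sequence."""
    counts = Counter(symbols)
    return [counts[s] for s in alphabet]  # Counter gives 0 for unseen symbols


sequence = [5, 8, 5, 5, 8, 8, 14, 16, 8, 5]   # the ten symbols listed on the slide
print(frequency_histogram(sequence))
# [0, 0, 0, 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 1, 0, 1]
```

Passing only the first seven symbols reproduces the last histogram row shown on the slides (3 for symbol 5, 3 for symbol 8, 1 for symbol 14).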