Lempel-Ziv Compression

Lempel-Ziv (LZ) is a compression method that can achieve compression ratios of up to 20 to 1. It relies on the fact that, in any document, character strings are going to be repeated. For example, in legal documents such as contracts, one is likely to find phrases such as “whereas the party of the first part” repeated many times. Would it not be nice if, rather than sending the thirty-five individual characters contained in that phrase, we could simply send a single integer, such as “18”, and have the receiver understand that “18” stands for the phrase?
Lempel-Ziv provides an elegant algorithm for accomplishing this. The sender has the original message and a previously agreed-upon symbol table, usually the set of allowable characters in the alphabet. The receiving party knows nothing about the message content, but it knows the contents and organization of the symbol table.
Let us suppose that, at the sender's end, we wish to send the message:

ABABAAABBCACABABACAC

Assuming that all possible messages consist only of patterns of the characters A, B, and C, the sender would have the following symbol table.

Beginning symbol table:

0  A
1  B
2  C

The receiver, knowing that all messages are composed only of the characters A, B, and C, would have the same symbol table at the beginning:

0  A
1  B
2  C
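The shared starting table can be sketched in a few lines of Python. This is a minimal illustration, assuming the codes are simply assigned in alphabet order starting at 0, as in the tables above; the function name `initial_table` is hypothetical.

```python
def initial_table(alphabet):
    """Assign codes 0, 1, 2, ... to the allowed characters, in the agreed order."""
    return {ch: code for code, ch in enumerate(alphabet)}


# Sender and receiver each build the table independently from the agreed alphabet.
sender_table = initial_table("ABC")
receiver_table = initial_table("ABC")

print(sender_table)                    # {'A': 0, 'B': 1, 'C': 2}
print(sender_table == receiver_table)  # True
```

Because both ends derive the table from the same agreed alphabet, nothing about the table itself has to be transmitted.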
At the sending end, the goal is to build an expanded symbol table containing all of the character patterns encountered so far. One pass through the algorithm processes one new character of the message. On each pass, the sender tracks the following information:

Pass  Buffer content  Current char  What is sent    What is stored in table  New buffer content
1     A               B             0 (code for A)  AB (code = 3)            B

The algorithm begins by loading the first character, “A”, into the buffer; the first pass through the loop then reads the second character, “B”. Because AB is not yet in the table, the sender sends the code for A, stores AB as entry 3, and B becomes the new buffer content. The sender's symbol table would now look as follows:

0  A
1  B
2  C
3  AB
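The sender's loop can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the scheme described here is the dictionary method commonly known as LZW, the name `lzw_encode` is hypothetical, and the final step that sends whatever is left in the buffer is an assumption, since the slides do not show how the message ends. The input string is the character sequence that the sender's pass-by-pass trace works through (ABABABCBABABA); it is read off that trace, which is also an assumption, because it differs from the message quoted on the slide.

```python
def lzw_encode(message, alphabet):
    """Return (codes sent, final symbol table) for the sender's side."""
    table = {ch: code for code, ch in enumerate(alphabet)}
    next_code = len(table)
    codes = []
    if not message:
        return codes, table

    buffer = message[0]                  # load the first character; nothing is sent yet
    for ch in message[1:]:               # one pass per new character
        if buffer + ch in table:
            buffer += ch                 # known pattern: keep extending the buffer
        else:
            codes.append(table[buffer])  # send the code for the buffer
            table[buffer + ch] = next_code  # store the new pattern
            next_code += 1
            buffer = ch                  # the current char starts the new buffer
    codes.append(table[buffer])          # assumption: flush the leftover buffer at the end
    return codes, table


codes, table = lzw_encode("ABABABCBABABA", "ABC")
print(codes)          # [0, 1, 3, 3, 2, 4, 8, 0]
print(table["AB"])    # 3
print(table["BABA"])  # 9
```

Thirteen characters go out as eight codes; the last code, 0, is only the flush of the final “A” left in the buffer.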
At the other end of the transmission, the receiver is trying to reconstruct the symbol table that the sender is building. The receiver gathers the following information:

Pass  Prior (string)  Current (string)  Is current code in table?  C (1st char)  Tempstring / code pair  What is printed (curr or temp?)
1     0 (A)           1 (B)             Yes                        B             AB / 3                  B (current)

Since the receiver has received the codes for both A and B sequentially, it knows the sender has seen the character pattern AB, and it stores this as entry 3 in its table.

Receiver's table after pass one:

0  A
1  B
2  C
3  AB
This process continues for the entire message: ABABAAABBCACABABACAC

Sender (a “-” marks a pass on which nothing is sent or stored):

Pass  Buffer content  Current char  What is sent      What is stored in table  New buffer content
1     A               B             0 (code for A)    AB (code = 3)            B
2     B               A             1 (code for B)    BA (code = 4)            A
3     A               B             -                 -                        AB
4     AB              A             3 (code for AB)   ABA (code = 5)           A
5     A               B             -                 -                        AB
6     AB              C             3 (code for AB)   ABC (code = 6)           C
7     C               B             2 (code for C)    CB (code = 7)            B
8     B               A             -                 -                        BA
9     BA              B             4 (code for BA)   BAB (code = 8)           B
10    B               A             -                 -                        BA
11    BA              B             -                 -                        BAB
12    BAB             A             8 (code for BAB)  BABA (code = 9)          A

Receiver:

Pass  Prior (string)  Current (string)  Is current code in table?  C (1st char)  Tempstring / code pair  What is printed (curr or temp?)
1     0 (A)           1 (B)             Yes                        B             AB / 3                  B (current)
2     1 (B)           3 (AB)            Yes                        A             BA / 4                  AB (current)
3     3 (AB)          3 (AB)            Yes                        A             ABA / 5                 AB (current)
4     3 (AB)          2 (C)             Yes                        C             ABC / 6                 C (current)
5     2 (C)           4 (BA)            Yes                        B             CB / 7                  BA (current)
6     4 (BA)          8                 No                         B             BAB / 8                 BAB (temp)
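The receiver's side, including the “No” case in the last row of the receiver trace, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the name `lzw_decode` is hypothetical, the first code is assumed to be printed directly before pass 1, and a code that is not yet in the table is assumed to be exactly the next unused code, in which case the string must be the prior string plus the prior string's own first character.

```python
def lzw_decode(codes, alphabet):
    """Return (decoded text, final symbol table) for the receiver's side."""
    table = {code: ch for code, ch in enumerate(alphabet)}
    next_code = len(table)
    if not codes:
        return "", table

    prior = table[codes[0]]           # assumption: the first code is printed as-is
    printed = [prior]
    for code in codes[1:]:
        if code in table:
            current = table[code]
            first = current[0]        # C = first char of the current string
            out = current             # print "current"
        elif code == next_code:
            first = prior[0]          # C = first char of the prior string
            out = prior + first       # print "temp"
        else:
            raise ValueError(f"unexpected code {code}")
        table[next_code] = prior + first  # tempstring / code pair
        next_code += 1
        printed.append(out)
        prior = out
    return "".join(printed), table


text, table = lzw_decode([0, 1, 3, 3, 2, 4, 8], "ABC")
print(text)        # ABABABCBABAB
print(table[8])    # BAB
print(9 in table)  # False
```

The seven codes are the ones the sender trace shows being sent. When code 8 arrives, the receiver's table only reaches entry 7, so the receiver builds BAB from the prior string BA and its first character B. The receiver's table is therefore always one entry behind the sender's, which is why entry 9 does not exist yet on the receiving end.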
At this point the sender and receiver symbol tables would contain:

Code  Sender  Receiver
0     A       A
1     B       B
2     C       C
3     AB      AB
4     BA      BA
5     ABA     ABA
6     ABC     ABC
7     CB      CB
8     BAB     BAB
9     BABA    not yet