
INFORMATION THEORY


Presentation Transcript


  1. INFORMATION THEORY Pui-chor Wong

  2. Introduction • Information theory:- Deals with the amount, encoding, transmission and decoding of information. It deals with the measurement of information but not the meaning of information. • The motivation for information theory is provided by Shannon’s coding theorem.   • Shannon’s coding theorem: if a source has an information rate less than the channel capacity, there exists a coding procedure such that the source output can be transmitted over the channel with an arbitrarily small probability of error.

  3. SIGNALLING SYSTEM • Input signals • Encoded input • Channel: transmits the information (a noise source is also an input at this stage) • Encoded output • Output signals

  4. Information Definition • If a message e has probability pe, its information I is given by: I = logb(1/pe), where b is the base of the logarithm (b = 2 gives I in bits).

  5. Example Determine the information associated with an input set consisting of the 26 letters of the alphabet. Assume each letter is sent with the same probability.

  6. Solution: • The probability of each letter is 1/26. • Hence Information I = log2(26) = 4.7 bits
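A quick numerical check of this value, in the same MATLAB style as the entropy plot later in the notes (the variable names are mine):

% Information content of one of 26 equally likely letters
p = 1/26;                        % probability of each letter
I = log2(1/p);                   % information in bits
fprintf('I = %.2f bits\n', I)    % prints I = 4.70 bits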

  7. Entropy • The entropy of an input symbol set, denoted H(X), is defined as the average information of all symbols in the input and is measured in bits/symbol. • This is a useful property in practical communication systems, since long sequences are usually transmitted from an information source. • The average information is obtained by weighting each information value I(xi) by the proportion of time it occurs (i.e. its probability p(xi)) and summing the partial products p(xi)I(xi) over all symbols.

  8. Formula • H(X) = Σi p(xi) I(xi) = Σi p(xi) log2(1/p(xi)) bits/symbol

  9. Comments on Entropy • For an input set of N symbols, it can be shown that the entropy H(X) satisfies the relation: 0 ≤ H(X) ≤ log2(N) • H(X) is the average amount of information necessary to specify which symbol has been generated.
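As a small sketch before the plotting code below, the entropy of any finite symbol set can be computed directly from its probability vector (the probabilities here are illustrative, not from the slides):

% Entropy in bits/symbol from a probability vector
p = [0.5 0.25 0.125 0.125];              % assumed symbol probabilities (sum to 1)
nz = p(p > 0);                            % drop zero-probability symbols (0*log2(0) is taken as 0)
H = -sum(nz .* log2(nz));                 % H(X) = sum of p_i * log2(1/p_i)
fprintf('H(X) = %.2f bits/symbol\n', H)   % 1.75 bits/symbol for this example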

  10. Matlab code

function my_entropy()
% Plot of entropy against probability for a 2-symbol set.
p = eps:0.01:1-eps;                        % avoid p = 0, where 0*log2(1/0) gives NaN
H = p.*log2(1./p) + (1-p).*log2(1./(1-p)); % binary entropy function
clf
figure(1)
plot(p, H)
title('Entropy variation with probability')
xlabel('Probability')
ylabel('Entropy')
grid on

  11. Information rate In order to specify the characteristics of a communication channel, one design criterion is how fast information is generated. This criterion is referred to as the information rate and is measured in bits per second. If a source X emits symbols at a rate of r symbols (messages) per second, then the information rate is defined as R = rH(X) = average number of bits of information per second.

  12. Example An analog signal of bandwidth B is sampled at the Nyquist rate (i.e. 2B samples per second). Assume the resulting samples are quantized into 4 levels Q1, Q2, Q3 and Q4 with probabilities p1 = p4 = 1/8 and p2 = p3 = 3/8. Determine the entropy and the information rate.

  13. Solution H(X) = Σi p(xi) log2(1/p(xi)) = (1/8)log2(8) + (3/8)log2(8/3) + (3/8)log2(8/3) + (1/8)log2(8) = 0.375 + 0.531 + 0.531 + 0.375 ≈ 1.8 bits/symbol

  14. continued.. The information rate R is: R = rH(X) = 2B(1.8) = 3.6B bits/s. As indicated earlier, maximum entropy occurs when each symbol is transmitted with equal probability, i.e. when p = 1/4 for every level:

  15. continued.. Hmax(X) = log2(4) = 2 bits/symbol, so the maximum information rate is Rmax = rHmax(X) = 2B(2) = 4B bits/s.
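A short MATLAB check of these numbers; the bandwidth B is kept symbolic by computing rates per hertz of bandwidth:

% Entropy and information rate for the 4-level quantizer example
p = [1/8 3/8 3/8 1/8];                  % level probabilities
H = -sum(p .* log2(p));                  % approximately 1.8 bits/symbol
r_over_B = 2;                            % Nyquist rate r = 2B samples/s
R_over_B = r_over_B * H;                 % R is about 3.6B bits/s
Hmax = log2(numel(p));                   % 2 bits/symbol for equal probabilities
fprintf('H = %.2f, R = %.2fB, Rmax = %.0fB bits/s\n', H, R_over_B, r_over_B*Hmax)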

  16. Source Coding • Purpose: • To minimize the average bit rate required to represent the source by reducing the redundancy of the information source, or, alternatively, to increase the efficiency of transmission by reducing the bandwidth requirement.

  17. Code length • If the binary code assigned to symbol xi by the encoder has length ni (measured in bits), the average code length L per source symbol is given by: L = Σi p(xi) ni bits/symbol

  18. Code Efficiency • The code efficiency η is defined as: η = Lmin / L, where Lmin is the minimum possible value of L. When η approaches 1, the code is said to be efficient.

  19. Code Redundancy • The code redundancy γ is defined as: γ = 1 − η

  20. Source coding theorem • This states that for a DMS (discrete memoryless source) with entropy H(X), the average code word length L per symbol is bounded as: L ≥ H(X) • With Lmin written as H(X), the code efficiency can be written as: η = H(X) / L
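As a sketch of these definitions (the probabilities and code word lengths below are assumed for illustration):

% Average code length, efficiency and redundancy for an example code
p = [0.5 0.25 0.125 0.125];      % assumed symbol probabilities
n = [1 2 3 3];                    % assumed code word lengths in bits
L = sum(p .* n);                  % average code length
H = -sum(p .* log2(p));           % source entropy, i.e. Lmin
eta = H / L;                      % code efficiency
gam = 1 - eta;                    % code redundancy
fprintf('L = %.2f, H = %.2f, eta = %.2f, gamma = %.2f\n', L, H, eta, gam)

For this particular choice L equals H, so the efficiency is 1 and the redundancy is 0.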

  21. Channel Capacity • Hartley's law: Cc = 2B log2(M) • Cc = capacity in bps • B = channel bandwidth in Hz • M = the number of levels for each signaling element.

  22. ...With Noise • The number of signaling levels M is related to the signal-to-noise ratio as follows: M = √(1 + S/N) • The channel capacity can now be expressed as: Cc = 2B log2(M) = B log2(1 + S/N)
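For instance, the capacity of a telephone-grade channel can be estimated as follows (the bandwidth and SNR values are assumptions chosen for illustration):

% Shannon-Hartley channel capacity for an assumed channel
B = 3100;                        % bandwidth in Hz (assumed)
SNR_dB = 30;                     % signal-to-noise ratio in dB (assumed)
SNR = 10^(SNR_dB/10);            % linear SNR
C = B * log2(1 + SNR);           % capacity in bits/s
fprintf('C = %.0f bits/s\n', C)  % roughly 31 kbit/s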

  23. Channel Coding • Purpose: • To design codes for the reliable transmission of digital information over noisy channels. • The task of source coding is to represent the source information with the minimum of symbols. When a code is transmitted over a channel in the presence of noise, errors will occur. The task of channel coding is to represent the source information in a manner that minimizes the error probability in decoding. • It is apparent that channel coding requires the use of redundancy. If all possible outputs of the channel correspond uniquely to a source input, there is no possibility of detecting errors in the transmission. To detect, and possibly correct errors, the channel code sequence must be longer than the source sequence. The rate R of a channel code is the average ratio of the source sequence length to the channel code length. Thus, R < 1.

  24. Code classification

xi    Code 1   Code 2   Code 3   Code 4   Code 5   Code 6
x1    00       00       0        0        0        1
x2    01       01       1        10       01       01
x3    00       10       00       110      011      001
x4    11       11       11       111      0111     0001

  25. Code classification (2) • Fixed-length codes:- Code word length is fixed. Examples: Code 1 and Code 2. • Variable-length codes:- Code word length is not fixed; all codes except Codes 1 and 2. • Distinct codes:- Each code word is distinguishable from the other code words; all except Code 1 (note that x1 and x3 have the same code word). • Prefix-free codes:- No code word can be formed by adding code symbols to another code word; thus in a prefix-free code, no code word is a prefix of another. Codes 2, 4 and 6 are prefix-free. • Uniquely decodable codes:- The original source sequence can be reconstructed perfectly from the encoded binary sequence. Code 3 is not uniquely decodable, since the sequence 1001 may correspond to x2x3x2 or x2x1x1x2. • Instantaneous codes:- A uniquely decodable code is instantaneous if the end of any code word is recognizable without examining subsequent code symbols.

  26. Huffman encoding Algorithm • 1. Sort the source outputs in decreasing order of their probabilities. • 2. Merge the two least-probable outputs into a single output whose probability is the sum of the corresponding probabilities. • 3. If the number of remaining outputs is 2, go to the next step; otherwise go to step 1. • 4. Arbitrarily assign 0 and 1 as code words for the 2 remaining outputs. • 5. If an output is the result of a merger of two outputs in a preceding step, append a 0 and a 1 to the current code word to obtain the code words for the preceding outputs, and then repeat step 5. If no output is preceded by another output in a preceding step, stop. (A MATLAB sketch of this procedure follows.)
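A compact MATLAB sketch of this procedure; the function name huffman_sketch and the prefix-bit bookkeeping are my own, and ties may be broken differently from the lecture's example:

function codes = huffman_sketch(p)
% Huffman code construction: repeatedly merge the two least-probable groups,
% prefixing a 0/1 onto every code word in each merged group.
n = numel(p);
codes = repmat({''}, 1, n);          % code word (char string) for each source symbol
groups = num2cell(1:n);              % symbols represented by each remaining output
while numel(p) > 1
    [p, order] = sort(p, 'descend');                         % step 1: sort by probability
    groups = groups(order);
    for s = groups{end-1}, codes{s} = ['0' codes{s}]; end    % assign bits of this merge
    for s = groups{end},   codes{s} = ['1' codes{s}]; end
    groups{end-1} = [groups{end-1} groups{end}];             % step 2: merge the two groups
    p(end-1) = p(end-1) + p(end);
    groups(end) = [];
    p(end) = [];
end
end

For example, huffman_sketch([0.5 0.25 0.125 0.125]) returns {'0','10','110','111'}, whose average length (1.75 bits) equals the source entropy.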

  27. Example

  28. Error detection coding • The theoretical limits of coding are set by the results of information theory. These results are frustrating in that they offer little clue as to how the coding should be performed. Error detection coding is designed to permit the detection of errors. Once an error is detected, the receiver may ask for a re-transmission of the erroneous bits, or it may simply inform the recipient that the transmission was corrupted. In a binary channel, error-checking codes are called parity check codes. • Our ability to detect errors depends on the code rate: a low rate gives a high detection probability but a high redundancy. • The receiver assigns to the received word the preassigned code word that minimizes the Hamming distance between the two words. If we wish to detect any pattern of n or fewer errors, the Hamming distance between the preassigned code words must be n + 1 or greater.

  29. Single parity check code A very common code is the single parity check code. This code appends to each K data bits an additional bit whose value is chosen to make the (K + 1)-bit word have even (or odd) parity. Such a choice is said to have even (odd) parity. With even (odd) parity, a single bit error will make the received word odd (even). The preassigned code words always have even (odd) parity, and hence are separated by a Hamming distance of 2 or more.

  30. Some Math.. Suppose the BER is p = 10^-4. Then: • P{single bit error} = p • P{no error in a single bit} = (1-p) • P{no error in 8 bits} = (1-p)^8 • P{unseen error in 8 bits} = 1-(1-p)^8 = 7.9 x 10^-4

  31. continued.. • P{no error in a single bit} = (1-p) • P{no error in 9 bits} = (1-p)^9 • P{single error in 9 bits} = 9 (P{single bit error})(P{no error in the other 8 bits}) = 9p(1-p)^8 • P{unseen error in 9 bits} = 1 - P{no error in 9 bits} - P{single error in 9 bits} = 1-(1-p)^9 - 9p(1-p)^8 = 3.6 x 10^-7 The addition of a parity bit has reduced the unseen error rate by three orders of magnitude.
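A quick numerical check of these two figures, with p = 10^-4 as assumed above:

% Unseen (undetected) error probabilities with and without a single parity bit
p = 1e-4;                                % assumed bit error rate
P8 = 1 - (1-p)^8;                        % any error in 8 unprotected bits
P9 = 1 - (1-p)^9 - 9*p*(1-p)^8;          % two or more errors in 9 bits (parity misses these)
fprintf('no parity: %.1e   with parity: %.1e\n', P8, P9)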

  32. Hamming distance & weight • Hamming distance d(ci,cj) or dij between code words ci and cj:- the number of positions in which ci and cj differ. • Hamming weight w(ci) of a code word ci:- the number of non-zero elements in ci; equivalent to the Hamming distance between ci and 0 (the all-zeros sequence).

  33. Example • Compute the Hamming distance between the 2 code words, 101101 and 001100
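The two words differ in the first and last positions, so the distance is 2; in MATLAB this is a one-line computation:

% Hamming distance = number of positions in which the words differ
a = [1 0 1 1 0 1];
b = [0 0 1 1 0 0];
d = sum(a ~= b)        % returns d = 2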

  34. Detection & Correction • Error Detection:- It can be shown that to detect n bit errors, a coding scheme requires the use of codewords with a Hamming distance of at least n + 1. • Error Correction:- It can also be shown that to correct n bit errors requires a coding scheme with at least a Hamming distance of 2n + 1 between the codewords.

  35. Example • A code consists of 8 codewords: 0001011 1110000 1000110 1111011 0110110 1001101 0111101 0000000 • If 1101011 is received, what is the decoded codeword?

  36. Solution • The decoded codeword is the codeword closest in Hamming distance to 1101011. • Hence the decoded codeword is 1111011
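A sketch of the minimum-distance decoder for this example; it relies on MATLAB's implicit expansion to compare the received word against every row of the code:

% Minimum-Hamming-distance decoding for the 8-word code above
C = [0 0 0 1 0 1 1; 1 1 1 0 0 0 0; 1 0 0 0 1 1 0; 1 1 1 1 0 1 1;
     0 1 1 0 1 1 0; 1 0 0 1 1 0 1; 0 1 1 1 1 0 1; 0 0 0 0 0 0 0];
r = [1 1 0 1 0 1 1];                    % received word
d = sum(C ~= r, 2);                      % distance from r to every code word
[dmin, idx] = min(d);                    % closest code word (distance 1 here)
decoded = C(idx, :)                      % returns 1 1 1 1 0 1 1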

  37. Linear block codes • An (n,k) block code is completely defined by M = 2^k binary sequences, each of fixed length n. • Each of these sequences is referred to as a code word; k is the number of information bits. • The code C thus consists of M code words: • C = {c1, c2, c3, c4, … cM}

  38. continued.. • Practical codes are normally block codes. A block code converts a fixed length of k data bits to a fixed-length n-bit code word, where n > k. The code rate Rc is: Rc = k/n • and the redundancy of the code is (n − k)/k, the number of check bits per data bit.

  39. Operations • Arithmetic operations involve addition and multiplication. These are performed according to the conventions of the arithmetic field. The elements used for codes are from a finite set generally referred to as a Galois field and denoted GF(q), where q is the number of elements in that field. • Binary code words use 2 elements (0 and 1), hence GF(2) is used and arithmetic operations are performed modulo 2.

  40. Generator and Parity check Matrices • The output of a linear binary block encoder (i.e. a code word) for an (n,k) linear block code is a linear combination of a set of k basis vectors, each of length n, denoted by g1, g2, ….. gk. • The vectors g1, g2, ….. gk are not unique.

  41. continued.. • From linear algebra, the basis vectors g1, g2, ….. gk can be arranged as the rows of a k × n matrix G defined as: G = [g1; g2; … ; gk]

  42. Denoting the k bits in Xm as: Xm = {xm1, xm2, xm3, ……. xmk} and the n bits in Cm as: Cm = {cm1, cm2, cm3, ……. cmn}, the code word Cm can be obtained from the generator matrix as: Cm = XmG (with arithmetic performed modulo 2)

  43. Example • Given the generator matrix G = [1 0 1 0 0; 0 1 1 1 1], determine the code words for each of the input messages. • The result is a (5,2) code obtained by mapping the information sequences {00, 01, 10, 11} to C = {00000, 01111, 10100, 11011}.
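A short check of this mapping (G as reconstructed above from the listed code words):

% Code words of the (5,2) code from its generator matrix
G = [1 0 1 0 0;
     0 1 1 1 1];
X = [0 0; 0 1; 1 0; 1 1];        % all k-bit messages
C = mod(X * G, 2)                 % rows: 00000, 01111, 10100, 11011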

  44. Any generator matrix G of an (n,k) code can be reduced by row operations (and column permutations) to the 'systematic form': G = [Ik  P], where Ik is a k-by-k identity matrix and P is a k-by-(n-k) binary matrix. The P matrix determines the n-k redundant or parity check bits of the generated code word. For a given information bit sequence, the code word obtained using a systematic-form G matrix has its first k bits identical to the information bits, while the remaining n-k bits are linear combinations of the k information bits. The resulting (n,k) code is a systematic code.

  45. For a systematic binary code a further simplification can be obtained: since −P^T = P^T in GF(2), the parity check matrix (defined on the next slide) can be written as: H = [P^T  I(n−k)]

  46. Parity check matrix The parity check matrix H is the matrix that satisfies the following orthogonality principle for every code word cm: cm H^T = 0, where H^T is the transpose of H.
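The (5,2) code of the earlier example happens to be in systematic form with P = [1 0 0; 1 1 1], so the orthogonality can be verified numerically:

% Parity check matrix for the systematic (5,2) code of the earlier example
P = [1 0 0;
     1 1 1];                      % G = [I2 P]
G = [eye(2) P];
H = [P' eye(3)];                  % H = [P^T I(n-k)] for a binary code
check = mod(G * H', 2)            % all zeros, so every code word satisfies c*H' = 0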

  47. Cyclic codes • Definition: • A cyclic code is a linear block code with the extra condition that if c is a code word, a cyclic shift of it (e.g. a left circular shift) is also a code word. • If C = {c1, c2, c3, c4, … cn}, then C(1) = {c2, c3, c4, … cn, c1} is a cyclically shifted version of C.

  48. Generation of cyclic codes • For any (n,k) cyclic code: • The code word polynomial c(p) corresponding to an information/message sequence X(p) with coefficients {x1, x2, x3, ……. xk} can be obtained from a generator polynomial g(p) as follows: c(p) = X(p) g(p)

  49. The generator polynomial g(p) is always a factor of p^n + 1 (or p^n − 1) of degree n − k.
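As a sketch of non-systematic cyclic encoding by polynomial multiplication in GF(2); the (7,4) generator g(p) = p^3 + p + 1 (a factor of p^7 + 1) and the message below are illustrative choices, not taken from the slides:

% Non-systematic cyclic encoding: c(p) = X(p) g(p) over GF(2)
g = [1 0 1 1];                    % coefficients of g(p), highest power first
x = [1 0 1 0];                    % message polynomial X(p) = p^3 + p
c = mod(conv(x, g), 2)            % code word coefficients: 1 0 0 1 1 1 0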

  50. Example • A message bit sequence is given by [1010100101]. When the 3-bit parity sequence [111] is appended to the message sequence, determine: • a) The code word • b) The polynomial representation of the code word.
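A sketch for this example, assuming the usual convention that the leftmost bit is the coefficient of the highest power of p (the slides do not state the convention):

% Code word and its polynomial exponents
msg    = [1 0 1 0 1 0 0 1 0 1];
parity = [1 1 1];
c      = [msg parity];                 % part (a): the code word [1010100101111]
powers = find(fliplr(c)) - 1           % part (b): exponents with nonzero coefficients

Under that convention the nonzero exponents are 0, 1, 2, 3, 5, 8, 10 and 12, i.e. c(p) = p^12 + p^10 + p^8 + p^5 + p^3 + p^2 + p + 1.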
