
Information and Coding Theory Introduction



  1. Information and Coding Theory Introduction

  2. Lecture times Rather unfortunately the course is available only for online studies in spring 2019. The course is to be completed remotely, with both course textbooks (for links see the "E-studijas" website) and lecture slides (http://susurs.mii.lu.lv/juris/courses/ict2019.html) available online. Information about updates (i.e. notes about the material you are expected to have familiarised yourself with so far) and notifications about new homework assignments etc. will also be regularly mailed from the "E-studijas" website to all registered participants. However, please check this website regularly for any updates.

  3. Course materials Vera Pless. Introduction to the Theory of Error-Correcting Codes (3rd edition). The main textbook for the course that completely covers all the course material related to error correcting codes. The book also covers all the related background topics (sometimes without providing proofs) about vector spaces and finite fields. The following chapters of this book will be covered by this course: Chapters 1-5, Chapter 7 and the Appendix.

  4. Course materials David J. C. MacKay. Information Theory, Inference and Learning Algorithms (6th edition). The textbook covering course material related to information theory: the notions of information transmission channels and entropy, codes for data compression and their relation to entropy, Shannon's source coding theorem and its relation to entropy and the maximal possible rates of error correction codes. The course, however, will be considerably "less heavy" on probability theory than the book and will require only basic (and/or informal) understanding of the notion of discrete probabilities and the knowledge of Bayes' rule (in particular, we will consider only binary symmetric channels). The following chapters of this book will be partially covered by this course: Chapters 1-2 (introductory notions), Chapters 4-6 (data compression), Chapters 6-10 (Shannon's theorem).

  5. Origins of information theory "The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point" Shannon, C.E. (1948), "A Mathematical Theory of Communication", Bell System Technical Journal, 27, pp. 379–423 & 623–656, July & October, 1948.

  6. Information transmission

  7. Noisy channel

  8. Information theory One of the few fields with an identifiable beginning: "A Mathematical Theory of Communication", Bell System Technical Journal, C. Shannon, 1948; "The Mathematical Theory of Communication", C. Shannon and W. Weaver, 1949. [Photo: Claude Elwood Shannon] IT courses became very popular in universities, until the subject became too broad. The suitability of the term "information theory" is disputable (communication theory?). First applications: space communications, military. End of the road? ~1971 - lack of suitable hardware; ~2001 - in some cases the theoretical limits have already been reached.

  9. Error correcting codes There is no single "discoverer", but the first effective codes are due to R. Hamming (around 1950). Some other popular codes: • Golay codes (Voyager spacecraft, around 1980) • Reed-Solomon codes (CDs, DVDs, DSL, RAID-6, etc.) • BCH (Bose & Chaudhuri & Hocquenghem) codes. [Photo: Richard Hamming] The course will be more oriented towards ECC than IT (so expect more algebra and not that much probability theory :)

  10. Applications of IT and/or ECC

  11. Applications of IT and/or ECC Voyager 1: launched 05.09.1977, now 127 AU from Earth. Voyager 2: launched 20.08.1977, now 103 AU from Earth. Error correction: (24,12,8) Golay code; Viterbi-decoded convolutional code, rate 1/2, constraint length k=7; later, concatenation with (255,223) Reed-Solomon codes over GF(256) was added.

  12. Applications of IT and/or ECC CD (1982): (32,28) + (28,24) RS codes. CD-ROM (1989): the same as above + (26,24) + (45,43) RS codes. DVD (1995): (208,192) + (182,172) RS codes. Blu-ray Disc (2006): (248,216) + (62,30) RS codes, LDC (Long-Distance Codes) + BIS (Burst Indicator Subcodes) + "picket" encoding (indicates the positions of the most likely burst errors).

  13. Applications of IT and/or ECC Error correction can be drive specific. Initially mostly based on Reed-Solomon codes. From 2009, increased use of LDPC (low-density parity-check) codes with performance close to Shannon's limit.

  14. Applications of IT and/or ECC One of the first modems that employed error correction and reached 9600 bps transfer rate. Introduced in 1971. Priced around “only” $11000.

  15. Applications of IT and/or ECC CDMA (Code Division Multiple Access) modulation largely originated from Shannon's work. Uses quite specific ECC in several layers (with emphasis on erased bits). For UMTS – Raptor codes (combined LDPC and LT codes) and Turbo codes.

  16. Applications of IT and/or ECC Similar to RAID 5, but uses two parity blocks. One can be computed by simple XOR, but a more complicated approach is needed to compute the second one. The way it is done is closely related to RS codes.
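
To make the XOR part concrete, here is a minimal Python sketch (not RAID-6 itself; the second, Reed-Solomon-style parity is omitted): the first parity block is the byte-wise XOR of the data blocks, which already lets any single lost block be rebuilt.

```python
def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (the 'P' parity of RAID 5/6)."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

data = [b"block-1!", b"block-2!", b"block-3!"]   # toy data blocks
p = xor_parity(data)

# Pretend block 1 is lost: XOR of the surviving blocks and P recovers it.
recovered = xor_parity([data[0], data[2], p])
assert recovered == data[1]
```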

  17. Information transmission

  18. Noiseless channel

  19. Noiseless channel Are there any non-trivial problems concerning noiseless channels? E.g. how many bits do we need to transfer a particular piece of information? [Diagram: all possible n-bit messages, each with probability 1/2^n, sent over a noiseless channel to a receiver.] Obviously n bits will be sufficient. Also, it is not hard to guess that n bits will be necessary to distinguish between all possible messages.

  20. Noiseless channel [Diagram: all possible n-bit messages sent over a noiseless channel to a receiver; message 000000... has probability ½, message 111111... has probability ½, all other messages have probability 0.] n bits will still be sufficient. However, we can do quite nicely with just 1 bit!

  21. Noiseless channel [Diagram: all possible n-bit messages, the probability of message i being p_i, sent over a noiseless channel to a receiver.] n bits will still be sufficient. If all p_i > 0, we will also need n or more bits for some messages, since we need to distinguish all of them. But what is the smallest average number of bits per message we can do with? The answer is the entropy - derived from the Greek εντροπία "a turning towards" (εν- "in" + τροπή "a turning").

  22. Binary entropy function Entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss. [Adapted from www.wikipedia.org]
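
As a small illustration (a sketch, not course code), here is the binary entropy function in Python; the values confirm that H_b(p) peaks at 1 bit per trial when p = ½:

```python
import math

def binary_entropy(p):
    """H_b(p) = -p*log2(p) - (1-p)*log2(1-p), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.99):
    print(f"H_b({p}) = {binary_entropy(p):.4f} bits")
# H_b(0.5) = 1.0 -- the maximum, reached for a fair coin toss.
```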

  23. Encoding over noiseless channels The problem. Given a set M of messages, message m_i with probability p_i, find a code (a mapping from M to {0,1}*) such that the average number of bits per transmitted message is as small as possible (i.e. a code that minimizes W = Σ_i p_i·c(m_i), where c(m_i) is the number of bits used for encoding m_i). What do we know about this? • it turns out that for any code we will have W ≥ E, where E is the entropy • there are codes that can approach E 'very closely' • some codes we will have a closer look at: Huffman codes, Shannon codes, arithmetic codes
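
As an illustration of the first two points, here is a small Python sketch (not from the course materials) that builds a Huffman code for a toy distribution and compares the average codeword length W with the entropy E; for dyadic probabilities like these, the two coincide exactly.

```python
import heapq, math

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)      # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}    # toy distribution
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())
print(code)                     # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(avg_len, entropy)         # both 1.75 bits: here W = E exactly
```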

  24. Noisy channel In practice channels are always noisy (sometimes this could be ignored). There are several types of noisy channels one can consider. We will restrict attention to binary symmetric channels.

  25. Noisy channel Some other types of noisy channels, e.g. the binary erasure channel.

  26. Noisy channel

  27. Noisy channel - the problem Assume a BSC with probability of transmission error p. In this case we assume that we have already decided on the optimal string of bits for transmission - i.e. each bit could have value 1 or 0 with equal probabilities ½. We want to maximize our chances of receiving the message without errors; to do this we are allowed to modify the message that we have to transmit. Usually we will assume that the message is composed of blocks of m bits each, and we are allowed to replace a given m bit block with an n bit block of our choice (likely we should have n ≥ m :) Such a replacement procedure we will call a block code. We also would like to maximize the ratio m/n (the code rate).

  28. Noisy channel - the problem If p > 0, can we guarantee that a message will be received without errors? No: with probability ≥ p^n any pattern of corrupted bits within a block can occur... If we transmit just an unmodified block of m bits, the probability of error is 1 - (1-p)^m. Can we reduce this? Repetition code: replace each bit with 3 bits of the same value (0 → 000, 1 → 111). We will have n = 3m and probability of error 1 - ((1-p)^3 + 3p(1-p)^2)^m = 1 - (1 - 3p^2 + 2p^3)^m. Note that 1 - p < 1 - 3p^2 + 2p^3 if 0 < p < ½.

  29. Repetition code R3 [Plot: probability of error for transmission of a single bit using no coding vs. using R3.]

  30. Repetition codes Rn R3 - the probability of unrecoverable error is 3p^2 - 2p^3. For R_N (N odd, majority decoding) we have P_err = Σ_{k > N/2} C(N,k) p^k (1-p)^(N-k). Can we design something better than repetition codes?
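
A short Python check of these formulas (a sketch, not course code): majority decoding of R_n fails when more than half of the n copies of a bit are flipped, and for n = 3 this reproduces 3p^2 - 2p^3.

```python
from math import comb

def repetition_error(n, p):
    """Probability that majority decoding of R_n fails for one source bit (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range((n + 1) // 2, n + 1))

p = 0.1
print(repetition_error(3, p))          # 0.028
print(3 * p**2 - 2 * p**3)             # 0.028 -- matches the slide's formula for R3
for n in (1, 3, 5, 7, 9):
    print(n, repetition_error(n, p))   # the error falls, but so does the rate 1/n
```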

  31. Hamming code [7,4] G - generator matrix (in systematic form, so the first 4 digits of a codeword are the message itself). A (4 bit) message x is encoded as xG, i.e. if x = 0110 then c = xG = 0110011. Decoding? - there are 16 codewords; if there are no errors, we can just find the right one... - also we can note that the first 4 digits of c are the same as x :)

  32. Hamming code [7,4] What to do if there are errors? - we assume that the number of errors is as small as possible - i.e. we find the codeword c (and the corresponding x) that is closest to the received vector y (using Hamming distance) • consider vectors a = 0001111, b = 0110011 and c = 1010101 • if y is received, compute y·a, y·b and y·c (inner products); e.g., for y = 1010010 we obtain y·a = 1, y·b = 0 and y·c = 0 • this represents a binary number (100, i.e. 4, in the example above), and we conclude that the error is in the 4th digit, so the corrected codeword is 1011010 and x = 1011. Easy, but why does this method work?

  33. Hamming code [7,4] [Diagram: parity bits of the (7,4) Hamming code.] No errors - all p_i's are consistent with the d_i's. Error in one of d1, ..., d3 – a pair of wrong p_i's. Error in d4 – all p_i's are wrong. Error in a p_i – this will differ from an error in a single d_i. So: • we can correct any single error • since this is unambiguous, we should be able to detect any 2 errors

  34. Hamming code [7,4] H - parity check matrix, with rows a = 0001111, b = 0110011 and c = 1010101. Why does it work? We can check that without errors yH^T = 000, and that with 1 error yH^T gives the index of the damaged bit... General case: there always exists a matrix for checking orthogonality, yH^T = 0. Finding the damaged bits, however, isn't that simple in general.
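
Putting slides 31-34 together, here is a Python/numpy sketch of the [7,4] Hamming code. The systematic generator matrix below is reconstructed to be consistent with the slides' example (0110 → 0110011) and with the check vectors a, b, c; it is not copied from the slides.

```python
import numpy as np

# Generator matrix G = [I_4 | P]: the first 4 bits of each codeword are the message.
G = np.array([[1,0,0,0, 0,1,1],
              [0,1,0,0, 1,0,1],
              [0,0,1,0, 1,1,0],
              [0,0,0,1, 1,1,1]])

# Parity-check rows a, b, c from the slides; the syndrome (y.a, y.b, y.c), read as a
# binary number, is the position of a single-bit error (0 means "no error detected").
H = np.array([[0,0,0,1,1,1,1],
              [0,1,1,0,0,1,1],
              [1,0,1,0,1,0,1]])

def encode(x):
    return (np.array(x) @ G) % 2

def decode(y):
    y = np.array(y)
    s = (H @ y) % 2                      # syndrome = y H^T
    pos = int("".join(map(str, s)), 2)   # binary number formed by the syndrome bits
    if pos:
        y[pos - 1] ^= 1                  # flip the suspected bit
    return y[:4]                         # the message is the first 4 bits

c = encode([0, 1, 1, 0])                 # -> 0110011, as on slide 31
y = c.copy(); y[3] ^= 1                  # corrupt the 4th bit, as on slide 32
print(decode(y))                         # recovers [0 1 1 0]
```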

  35. Block codes • the aim: for given k and n, correct as many errors as possible • if the minimal distance between codewords is d, we will be able to correct up to t = ⌊(d-1)/2⌋ errors • in principle we can choose any set of codewords, but it is easier to work with linear codes • decoding could still be a problem • an even more restricted and more convenient class are cyclic codes
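
A quick brute-force check of the distance claim for the [7,4] Hamming code sketched above (for a linear code the minimum distance equals the minimum weight of a nonzero codeword):

```python
from itertools import product
import numpy as np

G = np.array([[1,0,0,0, 0,1,1],
              [0,1,0,0, 1,0,1],
              [0,0,1,0, 1,1,0],
              [0,0,0,1, 1,1,1]])

# Enumerate all 16 codewords and take the minimum nonzero weight.
codewords = [(np.array(m) @ G) % 2 for m in product([0, 1], repeat=4)]
d = min(int(c.sum()) for c in codewords if c.any())
t = (d - 1) // 2
print(d, t)   # 3, 1 -- the code corrects any single error
```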

  36. Some more complex approaches • we have formulated the lossless communication problem in terms of correcting a maximal number of bits in each block of an [n,k] code, and will study methods for constructing and analyzing such codes • errors quite often occur in bursts... • it is possible to "spread out" individual blocks (interleaving) • it turns out that methods that just try to minimize transmission errors (without guarantees regarding the number of corrected bits) work better • there are recently developed methods that allow such codes to be used efficiently in practice, and they are close to "optimal": low-density parity-check (LDPC) codes and turbo codes

  37. Limits of noisy channels Given an [n,k] code, we define the rate of the code as R = k/n. The aim is to get R as large as possible for a given error correction capacity. Assume a BSC with error rate p. Apparently there should be some limits on how large a value of R can be achieved. A bit more about entropy: conditional entropy, mutual information, the binary entropy function.

  38. Limits of noisy channels A bit more about entropy. Relations between entropies, conditional entropies, joint entropy and mutual information. [Adapted from D.MacKay]
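
For reference, the relations the diagram depicts are the standard ones:
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)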

  39. Limits of noisy channels Given an [n,k] code, we define the rate of the code as R = k/n. The aim is to get R as large as possible for a given error correction capacity. Assume a BSC with error rate p. Apparently there should be some limits on how large a value of R can be achieved.

  40. Channel capacity C = max over input distributions of I(X;Y). For the BSC there is just a "fixed distribution" defined by p, and C = 1 - H_b(p).
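
A one-function Python sketch of this fact (C = 1 - H_b(p) bits per channel use for a BSC with flip probability p):

```python
import math

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with flip probability p: C = 1 - H_b(p)."""
    if p in (0.0, 1.0):
        return 1.0
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.0, 0.01, 0.1, 0.5):
    print(f"p = {p}: C = {bsc_capacity(p):.4f} bits per channel use")
# p = 0.5 gives C = 0: the output is independent of the input, nothing gets through.
```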

  41. Shannon Channel Coding Theorem Shannon's original proof just shows that such codes exist. With LDPC and turbo codes it is actually possible to approach Shannon's limit as closely as we wish.

  42. Shannon Channel Coding Theorem p_B – the probability that the whole block (length k) will be transmitted with errors. p_b – the probability that any specific bit from the encoded-transmitted-decoded block will be received in error. Region 1 – arbitrarily small errors are achievable with rates R < C. Region 2 – errors down to p_b are achievable with rates R < R(p_b). Region 3 – errors close to p_b or smaller are not achievable.
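
A small Python sketch of the region boundary, using the standard expression R(p_b) = C / (1 - H_2(p_b)) for the BSC (from MacKay's treatment; the expression is not stated on the slides):

```python
import math

def h2(p):
    """Binary entropy H_2(p)."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def boundary_rate(p, pb):
    """Region 2 / region 3 boundary: R(p_b) = C / (1 - H_2(p_b)) for a BSC with flip prob p."""
    return (1 - h2(p)) / (1 - h2(pb))

p = 0.1                              # BSC flip probability
print(boundary_rate(p, 1e-12))       # ~0.531 = C: (almost) error-free requires R < C
print(boundary_rate(p, 0.01))        # ~0.578: tolerating p_b = 1% allows rates above C
```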

  43. Shannon Channel Coding Theorem

  44. Some recent codes Convolutional codes (1955) Turbo codes (1993) - interleaving (try to combine several reasonably good codes) - feedback (decoding of the next row depends on errors in previous ones) LDPC (Low Density Parity Check) codes (1963, "rediscovered" in 1996 :)

  45. Main topics covered by the course Transmission over noiseless channels (data compression) • Notion of entropy, its relation to data compression • Optimal compression codes (Huffman code) • Heuristic compression methods (Lempel-Ziv code) Transmission over noisy channels (transmission error correction) • Notion of entropy, its relation to channel capacity and the theoretical possibilities for error correction (Shannon information theory) • Practical methods for transmission error correction (block error correction codes) • As little "reminding" of finite fields and linear algebra as will be needed to discuss this topic :) • Definition and basic properties of block error correcting codes • Hamming codes correcting single errors • Multiple error correction – BCH and Reed-Solomon codes • Some applications of error correction (e.g. error correction on CDs)

  46. Requirements 5 homeworks – 80% of grade. 5 homeworks will be given during the course. In total all the homeworks will be worth 80% of the grade. There are no strict deadlines; however, at least half of the homeworks (i.e. 2 homeworks out of 5) must be submitted before the exam session starts. The preferred way to submit your solutions is by email. Alternatively you can hand them in on paper either at the office of the Master study program (room 411, IMCS UL, Rainis boulevard 29) or to my colleague Karlis Cerans in my office (room 421, IMCS UL, Rainis boulevard 29).

  47. Requirements Exam – 20% of grade. The exam will be given in written form and will consist of practical exercises and, probably, some theoretical questions from the subject areas covered by the course. The exam will be of take-home and open-book type - i.e. you are allowed to use whatever (visual) materials you have whilst preparing your answers (however, any communication with other persons is strictly prohibited), and you will have approximately 3 days for preparing your answers. The dates when you take your exam can be agreed individually. Apart from the 2-homework requirement before the start of the exam session, submission of the other homeworks and/or taking the exam is optional. However, you need to earn at least 35% for a successful completion of the course.

  48. Academic honesty You are expected to submit only your own work! Sanctions: receiving a zero on the assignment (under no circumstances will a resubmission be allowed); no admission to the exam and no grade for the course.

  49. Textbooks The main textbook for the course that completely covers all the course material related to error correcting codes. The book also covers all the related background topics (sometimes without providing proofs) about vector spaces and finite fields. The following chapters of this book will be covered by this course: Chapters 1-5, Chapter 7 and the Appendix. Vera Pless, Introduction to the Theory of Error-Correcting Codes, Wiley-Interscience, 1998 (3rd ed). Course textbook.

  50. Textbooks W. Cary Huffman, Vera Pless, Fundamentals of Error-Correcting Codes, Cambridge University Press, 2003.
