Information Theory

Nathanael Paul, Oct. 09, 2002

Claude Shannon: Father of Information Theory
• “Communication Theory of Secrecy Systems” (1949)
• Cryptography becomes a science
• Why is information theory so important in cryptography?

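Some Terms
• (P,C,K,E,D): a cryptosystem with plaintext space P, ciphertext space C, key space K, encryption rules E, and decryption rules D
• Computational security: measured by the computational effort required to break the cryptosystem
• Provable security: security is proved relative to another problem believed to be difficult
• Unconditional security: the system stays secure even if Oscar (the adversary) can do whatever he wants, as much as he wants (unlimited computing power)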

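Applying probability to cryptography
• Each plaintext x in P has a probability p_P(x), and each key k in K has a probability p_K(k); the key is chosen independently of the plaintext.
• Given x in P and k in K, the ciphertext y = e_k(x) in C is uniquely determined.
• Given k in K and y in C, the plaintext x = d_k(y) in P is uniquely determined.
• Together these induce a probability distribution on the ciphertext space C. For a fixed y:
$$p_C(y) = \sum_{k \in K \,:\, y \in C(k)} p_K(k)\, p_P(d_k(y)), \qquad C(k) = \{ e_k(x) : x \in P \}$$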
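The induced ciphertext distribution can be computed by brute force over the keys. The Python sketch below is a minimal illustration, assuming a hypothetical 4-letter alphabet Z_4, a shift cipher, and made-up plaintext and key distributions; the helper name `ciphertext_distribution` is illustrative only.

```python
from fractions import Fraction


def ciphertext_distribution(p_P, p_K, encrypt, decrypt):
    """Return p_C(y) = sum over keys k with y in C(k) of p_K(k) * p_P(d_k(y))."""
    p_C = {}
    for k, pk in p_K.items():
        # C(k) = {e_k(x) : x in P}: the ciphertexts that key k can produce
        for y in {encrypt(k, x) for x in p_P}:
            p_C[y] = p_C.get(y, Fraction(0)) + pk * p_P[decrypt(k, y)]
    return p_C


N = 4  # hypothetical alphabet Z_4 = {0, 1, 2, 3}


def shift_encrypt(k, x):
    return (x + k) % N


def shift_decrypt(k, y):
    return (y - k) % N


# Made-up distributions; keys 2 and 3 are never chosen.
p_P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
p_K = {0: Fraction(1, 2), 1: Fraction(1, 2)}

p_C = ciphertext_distribution(p_P, p_K, shift_encrypt, shift_decrypt)
for y in sorted(p_C):
    print(y, p_C[y])
```

Here p_C(0), ..., p_C(3) come out to 5/16, 3/8, 3/16 and 1/8, which sum to 1 as a distribution must.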

Some probability theory…
• Probability distribution of X
• Joint probability
• Conditional probability
• Bayes’ Theorem

Probability Distribution of X
• p(x): the probability function of X
• X takes on a finite (or countably infinite) number of possible values x
• Ex.: x is a letter in a substitution cipher, where X ranges over the plaintext space
• P(X = x) = p(x) >= 0 for every x, and $\sum_x p(x) = 1$, where the sum is over all possible values of x

Joint Probability
• Let X1 and X2 denote random variables
• p(x1,x2) = P(X1 = x1, X2 = x2)
• “The probability that X1 will take on the value x1 and X2 will take on the value x2”
• If X1 and X2 are independent, then p(x1,x2) = p(x1) * p(x2) for all x1, x2

Conditional Probability
• “What is the probability of x given y?”
• p(x|y) = p(x,y)/p(y), defined when p(y) > 0
• If p(X = x | Y = y) = p(X = x) for all x and y, then X and Y are independent.

Bayes’ Theorem • p(x,y) = p(x) * p(y | x) = p(y) * p(x | y)

Perfect Secrecy Defined
• A cryptosystem (P,C,K,E,D) has perfect secrecy if “ciphertext yields no information about plaintext”
• Formally: p_P(x | y) = p_P(x) for all x in P and all y in C

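Perfect Secrecy Defined (cont.): Shannon’s Theorem
Suppose a cryptosystem (P,C,K,E,D) has |K| = |C| = |P|. This cryptosystem has perfect secrecy iff both of the following hold:
- Every key is used with equal probability 1/|K| (the key is chosen truly at random)
- For each x in P and each y in C, there is a unique key k such that e_k(x) = y.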
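The second condition says that every plaintext/ciphertext pair is linked by exactly one key (the encryption table is a Latin square). A brute-force check, with hypothetical helper names; `double_shift` is a made-up cipher used only as a counterexample.

```python
def has_unique_key_property(P, C, K, encrypt):
    """True iff for every x in P and y in C exactly one key k gives e_k(x) = y."""
    return all(
        sum(1 for k in K if encrypt(k, x) == y) == 1
        for x in P
        for y in C
    )


Z26 = range(26)


def shift(k, x):
    # Shift cipher: e_k(x) = (x + k) mod 26
    return (x + k) % 26


def double_shift(k, x):
    # Hypothetical cipher e_k(x) = (x + 2k) mod 26: keys k and k + 13 act identically
    return (x + 2 * k) % 26


print(has_unique_key_property(Z26, Z26, Z26, shift))         # True
print(has_unique_key_property(Z26, Z26, Z26, double_shift))  # False
```

This checks only the second condition; the keys must still be chosen uniformly. For `double_shift`, pairs whose difference y - x is odd have no key at all, and the others have two.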

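Perfect Secrecy (P.S.) implies |P| <= |K| and |C| <= |K|
• Claim: perfect secrecy implies |P| <= |K| and |C| <= |K| (assume every x in P and every y in C occurs with positive probability)
• Fix y. Perfect secrecy gives p_P(x | y) = p_P(x) > 0 for every x, so for each x there is some key k in K with e_k(x) = y.
• Because each e_k is one-to-one, a single key cannot send two different plaintexts to y, so these keys are distinct: |K| >= |P|.
• Symmetrically, fix x: for every y, p_C(y | x) = p_C(y) > 0, so some key sends x to y; different y need different keys, so |K| >= |C|.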
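The counting argument can be seen concretely. The sketch below assumes a hypothetical shift cipher over Z26 whose key space is cut down to 13 shifts (so |K| < |P|), with uniform plaintexts and keys, and lists the plaintexts that the ciphertext y = 0 rules out.

```python
from fractions import Fraction

N = 26
P = range(N)
K = range(13)  # hypothetical reduced key space: |K| = 13 < |P| = 26
p_P = {x: Fraction(1, N) for x in P}
p_K = {k: Fraction(1, len(K)) for k in K}


def p_joint(x, y):
    """p(x, y) = p_P(x) * (total probability of keys k with e_k(x) = (x + k) mod N = y)."""
    return p_P[x] * sum(p_K[k] for k in K if (x + k) % N == y)


def p_x_given_y(x, y):
    """p_P(x | y) = p(x, y) / p_C(y)."""
    p_y = sum(p_joint(xx, y) for xx in P)
    return p_joint(x, y) / p_y


y = 0
ruled_out = [x for x in P if p_x_given_y(x, y) == 0]
print(len(ruled_out), ruled_out)
```

Seeing y = 0 rules out the 13 plaintexts 1 to 13, each of which had prior probability 1/26, so p_P(x | y) differs from p_P(x) and perfect secrecy fails, as the claim predicts.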

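Conclusion about Perfect Secrecy
“Key size should be at least as large as message size, and key size should be at least as large as ciphertext size.”
(That is, |K| >= |P| and |K| >= |C|: there must be at least as many keys as possible messages and possible ciphertexts.)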

Perfect Secrecy Example
• P = C = K = Z26 = {0,1,2,...,24,25}
• e_k(x) = (x + k) mod 26, d_k(y) = (y - k) mod 26
• p(k) = 1/26 for every key; p(x) can be any given distribution
• Note: the key must be truly random, and a fresh key must be used for every plaintext letter
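Perfect secrecy of this example can be checked exhaustively with exact fractions. A minimal sketch; the non-uniform plaintext distribution and the biased key distribution are arbitrary, hypothetical choices, and `is_perfectly_secret` is an illustrative name.

```python
from fractions import Fraction


def is_perfectly_secret(p_P, p_K, n=26):
    """Check p_P(x | y) == p_P(x) for every x, y in the shift cipher over Z_n."""
    joint = {(x, y): Fraction(0) for x in range(n) for y in range(n)}
    for x, px in p_P.items():
        for k, pk in p_K.items():
            joint[(x, (x + k) % n)] += px * pk  # e_k(x) = (x + k) mod n
    for y in range(n):
        p_y = sum(joint[(x, y)] for x in range(n))
        if p_y == 0:
            continue  # y never occurs, so p_P(x | y) is undefined
        if any(joint[(x, y)] / p_y != p_P[x] for x in range(n)):
            return False
    return True


total = 26 * 27 // 2  # 351
p_P = {x: Fraction(x + 1, total) for x in range(26)}  # arbitrary non-uniform plaintexts
uniform_keys = {k: Fraction(1, 26) for k in range(26)}
biased_keys = {k: Fraction(1, 2) if k == 0 else Fraction(1, 50) for k in range(26)}

print(is_perfectly_secret(p_P, uniform_keys))  # True
print(is_perfectly_secret(p_P, biased_keys))   # False
```

With uniform keys every ciphertext has probability 1/26 and the posterior equals the prior; favouring key 0 breaks this even though the cipher itself is unchanged.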

Entropy
• We want to measure the “uncertainty” or “information” of a random variable X.
• Entropy: a measure of information
• “How much information or uncertainty is in a cryptosystem?”

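Entropy (cont.)
• Given: X, a random variable taking finitely many values x1, ..., xn with probabilities p1, ..., pn
• The entropy of X (in bits) is:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
(terms with p_i = 0 contribute 0)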

Entropy examples
• X: X1, X2 with P: 1, 0. H(X) = 0, since there is no choice: X1 happens 100% of the time.
• X: X1, X2 with P: ¾, ¼ (X1 is more likely than X2). H(X) = -(¾ log2(¾) + ¼ log2(¼)) ≈ 0.811

Entropy examples (cont.)
• X: X1, X2 with P: ½, ½. H(X) = -(½ log2(½) + ½ log2(½)) = 1
• X: X1, X2, ..., Xn with P: 1/n, 1/n, ..., 1/n. H(X) = -(n * (1/n) log2(1/n)) = log2(n)
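The examples above can be reproduced numerically. A small sketch; `entropy` is an illustrative helper that skips zero probabilities, following the convention 0 * log2(0) = 0.

```python
import math


def entropy(probs):
    """H(X) = -sum p_i log2 p_i in bits; terms with p_i = 0 contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


print(entropy([1, 0]))        # 0 bits: no uncertainty
print(entropy([0.75, 0.25]))  # about 0.811 bits
print(entropy([0.5, 0.5]))    # 1 bit
print(entropy([1 / 8] * 8))   # log2(8) = 3 bits
```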

Entropy examples (cont.)
• If X is a random variable with n possible values:
• H(X) <= log2(n), with equality iff each value has equal probability (i.e., 1/n)
• By Jensen’s Inequality, log2(n) provides an upper bound on H(X)
• If X is the month of the year (each equally likely): H(X) = log2(12) ≈ 3.58, so about 4 bits are needed to encode the month
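Jensen’s Inequality gives the proof; the sketch below is only an empirical spot-check of the bound on randomly generated distributions (the seed and sizes are hypothetical choices), plus the months example.

```python
import math
import random


def entropy(probs):
    """H(X) in bits, skipping zero probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def random_distribution(n, rng):
    """A random probability vector of length n (normalized uniform weights)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(2002)  # fixed seed so the check is repeatable
for _ in range(1000):
    n = rng.randint(2, 30)
    assert entropy(random_distribution(n, rng)) <= math.log2(n) + 1e-12

months = [1 / 12] * 12
print(entropy(months), math.log2(12))  # both about 3.585 bits
print(math.ceil(math.log2(12)))        # 4 bits per month in a fixed-length code
```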

Unicity Distance
• Assume in a given cryptosystem a message is a string x1, x2, ..., xn where each xi is in P (xi is a letter or block)
• Each xi is encrypted individually with the same key k: yi = e_k(xi), 1 <= i <= n
• How many ciphertext blocks yi do we need to determine k?

Unicity Distance (cont.)
• Setting: ciphertext-only attack with infinite computing power
• Unicity distance: the smallest n for which n ciphertext blocks (on average) uniquely determine the key
• For the one-time pad, the unicity distance is infinite

Defining a language
• L: “the natural language”, the set of all messages (strings of length n >= 1)
• P^2: digrams (x1,x2), with x1, x2 in P
• P^n: n-grams (x1,x2,...,xn), with each xi in P; the n-letter messages of L are strings of this form
• Each P^n inherits a probability distribution from L (digrams, trigrams, ...), so H(P^n) makes sense

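Entropy and Redundancy of a language
• What is the entropy of a language? The average information per letter:
$$H_L = \lim_{n \to \infty} \frac{H(P^n)}{n}$$
• What is the redundancy of a language? The fraction of each letter that is redundant relative to a uniformly random letter:
$$R_L = 1 - \frac{H_L}{\log_2 |P|}$$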

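Application of Entropy and Redundancy
• 1 <= H_L <= 1.5 bits per letter in English (empirical estimates)
• H(P) ≈ 4.18 bits (single letters)
• H(P^2)/2 ≈ 3.90 bits per letter (digrams)
• R_L = 1 - H_L/log2(26)
• For this range of H_L, R_L is about 0.68 to 0.79; about 70% is used below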
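Plugging the quoted range of H_L into R_L = 1 - H_L/log2(26) shows where the 70% figure comes from. A minimal sketch; `redundancy` is an illustrative helper name.

```python
import math


def redundancy(h_language, alphabet_size=26):
    """R_L = 1 - H_L / log2|P| for a language over an alphabet of the given size."""
    return 1 - h_language / math.log2(alphabet_size)


for h in (1.0, 1.25, 1.5):
    print(h, round(redundancy(h), 3))
```

This gives about 0.787, 0.734 and 0.681, so “about 70%” sits at the high-entropy end of the estimate.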

Unicity in substitution cipher
• n0 ≈ log2|K| / (R_L * log2|P|)
• |P| = 26, |K| = 26! (all permutations)
• n0 ≈ log2(26!) / (0.70 * log2(26)) ≈ 88.4 / 3.29 ≈ 26.9
• Which means… on average, if one has about 27 letters of ciphertext from a substitution cipher, then one should have enough information to determine the key!
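The unicity-distance estimate can be recomputed directly. A sketch assuming R_L = 0.70 as above; `unicity_distance` is an illustrative name, and Python’s `math.log2` accepts the large integer 26! returned by `math.factorial`.

```python
import math


def unicity_distance(log2_keys, redundancy, alphabet_size):
    """Estimate n0 ~ log2|K| / (R_L * log2|P|)."""
    return log2_keys / (redundancy * math.log2(alphabet_size))


log2_K = math.log2(math.factorial(26))  # |K| = 26! substitution keys
n0 = unicity_distance(log2_K, 0.70, 26)
print(round(log2_K, 1), round(n0, 1))   # 88.4 26.9
```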

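Ending notes...
• Key equivocation: H(K|C)
• “How much uncertainty about the key remains once the ciphertext is known?” (equivalently, how much does the ciphertext reveal about the key?)
• H(K|C) = H(K) + H(P) - H(C)
• Spurious keys: keys that are possible (they decrypt the ciphertext to meaningful plaintext) but incorrect
• So reconsider our question: “Why can’t cryptography and math be separated?”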
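The key-equivocation identity can be checked numerically. A minimal sketch, assuming a small hypothetical cryptosystem with made-up probabilities and a key chosen independently of the plaintext; H(K|C) is computed directly as H(K,C) - H(C) and compared with H(K) + H(P) - H(C).

```python
import math
from collections import defaultdict


def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical toy cryptosystem: P = {a, b}, K = {k1, k2, k3}, C = {1, 2, 3, 4}
p_P = {"a": 0.25, "b": 0.75}
p_K = {"k1": 0.5, "k2": 0.25, "k3": 0.25}
E = {("k1", "a"): 1, ("k1", "b"): 2,
     ("k2", "a"): 2, ("k2", "b"): 3,
     ("k3", "a"): 3, ("k3", "b"): 4}

# Joint distribution of (key, ciphertext), using independence of key and plaintext
p_KC = defaultdict(float)
for (k, x), y in E.items():
    p_KC[(k, y)] += p_K[k] * p_P[x]

# Marginal distribution of the ciphertext
p_C = defaultdict(float)
for (k, y), p in p_KC.items():
    p_C[y] += p

h_k_given_c = H(p_KC.values()) - H(p_C.values())                # direct: H(K,C) - H(C)
identity = H(p_K.values()) + H(p_P.values()) - H(p_C.values())  # H(K) + H(P) - H(C)
print(h_k_given_c, identity)
```

For this toy system both sides come out to about 0.46 bits.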
