
Discrete Math CS 2800



  1. Discrete Math CS 2800 Prof. Bart Selman selman@cs.cornell.edu Module Probability --- Part c) 1) Linearity of Expectation 2) St. Petersburg (Gambling) Paradox 3) Randomized Algorithms 4) Variance

  2. Linearity of Expectation

  3. Linearity of Expectation • Thm. For random variables X_1, X_2, …, X_n on a sample space S: E(X_1 + X_2 + … + X_n) = E(X_1) + E(X_2) + … + E(X_n). Very useful result. Holds even when the X_i's are dependent! Proof: (case n = 2)

  4. Proof: (case n = 2) E(X + Y) = Σ_{s in S} (X(s) + Y(s)) P(s) = Σ_{s in S} X(s) P(s) + Σ_{s in S} Y(s) P(s) = E(X) + E(Y). (defn. of expectation; summing over elementary events in the sample space) QED Aside, alternative defn. of expectation: E(X) = Σ_{s in S} X(s) P(s), summing over the elementary events s of the sample space rather than over the values X can take.

  5. Example application • Consider n flips of a biased coin (prob. p of heads); what is the expected number of heads? • Let X_i be a 0/1 r.v. such that X_i is 1 iff the ith coin flip comes up heads. (X_i is called an “indicator variable”.) • So, we want to know E(X), where X = X_1 + X_2 + … + X_n. Linearity of Expectation! What is E(X_i)? E(X_i) = 0 · P(X_i = 0) + 1 · P(X_i = 1) = p.  So, E(X) = E(X_1) + … + E(X_n) = np. QED Holds even if the coins are not flipped independently! Consider: all coins “glued” together, so they come up either all “heads” or all “tails”. Still the correct expectation! Why? (can be quite counter-intuitive; see the sketch below)
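A quick way to see this numerically is a small simulation (my own sketch, not from the slides; the function names are made up). It estimates the expected number of heads for both independent flips and the "glued coins" case; both come out near n·p.

```python
import random

def expected_heads_independent(n, p, trials=100_000):
    """Estimate E[# heads] when the n biased coins are flipped independently."""
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n) if random.random() < p)
    return total / trials

def expected_heads_glued(n, p, trials=100_000):
    """Estimate E[# heads] when all coins are 'glued': either n heads or 0 heads."""
    total = 0
    for _ in range(trials):
        total += n if random.random() < p else 0
    return total / trials

if __name__ == "__main__":
    n, p = 10, 0.3
    print(expected_heads_independent(n, p))  # ~ 3.0 = n*p
    print(expected_heads_glued(n, p))        # ~ 3.0 as well (linearity of expectation)
```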

  6. Example • Consider n children of different heights placed in a line at random. • Starting from the beginning of the line, select the first child. Continue walking until you find a taller child or reach the end of the line. • When you encounter a taller child, also select him/her, and continue looking for the next taller child, until you reach the end of the line. • Question: What is the expected value of the number of children selected from the line?? Hmm. Looks tricky… What would you guess? Lineup of 100 kids... [e.g. 15, more or less?] Lineup of 1,000 kids… [e.g. 25, more or less?]

  7. Let X be the r.v. denoting the number of children selected from the line. Write X = X_1 + X_2 + … + X_n, where X_i = 1 if the ith child is the tallest among the first i children (i.e., will be selected from the line), and 0 otherwise. By linearity of expectation, we have: E(X) = E(X_1) + E(X_2) + … + E(X_n).

  8. What is E(X_i)? Note that X_i = 1 exactly when the tallest person among the first i persons is at the end of that prefix. (By symmetry, the tallest of the first i is equally likely to be in any of the i positions.) What is P(X_1 = 1)? A: 1. What is P(X_n = 1)? A: 1/n. What is P(X_i = 1)? A: 1/i. Now, E(X_i) = 0 * P(X_i = 0) + 1 * P(X_i = 1) = 0 * (1 – 1/i) + 1 * (1/i) = 1/i. So, E(X) = 1 + 1/2 + 1/3 + 1/4 + … + 1/n. Intriguing! Surprisingly small!! Intuition?? Consider doubling the queue: what's the probability that the tallest kid is in the first half? A: ½ (in which case the 2nd half doesn't add anything!). e.g. N = 100: E(X) ~ 5; N = 200: E(X) ~ 6; N = 1,000: E(X) ~ 7; N = 1,000,000: E(X) ~ 14; N = 1,000,000,000: E(X) ~ 21.
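As a sanity check, here is a short simulation sketch (not from the slides; the function names are mine) comparing the estimated number of selected children with the harmonic number H_n = 1 + 1/2 + … + 1/n.

```python
import random

def expected_selected(n, trials=20_000):
    """Estimate E[# children selected] for a random lineup of n distinct heights."""
    total = 0
    for _ in range(trials):
        heights = [random.random() for _ in range(n)]
        tallest_so_far = float("-inf")
        for h in heights:
            if h > tallest_so_far:   # taller than everyone seen so far -> selected
                tallest_so_far = h
                total += 1
    return total / trials

def harmonic(n):
    return sum(1.0 / i for i in range(1, n + 1))

if __name__ == "__main__":
    for n in (100, 1000):
        print(n, expected_selected(n), harmonic(n))  # simulation ~ H_n ~ ln n
```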

  9. Indicator Random Variable • Recall: Linearity of Expectation. • An indicator random variable Y is a 0/1 binary random variable: • Y = 1 corresponds to an event E occurring, and Y = 0 corresponds to the event not occurring. • Then E(Y) = 1*P(E) + 0*(1 – P(E)) = P(E). • Expected number of events that occur (with Xi the indicator of event Ei): • E(X1 + X2 + …) = E(X1) + E(X2) + … = P(E1) + P(E2) + …

  10. Suppose everyone (n students) puts their cell phone in a pile in the middle of the room, and I return them randomly. What is the expected number of students who receive their own phone back? Guess?? Define for i = 1, …, n a random variable X_i = 1 if student i gets his or her own phone back, and 0 otherwise. Need to calculate E[X_i] = Pr(X_i = 1). We have Pr(X_i = 1) = 1/n and Pr(X_i = 0) = 1 – (1/n). Why? Symmetry! Each student is equally likely to receive any of the n phones.

  11. So, E[X] = E[X1 + X2 + … + Xn] = E[X1] + E[X2] + … + E[Xn] = 1/n + 1/n + … + 1/n = 1. So, we expect just one student to get his or her own cell phone back… Independent of n!!
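A minimal simulation sketch (my own, not from the slides): return the phones as a random permutation and count fixed points; the average stays near 1 regardless of n.

```python
import random

def expected_own_phone(n, trials=100_000):
    """Estimate E[# students who get their own phone back] under a random return order."""
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)                       # phone perm[i] is handed to student i
        total += sum(1 for i, phone in enumerate(perm) if i == phone)
    return total / trials

if __name__ == "__main__":
    print(expected_own_phone(5))     # ~ 1.0
    print(expected_own_phone(500))   # still ~ 1.0, independent of n
```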

  12. Suppose there are N couples at a party, and suppose m (random) people get “sleepy and leave”… What is the expected number of (“complete”) couples left? Define for i = 1, …, N a random variable X_i = 1 if couple i is still complete (both partners stayed), and 0 otherwise. Define the r.v. X = X_1 + X_2 + … + X_N; we want E[X]. By linearity, E[X] = E[X_1 + X_2 + … + X_N] = E[X_1] + E[X_2] + … + E[X_N]. So, what do we know about the X_i?

  13. Suppose there are N couples at a party, and suppose m people get sleepy and leave. What is the expected number of couples left? For i = 1, …, N, with X_i the indicator that couple i is still complete, E[X_i] = Pr(X_i = 1). How? Pr(X_i = 1) = (# of ways of choosing the m leavers from everyone except couple i) / (# of ways of choosing m from all 2N people) = C(2N–2, m) / C(2N, m) = (2N–m)(2N–m–1) / (2N(2N–1)). So E[X] = E[X_1] + E[X_2] + … + E[X_N] = N x E[X_1] = (2N–m)(2N–m–1) / (2(2N–1)).
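The closed form can be checked by simulation; the sketch below (my own code, with hypothetical function names) samples the m leavers at random and counts surviving couples.

```python
import random

def expected_couples_left(N, m, trials=100_000):
    """Estimate E[# complete couples left] when m of the 2N guests leave at random."""
    total = 0
    people = list(range(2 * N))   # persons 2i and 2i+1 form couple i
    for _ in range(trials):
        leavers = set(random.sample(people, m))
        total += sum(1 for i in range(N)
                     if 2 * i not in leavers and 2 * i + 1 not in leavers)
    return total / trials

def expected_couples_formula(N, m):
    """Closed form from the slide: (2N-m)(2N-m-1) / (2(2N-1))."""
    return (2 * N - m) * (2 * N - m - 1) / (2 * (2 * N - 1))

if __name__ == "__main__":
    N, m = 10, 7
    print(expected_couples_left(N, m))      # simulation, ~ 4.1
    print(expected_couples_formula(N, m))   # closed form, ~ 4.105
```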

  14. Probability Paradox II

  15. The St. Petersburg (Gambling) Paradox • Consider the following game: • A coin is flipped until a head appears. • The payoff is based on the number of tails observed (n) before the first head. • The payoff is $2^n. Note: this grows pretty fast! • How much are you willing to pay to play this game? • To answer this: you want to know the expected payoff. • [General idea: if the expected payoff is e.g. $100, accept to play the game by paying e.g. $95. You expect to win $5 per game on average over time!]

  16. Aside: Fair Bets / Games • A fair bet is a lottery in which the expected payoff is equal to the cost of playing. • Most people will not take a fair bet unless the dollar amount involved is reasonable (you could lose!).

  17. St. Petersburg Paradox Expected payoff: E[G] = Σ_{n≥0} 2^n · (1/2)^{n+1} = 1/2 + 1/2 + 1/2 + … = infinity... Pretty good! Hmm. Really??

  18. St. Petersburg Paradox (cont’d) • The expected payoff is infinite!! • But, how much would you be willing to pay to play the game? • Most people would only pay a couple of dollars. • (Note: each additional possible outcome adds just $0.50 to the expected payoff.) What would be a reasonable entrance fee in “real life”? Idea: the “utility” of money goes with the SQRT. Bernoulli proposed to replace the expectation E[G] of the profit G = 2^n with (E[√G])^2, where u(x) = √x is called a utility function. This would lead to a fair entrance fee of (Σ_{n≥0} √(2^n) · (1/2)^{n+1})^2 = (1/(2 – √2))^2 ≈ $2.91.
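For a numeric check of Bernoulli's proposal (my own sketch, not from the slides), the fee (E[√G])² can be computed by truncating the series:

```python
import math

def bernoulli_fair_fee(max_n=200):
    """(E[sqrt(G)])^2 where G = 2^n and P(n tails before the first head) = (1/2)^(n+1)."""
    e_sqrt = sum(math.sqrt(2 ** n) * 0.5 ** (n + 1) for n in range(max_n))
    return e_sqrt ** 2

if __name__ == "__main__":
    print(bernoulli_fair_fee())            # ~ 2.91 dollars
    print((1 / (2 - math.sqrt(2))) ** 2)   # closed form, same value
```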

  19. It is not so clear if that is the right way out of the paradox, because for any • proposed utility function u(x) one can modify the casino rule so that the • paradox reappears: pay (2^k)^2 dollars if the utility function is u(x) = √x, or • pay e^(2^k) dollars if the utility function is u(x) = log(x). • Discussions about what is the “right utility function” are a key part of • economics and the social sciences. But the paradox continues. Why do we not go with the expected value?

  20. St. Petersburg Paradox: Another way out. What is the expectation when the maximum payout is capped at $1M? I.e., after a run of 20 or more tails, you get “just” $1M. Expected payoff: Σ_{n=0}^{19} 2^n (1/2)^{n+1} + 10^6 · (1/2)^{20} ≈ $10–11. Hmm… So, to get to infinity, by far the most contributions come from extremely rare, extremely large payout runs.
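Under the capping rule as read here (payoff min(2^n, $1M); this exact rule is my assumption), the expected payoff can be computed directly:

```python
def capped_expectation(cap=1_000_000, max_n=200):
    """Expected payoff when the payout 2^n is capped at `cap` dollars."""
    return sum(min(2 ** n, cap) * 0.5 ** (n + 1) for n in range(max_n))

if __name__ == "__main__":
    print(capped_expectation())   # roughly $11, a far cry from infinity
```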

  21. St. Petersburg Paradox: Perspective. Expectation relies on being able to do many repeated runs. But in only a few runs, with a large entry fee, you have a substantial probability of “losing it all” unless you get long runs right away: the problem of “Gambler’s ruin.” Most folks don’t want to take that chance… The paradox leads to interesting simulations.

  22. [Figure: running average payout per game vs. number of games played (k <= 4,000), assuming a $10 entrance fee per game. The running average shows sudden upward jumps (similar to e.g. a stock-market chart) whenever a rare, very large payout occurs; an arrow marks where the player “should have stopped.” Late in the sequence the average becomes hard to recover: pulling it back up to $10 per game requires a huge payout, i.e., roughly 11 tails in a row. After 4,000 games you would have lost around $10K. Still, eventually the player would win, but that may take a very long time!]
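A simulation along these lines is easy to write; the sketch below (my own code, not from the slides) reproduces the jumpy running-average behaviour described in the figure.

```python
import random

def running_average_payout(num_games=4000, seed=0):
    """Simulate the running average payout per game over num_games plays."""
    rng = random.Random(seed)
    total, averages = 0, []
    for k in range(1, num_games + 1):
        tails = 0
        while rng.random() < 0.5:   # flip until the first head
            tails += 1
        total += 2 ** tails         # payoff 2^n for n tails before the head
        averages.append(total / k)
    return averages

if __name__ == "__main__":
    avg = running_average_payout()
    print(avg[-1])   # average payout per game after 4,000 plays; often below a $10 fee
```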

  23. The paradox is related to “Power Law Distributions”. Consider a positive random variable X >= 1. Let the probability of X > x be given by: P(X > x) = x^(–k), for x >= 1, where k is a constant > 1. The corresponding prob. density function is f(x) = k · x^(–(k+1)).

  24. Unexpected expectations… • The conditional probability that X exceeds a value x, given • that X > x_0, is given by: P(X > x | X > x_0) = (x/x_0)^(–k), for x >= x_0. • The expectation of this conditional variable is given by: E[X | X > x_0] = k · x_0 / (k – 1). • OK. Looks rather “unremarkable”, but beware…

  25. We commissioned a project with an expected delivery time of 3 days. • Unfortunately, the work is power-law distributed. • We have k = 1.5, because E[X] = k / (k–1) = 3. • So, when the project starts, we expect it to be finished in 3 days. • It’s been 5 days. How much longer can we expect to wait? • Hmm. We have E[X | X > x_0] = k · x_0 / (k–1), with x_0 = 5 and k = 1.5. • So, the expected remaining wait time is now another 15 – 5 = 10 days!! • If it’s still not done after 60 days, we’ll need to wait another 120 days, in expectation! • Not so atypical! This actually models real-world waiting times for project • completions, without “drop-dead deadlines”. • Underlying intuition: “rare long delays” are actually not so rare! • Or: “the longer you have waited for something, the longer you may yet • have to wait.” • Summary: beware of expectations and “heavy tails”! (A quick numeric check follows below.)
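Here is the promised numeric check (my own sketch) of the conditional-expectation formula E[X | X > x_0] = k·x_0/(k–1) with k = 1.5:

```python
def conditional_expectation(x0, k=1.5):
    """E[X | X > x0] for a power law with P(X > x) = x^(-k), x >= 1."""
    return k * x0 / (k - 1)

if __name__ == "__main__":
    for x0 in (5, 60):
        e = conditional_expectation(x0)
        print(f"still waiting at day {x0}: expect completion at day {e:.0f} "
              f"({e - x0:.0f} more days)")
```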

  26. Randomized Algorithms

  27. Example: MAX 3-SAT • Consider a propositional logical formula on N Boolean variables in conjunctive normal form (CNF), i.e., a conjunction (logical AND) of disjunctions (logical OR). • Example: [specific formula not preserved in this transcript]. The truth assignment shown on the slide (certain variables set to True and the rest to False) satisfies this formula. Each disjunction is also referred to as a “clause”. • If each clause contains exactly k variables, we refer to the formula as a k-CNF formula.

  28. MAX 3-SAT cont. • Problem: MAX 3-SAT • Given a 3-CNF formula F, find a truth assignment that satisfies as many clauses as possible. • The MAX 3-SAT problem is a so-called NP-hard problem; it is generally believed that no efficient (i.e., polynomial-time) algorithm exists for solving such problems. • [The P vs. NP question carries a $1M Clay Millennium Prize.] • Note: the search space contains 2^N truth assignments. • (Pictured: Stephen Cook and Leonid Levin.)

  29. So, finding a maximally satisfying assignment is (most likely) computationally very hard. • However, it’s surprisingly easy to find a reasonably good assignment, satisfying 7/8th (87.5%) of the clauses in expectation. How?? • Thm. Given a 3-CNF formula with k clauses, the expected number of clauses satisfied by a random assignment is 7k/8. • Proof. (by linearity of expectation) • A random assignment is obtained by setting each variable x1, …, xn independently to True or False with probability ½ each. • Let Z denote the r.v. equal to the number of satisfied clauses. Z can be written as a sum of random indicator variables Zi, one for each clause. • I.e., Z = Z1 + Z2 + … + Zk, with Zi = 1 if the ith clause is satisfied, and 0 otherwise.

  30. Now, we have by linearity of expectation: E(Z) = E(Z1) + E(Z2) + … + E(Zk). (Remember: this holds no matter how the random variables Zi are correlated!) What is E(Zi)? A clause is not satisfied only if all three of its literals come out the wrong way, so the probability that a clause is not satisfied is (1/2)^3 = 1/8. So, the probability that a clause is satisfied by the random assignment is 1 – 1/8 = 7/8. So, E(Zi) = 7/8. And, therefore: E(Z) = 7k/8. QED
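The 7k/8 expectation is easy to verify empirically; the sketch below (my own code with made-up helper names, using a random 3-CNF rather than any specific formula from the slides) averages the number of satisfied clauses over many random assignments.

```python
import random

def random_3cnf(num_vars, num_clauses, rng):
    """Random 3-CNF: each clause has 3 distinct variables, each negated with prob 1/2."""
    clauses = []
    for _ in range(num_clauses):
        vars_ = rng.sample(range(num_vars), 3)
        clauses.append([(v, rng.random() < 0.5) for v in vars_])  # (variable, negated?)
    return clauses

def satisfied(clauses, assignment):
    """Count clauses with at least one literal satisfied by the assignment."""
    return sum(1 for clause in clauses
               if any(assignment[v] != neg for v, neg in clause))

if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 50, 200
    clauses = random_3cnf(n, k, rng)
    trials = 10_000
    avg = sum(satisfied(clauses, [rng.random() < 0.5 for _ in range(n)])
              for _ in range(trials)) / trials
    print(avg, 7 * k / 8)   # average satisfied clauses ~ 7k/8 = 175
```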

  31. So, we can actually find a pretty good assignment, in expectation, very easily, even though it’s believed intractable to find the maximally satisfying assignment. We can obtain yet another surprise from our analysis. Note that a random variable has to assume a value at least as large as its expectation at some point in the sample space. This observation immediately leads us to the following result. Thm. Given a 3-CNF formula, there must exist a truth assignment that satisfies at least a 7/8th fraction of the clauses. So, from the analysis of a random event (a randomly sampled truth assignment), we have now obtained a statement that does not involve any randomness or probability! (pause…)

  32. The technique we used to prove this result is more generally referred to as the “probabilistic method”, which can be used to show the existence of certain combinatorial objects (in this case, a truth assignment satisfying 7/8th of the clauses) by showing that a random construction produces the desired object with non-zero probability. The probabilistic method is a non-constructive proof technique. The method was pioneered by the famous mathematician Paul Erdos. End note: We showed that a randomly generated assignment satisfies 7/8th of the clauses, in expectation. Hmm… How often do we have to guess to be sure to have an assignment satisfying 7/8th of the clauses? It can be shown that the expected number of guesses grows only polynomially in N, the number of Boolean variables.

  33. Variance

  34. Variance • The variance of a random variable is its expected squared deviation from its mean. • A measure of “dispersion” for random variables. • It is calculated in the natural way: for each value the random variable could take, subtract the expectation, square the difference, and weight it by its probability of occurrence.

  35. Variance • Var(X) = Σ_x (x – E(X))^2 · P(X = x): for each value x the random variable could take, subtract the expectation, square, and weight by the probability of occurrence.

  36. Variance • Example: die throw, E[X] = 3.5

    x    x - E(X)   (x - E(X))^2   P(X = x)   Product
    1    -2.5       6.25           1/6        1.042
    2    -1.5       2.25           1/6        0.375
    3    -0.5       0.25           1/6        0.042
    4     0.5       0.25           1/6        0.042
    5     1.5       2.25           1/6        0.375
    6     2.5       6.25           1/6        1.042
    Total (variance): 2.918
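The same computation in code (my own sketch, using exact fractions) gives 35/12 ≈ 2.917; the table's 2.918 comes from rounding the per-row products.

```python
from fractions import Fraction

def die_variance():
    """Exact variance of one fair die via Var(X) = sum (x - E[X])^2 * P(X = x)."""
    values = range(1, 7)
    p = Fraction(1, 6)
    mean = sum(x * p for x in values)                 # 7/2
    return sum((x - mean) ** 2 * p for x in values)   # 35/12

if __name__ == "__main__":
    v = die_variance()
    print(v, float(v))   # 35/12 ~ 2.9167
```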

  37. Variance • What is the difference between the following distributions? [Figure: two distributions, both with E[X] = 1, one more spread out than the other.] • We need a measure of spread: use the average squared difference from the mean.

  38. Variance • Definition: Let X be a r.v. on a sample space S. The variance of X, denoted by Var(X), is: Var(X) = Σ_{s in S} (X(s) – E(X))^2 · P(s) = E((X – E(X))^2). • Theorem: If X is a r.v. on a sample space S, then Var(X) = E(X^2) – E(X)^2. (By algebra shown in your book, page 436.)

  39. Variance • What is the difference between the two distributions above? Both have E[X] = 1. • Wider distribution (values 0, 1, 2 with probs .4, .2, .4): E[X^2] = (0)(.4) + 1(.2) + 4(.4) = 1.8, so Var(X) = 1.8 – 1 = 0.8. • Narrower distribution (values 0, 1, 2 with probs .1, .8, .1): E[X^2] = (0)(.1) + 1(.8) + 4(.1) = 1.2, so Var(X) = 1.2 – 1 = 0.2.
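The shortcut Var(X) = E[X²] – E[X]² is convenient to code up; a minimal sketch (my own, with an assumed {value: probability} representation) for the two distributions above:

```python
def variance(dist):
    """Var(X) = E[X^2] - E[X]^2 for a distribution given as {value: probability}."""
    ex = sum(x * p for x, p in dist.items())
    ex2 = sum(x * x * p for x, p in dist.items())
    return ex2 - ex ** 2

if __name__ == "__main__":
    wide = {0: 0.4, 1: 0.2, 2: 0.4}     # more spread out, same mean of 1
    narrow = {0: 0.1, 1: 0.8, 2: 0.1}   # less spread out, same mean of 1
    print(variance(wide), variance(narrow))   # ~ 0.8 and ~ 0.2 (floating point)
```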

  40. Independent Random Variables: Definition • Random variables X and Y are independent if the occurrence of one does not affect the other’s probability: • P(X = x, Y = y) = P(X = x)*P(Y = y) for all x and y • (shorthand: P(X, Y) = P(X)*P(Y)); • otherwise X and Y are dependent.

  41. Independent Random Variables • Equivalently, X and Y are independent iff • P(X = x | Y = y) = P(X = x) for all x and y, • i.e., P(X | Y) = P(X) for all values of X and Y; • otherwise they are dependent.

  42. Independent Random Variables: alt. defn. • X and Y are independent iff • P(X <= x, Y <= y) = P(X <= x)*P(Y <= y) for all x and y, • i.e., F(x, y) = F_X(x)*F_Y(y) in terms of the cumulative distribution functions; • otherwise X and Y are dependent.

  43. Independent Random Variables • Theorem: If X and Y are independent random variables on a space S, then E[XY] = E[X]·E[Y].

  44. Independent Random Variables • Theorem: If X and Y are independent random variables on a space S, then V[X + Y] = V[X] + V[Y]. Furthermore, if Xi, i = 1, 2, …, n (n a positive integer), are pairwise independent random variables on S, then V[X1 + X2 + … + Xn] = V[X1] + V[X2] + … + V[Xn].
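A quick empirical check of the sum rule (my own sketch, not from the slides), using two independent dice, each with variance 35/12 ≈ 2.92:

```python
import random
import statistics

def simulate_variance_of_sum(trials=200_000, seed=1):
    """Check Var(X + Y) ~ Var(X) + Var(Y) for two independent fair dice."""
    rng = random.Random(seed)
    xs = [rng.randint(1, 6) for _ in range(trials)]
    ys = [rng.randint(1, 6) for _ in range(trials)]
    sums = [x + y for x, y in zip(xs, ys)]
    return (statistics.pvariance(xs),
            statistics.pvariance(ys),
            statistics.pvariance(sums))

if __name__ == "__main__":
    vx, vy, vsum = simulate_variance_of_sum()
    print(vx, vy, vsum)   # each die ~ 2.92, their sum ~ 5.83 ~ vx + vy
```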

  45. Variance • Standard deviation: σ(X) = √Var(X). As in the case of describing data, it is often easier to work in standard deviations (same units as X).

  46. Functions of a random variable • It is possible to calculate expectations and variances of functions of random variables
