
8. Limit theorems



  1. 8. Limit theorems

  2. Many times we do not need to calculate probabilities exactly. An approximate or qualitative estimate often suffices. P(magnitude 7+ earthquake within 10 years) = ? This is often a much easier task.

  3. What do you think? I toss a coin 1000 times. The probability that I get a streak of 14 consecutive heads is: (A) < 10% (B) > 90% (C) ≈ 50%

  4. Consecutive heads Let N be the number of occurrences of 14 consecutive heads in 1000 coin flips. N = I1 + … + I987, where Ii is an indicator r.v. for the event “14 consecutive heads starting at position i”. E[Ii] = P(Ii = 1) = 1/2^14, so E[N] = 987 ⋅ 1/2^14 = 987/16384 ≈ 0.0602.

  5. Markov’s inequality For every non-negative random variable X and every value a > 0: P(X ≥ a) ≤ E[X] / a. Here E[N] ≈ 0.0602, so P(N ≥ 1) ≤ E[N] / 1 ≈ 6%.
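
As a quick check of this bound, the expectation can be computed exactly (a sketch; the variable names are ours, not from the slides):

```python
from fractions import Fraction

# E[N] = 987 / 2^14: 987 starting positions, each a streak of 14 heads
# with probability 1/2^14.
expected_streaks = Fraction(987, 2**14)

# Markov with a = 1: P(N >= 1) <= E[N] / 1
markov_bound = float(expected_streaks)
print(markov_bound)   # ≈ 0.0602
```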

  6. Proof of Markov’s inequality For every non-negative random variable X and every value a > 0: P(X ≥ a) ≤ E[X] / a. E[X] = E[X | X ≥ a] P(X ≥ a) + E[X | X < a] P(X < a). Since E[X | X ≥ a] ≥ a and E[X | X < a] ≥ 0, this gives E[X] ≥ a P(X ≥ a) + 0.

  7. Hats 1000 people throw their hats in the air. What is the probability that at least 100 people get their own hat back? Solution N = I1 + … + I1000, where Ii is the indicator for the event that person i gets their hat. Then E[Ii] = P(Ii = 1) = 1/1000, so E[N] = 1000 ⋅ 1/1000 = 1 and P(N ≥ 100) ≤ E[N] / 100 = 1%.
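
An empirical look at the hats problem (a sketch; the function and variable names are ours). A uniformly random permutation models everyone grabbing a hat at random:

```python
import random

def hats_returned(n):
    """Count how many of n people get their own hat back."""
    hats = list(range(n))
    random.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

trials = 2000
counts = [hats_returned(1000) for _ in range(trials)]
average = sum(counts) / trials                    # close to E[N] = 1
freq = sum(c >= 100 for c in counts) / trials     # Markov says <= 1%
```

In practice the bound is very loose here: N ≥ 100 essentially never happens, even though Markov only guarantees ≤ 1%.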

  8. Patterns A coin is tossed 1000 times. Give an upper bound on the probability that the pattern HH occurs: (a) at least 500 times (b) at most 100 times

  9. Patterns (a) Let N be the number of occurrences of HH. Last time we calculated E[N] = 999/4 = 249.75. P(N ≥ 500) ≤ E[N] / 500 = 249.75/500 ≈ 49.95%, so 500+ HHs occur with probability ≤ 49.95%. (b) P(N ≤ 100) = P(999 – N ≥ 899) ≤ E[999 – N] / 899 = (999 – 249.75) / 899 ≈ 83.34%, where Markov applies because 999 – N is non-negative.
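
The two bounds are plain arithmetic and can be checked numerically (variable names are ours):

```python
# Markov bounds for the pattern HH in 1000 tosses.
e_n = 999 / 4                     # E[N] = 249.75
bound_a = e_n / 500               # P(N >= 500) <= E[N] / 500
bound_b = (999 - e_n) / 899       # P(N <= 100) = P(999 - N >= 899)
print(bound_a, bound_b)
```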

  10. Computer simulation of patterns

from random import randint

# toss n coins and count the number of consecutive head pairs
def consheads(n):
    count = 0
    lastone = randint(0, 1)
    for i in range(n - 1):
        thisone = randint(0, 1)
        if lastone == 1 and thisone == 1:
            count = count + 1
        lastone = thisone
    return count

>>> for i in range(100): print(consheads(1000), end=" ")
264 260 256 263 272 224 256 254 275 231 242 232 247 268 229 270 231 272 241 238 257 239 251 252 255 249 267 223 272 254 219 266 271 265 212 262 239 253 265 254 262 231 271 242 258 255 219 281 238 246 242 263 245 239 270 199 251 229 240 253 282 258 237 276 247 221 242 226 232 244 222 258 255 294 239 267 253 259 236 239 236 243 254 240 232 248 270 252 232 282 248 244 251 223 226 222 288 266 268 236

  11. Chebyshev’s inequality For every random variable X and every t > 0: P(|X – m| ≥ ts) ≤ 1 / t², where m = E[X], s = √Var[X].
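
The inequality is easy to package as a helper (a sketch; the function name is ours, not from the slides). Writing the distance from the mean as ts, the bound 1/t² equals Var[X]/distance²:

```python
def chebyshev_bound(variance, distance):
    """Upper bound on P(|X - E[X]| >= distance), by Chebyshev."""
    return variance / distance**2    # = 1/t^2 with t = distance / s

# Example: any X with variance 1 satisfies P(|X - E[X]| >= 3) <= 1/9.
print(chebyshev_bound(1, 3))
```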

  12. Patterns E[N] = 999/4 = 249.75, so m = 249.75. Var[N] = (5⋅999 – 7)/16 = 311.75, so s ≈ 17.66. (a) P(N ≥ 500) ≤ P(|N – m| ≥ 14.17s) ≤ 1/14.17² ≈ 0.50% (b) P(N ≤ 100) ≤ P(|N – m| ≥ 8.47s) ≤ 1/8.47² ≈ 1.39%

  13. Proof of Chebyshev’s inequality For every random variable X and every t > 0: P(|X – m| ≥ ts) ≤ 1 / t², where m = E[X], s = √Var[X]. P(|X – m| ≥ ts) = P((X – m)² ≥ t²s²) ≤ E[(X – m)²] / t²s² = 1 / t², where the middle step is Markov’s inequality applied to the non-negative random variable (X – m)².

  14. An illustration [Figure: the p.m.f./p.d.f. of X, illustrating Markov’s inequality P(X ≥ a) ≤ m/a at a point a, and Chebyshev’s inequality P(|X – m| ≥ ts) ≤ 1/t² for the region outside m – ts and m + ts.]

  15. Polling [Figure: a population of numbered voters to be sampled by the pollster.]

  16. Polling Xi = 1 if voter i votes blue, 0 if not. X1,…, Xn are independent Bernoulli(m), where m is the fraction of blue voters. X = X1 + … + Xn, and X/n is the pollster’s estimate of m.

  17. Polling How accurate is the pollster’s estimate X/n? X = X1 + … + Xn, m = E[Xi], s = √Var[Xi]. E[X] = E[X1] + … + E[Xn] = mn. Var[X] = Var[X1] + … + Var[Xn] = s²n.

  18. Polling X = X1 + … + Xn, E[X] = mn, Var[X] = s²n. By Chebyshev, P(|X – mn| ≥ ts√n) ≤ 1/t². Writing e = ts/√n for the sampling error and d = 1/t² for the confidence error, this becomes P(|X/n – m| ≥ e) ≤ d.

  19. The weak law of large numbers X1,…, Xn are independent with the same p.m.f. (p.d.f.), m = E[Xi], s = √Var[Xi], X = X1 + … + Xn. For every e, d > 0 and n ≥ s²/(e²d): P(|X/n – m| ≥ e) ≤ d.
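
The sample-size condition n ≥ s²/(e²d) translates directly into code (a sketch; the function name is ours):

```python
import math

def wlln_sample_size(variance, sampling_error, confidence_error):
    """Smallest n guaranteed by the weak law: n >= s^2 / (e^2 d)."""
    return math.ceil(variance / (sampling_error**2 * confidence_error))

# Bernoulli samples have variance m(1 - m) <= 1/4; with e = 5%, d = 10%:
n = wlln_sample_size(0.25, 0.05, 0.10)
print(n)   # 1000 people suffice
```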

  20. Polling For e, d > 0 and n ≥ s²/(e²d): P(|X/n – m| ≥ e) ≤ d. Say we want confidence error d = 10% and sampling error e = 5%. How many people should we poll? For Bernoulli(m) samples, s² = m(1 – m) ≤ 1/4, so s²/(e²d) = 4000s² ≤ 1000. This suggests we should poll about 1000 people.

  21. A polling simulation [Figure: the pollster’s estimate (X1 + … + Xn)/n plotted against the number of people polled n, for X1, …, Xn independent Bernoulli(1/2).]

  22. A polling simulation [Figure: 20 simulations of the pollster’s estimate (X1 + … + Xn)/n against the number of people polled n.]

  23. A more precise estimate X1,…, Xn are independent with the same p.m.f. (p.d.f.). Let’s assume n is large. The weak law of large numbers gives P(|X – mn| ≥ ts√n) ≤ 1/t², so with high probability X1 + … + Xn ≈ mn. This suggests the more precise estimate X1 + … + Xn ≈ mn + Ts√n for some random variable T.

  24. Some experiments [Figure: the distribution of X = X1 + … + Xn for Xi independent Bernoulli(1/2), shown for n = 6 and n = 40.]

  25. Some experiments [Figure: the distribution of X = X1 + … + Xn for Xi independent Poisson(1), shown for n = 3 and n = 20.]

  26. Some experiments [Figure: the distribution of X = X1 + … + Xn for Xi independent Uniform(0, 1), shown for n = 2 and n = 10.]

  27. The normal random variable The p.d.f. of a normal random variable: f(t) = (2π)^(–1/2) e^(–t²/2).
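
The density is simple to evaluate directly (a sketch; the function name is ours):

```python
import math

def normal_pdf(t):
    """Standard normal density f(t) = (2*pi)^(-1/2) * exp(-t^2 / 2)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

print(normal_pdf(0))   # peak value 1/sqrt(2*pi) ≈ 0.3989
```

The density is symmetric about 0, which is why tail probabilities satisfy P(T ≥ t) = P(T ≤ –t) later on.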

  28. The central limit theorem X1,…, Xn are independent with the same p.m.f. (p.d.f.), m = E[Xi], s = √Var[Xi], X = X1 + … + Xn. For every t (positive or negative): lim (n → ∞) P(X ≤ mn + ts√n) = P(T ≤ t), where T is a normal random variable.
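
The limit can be checked empirically. This sketch (our own setup, not from the slides) uses Uniform(0, 1) samples, for which m = 1/2 and s = √(1/12), and estimates P(X ≤ mn + ts√n) at t = 1:

```python
import math
import random

def clt_estimate(n, t, trials):
    """Estimate P(X <= m*n + t*s*sqrt(n)) for sums of n Uniform(0,1)s."""
    m, s = 0.5, math.sqrt(1 / 12)
    threshold = m * n + t * s * math.sqrt(n)
    hits = sum(sum(random.random() for _ in range(n)) <= threshold
               for _ in range(trials))
    return hits / trials

# The CLT predicts this tends to P(T <= 1) ≈ 0.8413 for standard normal T.
estimate = clt_estimate(30, 1.0, 20000)
```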

  29. Polling again Say we want confidence error d = 10% and sampling error e = 5%. How many people should we poll? Probability model: m = fraction that will vote blue; Xi independent Bernoulli(m); X = X1 + … + Xn. E[Xi] = m, s = √Var[Xi] = √(m(1 – m)) ≤ ½.

  30. Polling again Set ts√n = 5%·n, i.e. t = 5%√n/s. Then lim (n → ∞) P(X ≥ mn + ts√n) = P(T ≥ t) and lim (n → ∞) P(X ≤ mn – ts√n) = P(T ≤ –t), so lim (n → ∞) P(X/n is not within 5% of m) = P(T ≤ –t) + P(T ≥ t) = 2 P(T ≤ –t).

  31. The c.d.f. of a normal random variable [Figure: the c.d.f. F(t) of a normal random variable, with the tails P(T ≤ –t) and P(T ≥ t) marked; by symmetry the two tails are equal.]

  32. Polling again confidence error = 2 P(T ≤ –t) = 2 P(T ≤ –5%√n/s) ≤ 2 P(T ≤ –√n/10), since s ≤ ½. We want a confidence error of ≤ 10%, so we need to choose n so that P(T ≤ –√n/10) ≤ 5%.

  33. Polling again P(T ≤ –√n/10) ≤ 5%. From a normal table or calculator (e.g. http://stattrek.com/online-calculator/normal.aspx), –√n/10 ≈ –1.645, so n ≈ 16.45² ≈ 271.
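
The table lookup can be reproduced in code with the standard library’s statistics.NormalDist (available from Python 3.8); variable names here are ours:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.05)    # ≈ -1.645, the point with P(T <= z) = 5%
n = math.ceil((10 * -z) ** 2)     # need sqrt(n)/10 >= -z
print(z, n)
```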

  34. Party Ten guests arrive independently at a party between 8pm and 9pm. Give an estimate of the probability that the average arrival time of a guest is past 8:40pm.

  35. Acute triangles Drop three points at random on a square. What is the probability that they form an acute triangle?

  36. Simulation Idea: Conduct a poll among random triangles!

from random import uniform

# indicate whether the triangle with the given vertices is acute
def is_acute(x1, y1, x2, y2, x3, y3):
    # dot product of the vectors from (x0, y0) to (x1, y1) and (x2, y2);
    # positive exactly when the angle at (x0, y0) is acute
    def dot(x1, y1, x2, y2, x0, y0):
        return (x1 - x0) * (x2 - x0) + (y1 - y0) * (y2 - y0)
    a1 = dot(x2, y2, x3, y3, x1, y1)
    a2 = dot(x3, y3, x1, y1, x2, y2)
    a3 = dot(x1, y1, x2, y2, x3, y3)
    return a1 > 0 and a2 > 0 and a3 > 0

# count the fraction of acute triangles among n random samples
def simulate_triangles(n):
    count = 0
    for i in range(n):
        if is_acute(uniform(0.0, 1.0), uniform(0.0, 1.0),
                    uniform(0.0, 1.0), uniform(0.0, 1.0),
                    uniform(0.0, 1.0), uniform(0.0, 1.0)):
            count = count + 1
    return 1.0 * count / n

  37. Simulation We want sampling error e = .01 and confidence error d = .05. 1. Rigorous estimate: by the weak law of large numbers, we can choose n = s²/(e²d) ≤ 50,000 (using s² ≤ 1/4 for Bernoulli samples).

> simulate_triangles(50000)
0.27326
> simulate_triangles(50000)
0.27392
> simulate_triangles(50000)
0.27612

  38. Simulation We want sampling error e = .01 and confidence error d = .05. 2. Non-rigorous (but better) estimate: the central limit theorem suggests choosing n such that ts√n ≤ en, with t set so that P(Normal < –t) = d. With s ≤ ½ this gives t = 1.465 and n = (t/2e)² ≈ 5366.

> simulate_triangles(5366)
0.28158777487886694
> simulate_triangles(5366)
0.27003354453969436
> simulate_triangles(5366)
0.2849422288483041
