Streaming Algorithms


Presentation Transcript


  1. Streaming Algorithms Piotr Indyk MIT

  2. Data Streams • A data stream is a sequence of data that is too large to be stored in available memory • Examples: • Network traffic • Sensor networks • Approximate query optimization and answering in large databases • Scientific data streams • ..And this talk

  3. Outline • Data stream model, motivations, connections • Streaming problems and techniques • Estimating number of distinct elements • Other quantities: Norms, moments, heavy hitters… • Sparse approximations and compressed sensing

  4. Basic Data Stream Model • Single pass over the data: i_1, i_2, …, i_n • Typically, we assume n is known • Bounded storage (typically n^α or log^c n) • Units of storage: bits, words, coordinates or „elements” (e.g., points, nodes/edges) • Fast processing time per element • Randomness OK (in fact, almost always necessary) • Example stream: 8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 ...

  5. Connections • External memory algorithms • Linear scan works best • Streaming algorithms yield efficient external memory algorithms • Communication complexity/sketching • Low-space algorithms yield low-communication protocols (figure: two communicating parties A and B) • Metric embeddings, sub-linear time algorithms, pseudo-randomness, compressive sensing

  6. Counting Distinct Elements • Stream elements: numbers from {1...m} • Goal: estimate the number of distinct elements DE in the stream • Up to 1±ε • With probability 1-P • Simpler goal: for a given T>0, provide an algorithm which, with probability 1-P: • Answers YES, if DE > (1+ε)T • Answers NO, if DE < (1-ε)T • Run, in parallel, the algorithm with T = 1, (1+ε), (1+ε)^2, ..., n • Total space multiplied by log_{1+ε} n ≈ log(n)/ε
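
A minimal Python sketch of this reduction, under assumptions of my own (function names hypothetical, not from the slides): run the YES/NO decision procedure at geometrically spaced thresholds T and report the largest threshold that still answers YES.

    import math

    def threshold_grid(n, eps):
        # Thresholds T = 1, (1+eps), (1+eps)^2, ..., up to n;
        # there are about log_{1+eps} n ~= log(n)/eps of them.
        grid, t = [], 1.0
        while t <= n:
            grid.append(t)
            t *= 1.0 + eps
        return grid

    def estimate_from_decisions(decide, n, eps):
        # `decide(T)` is the YES/NO procedure run in parallel for each T;
        # return the largest T for which it still says YES.
        estimate = 1.0
        for T in threshold_grid(n, eps):
            if decide(T):
                estimate = T
        return estimate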

  7. Vector Interpretation (figure: vector x over coordinates 1…9) • Initially, x = 0 • Insertion of i is interpreted as x_i = x_i + 1 • Want to estimate DE(x) = the number of non-zero elements of x • Stream: 8 2 1 9 1 9 2 4 4 9 4 2 5 4 2 5 8 5 2 5

  8. Estimating DE(x) (based on [Flajolet-Martin’85]) (figure: vector x over coordinates 1…9, T=4, sampled set S marked) • Choose a random set S of coordinates • For each i, we have Pr[i∈S] = 1/T • Maintain Sum_S(x) = ∑_{i∈S} x_i • Estimation algorithm I: • YES, if Sum_S(x) > 0 • NO, if Sum_S(x) = 0 • Analysis: • Pr = Pr[Sum_S(x) = 0] = (1-1/T)^DE • For T large enough: (1-1/T)^DE ≈ e^{-DE/T} • Using calculus, for ε small enough: • If DE > (1+ε)T, then Pr ≈ e^{-(1+ε)} < 1/e - ε/3 • If DE < (1-ε)T, then Pr ≈ e^{-(1-ε)} > 1/e + ε/3 (figure: Pr as a function of DE)
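
A hedged Python sketch of estimation algorithm I (class and helper names are mine, not from the slides): sample coordinates into S with probability 1/T, maintain Sum_S(x), and answer YES iff it is non-zero.

    import random

    class DistinctElementsDecision:
        def __init__(self, T, seed=0):
            self.T = T
            self.seed = seed
            self.sum_S = 0          # Sum_S(x) = sum of x_i over i in S

        def _in_S(self, i):
            # Pseudo-random membership test with Pr[i in S] = 1/T;
            # the same i always gets the same answer.
            rng = random.Random(hash((self.seed, i)))
            return rng.randrange(self.T) == 0

        def insert(self, i, delta=1):
            # delta=+1 for an insertion of i (delta=-1 also works, see slide 12).
            if self._in_S(i):
                self.sum_S += delta

        def decide(self):
            return self.sum_S > 0   # YES iff Sum_S(x) > 0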

  9. Estimating DE(x) ctd. • We have an algorithm: • If DE > (1+ε)T, then Pr < 1/e - ε/3 • If DE < (1-ε)T, then Pr > 1/e + ε/3 • Need to amplify the probability of correctness: • Select sets S_1 … S_k, k = O(log(1/P)/ε^2) • Let Z = number of Sum_{S_j}(x) that are equal to 0 • By Chernoff bound, with probability > 1-P: • If DE > (1+ε)T, then Z < k/e • If DE < (1-ε)T, then Z > k/e • Total space: • Decision version: O(log(1/P)/ε^2) numbers in range 0…n • Estimation: O(log(n)/ε · log(1/P)/ε^2) numbers in range 0…n (the probability of error is O(P·log(n)/ε)) • Better algorithms known: • Theory: O(1/ε^2 + log n) bits [Kane-Nelson-Woodruff’10] • Practice: need 128 bytes for all works of Shakespeare, ε ≈ 10% [Durand-Flajolet’03]
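
A sketch of the amplification step in Python (again with hypothetical names, reusing the DistinctElementsDecision class sketched above): run k independent copies and compare the number of zero sums against k/e.

    import math

    class AmplifiedDecision:
        def __init__(self, T, eps, P, seed=0):
            # k = O(log(1/P)/eps^2); the constant is illustrative.
            self.k = max(1, math.ceil(math.log(1.0 / P) / eps ** 2))
            self.copies = [DistinctElementsDecision(T, seed=seed + j)
                           for j in range(self.k)]

        def insert(self, i, delta=1):
            for c in self.copies:
                c.insert(i, delta)

        def decide(self):
            # Z = number of Sum_{S_j}(x) equal to 0; answer YES iff Z < k/e.
            Z = sum(1 for c in self.copies if c.sum_S == 0)
            return Z < self.k / math.e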

  10. Chernoff bound • Let X_1 … X_n be a sequence of i.i.d. 0-1 random variables such that Pr[X_i = 1] = p. Let X = ∑_i X_i. Then for any ε < 1 we have Pr[ |X - pn| > ε·pn ] < 2·e^{-ε^2·pn/3}
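
A small Monte Carlo illustration (not a proof) of this bound, written as a hedged Python sketch; the parameter values are arbitrary.

    import math, random

    def chernoff_demo(n=2000, p=0.3, eps=0.1, trials=1000, seed=1):
        rng = random.Random(seed)
        bad = 0
        for _ in range(trials):
            X = sum(1 for _ in range(n) if rng.random() < p)
            if abs(X - p * n) > eps * p * n:
                bad += 1
        empirical = bad / trials                      # observed failure rate
        bound = 2 * math.exp(-eps ** 2 * p * n / 3)   # Chernoff bound
        return empirical, bound                       # empirical should stay below bound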

  11. Comments • Implementing S: • Choose a uniform hash function h: {1..m} -> {1..T} • Define S = {i: h(i) = 1} • We have i ∈ S iff h(i) = 1 • Implementing h • In practice: to compute h(i) do: • Seed = i • Return random()
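
The slide's recipe, as a hedged Python sketch (helper names are mine): seed a pseudo-random generator with i and draw a value in {1..T}; a production implementation would use a fixed hash family instead.

    import random

    def h(i, T, seed=0):
        # Hash h: {1..m} -> {1..T}: seed a PRNG deterministically with i
        # and draw one value, so repeated calls for the same i agree.
        rng = random.Random(hash((seed, i)))
        return rng.randrange(1, T + 1)

    def in_S(i, T, seed=0):
        # S = { i : h(i) = 1 }, hence Pr[i in S] = 1/T.
        return h(i, T, seed) == 1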

  12. More comments (figure: vector x over coordinates 1…9) • The algorithm uses “linear sketches” Sum_{S_j}(x) = ∑_{i∈S_j} x_i • Can implement decrements x_i = x_i - 1 • I.e., the stream can contain deletions of elements (as long as x ≥ 0) • Other names: dynamic model, turnstile model
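
A minimal sketch of a turnstile update loop in Python (interface assumed, not from the slides), showing that a linear sketch handles insertions and deletions through the same code path.

    import random

    class LinearSketch:
        def __init__(self, k, T, seed=0):
            self.k, self.T, self.seed = k, T, seed
            self.sums = [0] * k       # sums[j] = Sum_{S_j}(x)

        def _in_Sj(self, j, i):
            rng = random.Random(hash((self.seed, j, i)))
            return rng.randrange(self.T) == 0

        def update(self, i, delta):
            # delta = +1 for insertion of i, delta = -1 for deletion of i.
            for j in range(self.k):
                if self._in_Sj(j, i):
                    self.sums[j] += delta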

  13. More General Problem • What other functions of a vector x can we maintain in small space? • L_p norms: ||x||_p = ( ∑_i |x_i|^p )^{1/p} • We also have ||x||_∞ = max_i |x_i| • … and we can define ||x||_0 = DE(x), since ||x||_p^p = ∑_i |x_i|^p → DE(x) as p → 0 • Alternatively: frequency moments F_p = p-th power of the L_p norm (exception: F_0 = L_0) • How much space do you need to estimate ||x||_p (for const. ε)? • Theorem: • For p ∈ [0,2]: polylog(n) space suffices • For p > 2: n^{1-2/p}·polylog(n) space suffices and is necessary [Alon-Matias-Szegedy’96, Feigenbaum-Kannan-Strauss-Viswanathan’99, Indyk’00, Coppersmith-Kumar’04, Ganguly’04, Bar-Yossef-Jayram-Kumar-Sivakumar’02’03, Saks-Sun’03, Indyk-Woodruff’05]

  14. Estimating L2 norm: AMS

  15. Alon-Matias-Szegedy’96 • Choose r_1 … r_m to be i.i.d. r.v., with Pr[r_i = 1] = Pr[r_i = -1] = 1/2 • Maintain Z = ∑_i r_i x_i under increments/decrements to x_i • Algorithm I: Y = Z^2 • “Claim”: Y “approximates” ||x||_2^2 with “good” probability
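
A hedged Python sketch of this estimator (class name mine); for clarity the signs are stored explicitly, even though slide 19 notes that 4-wise independent signs from O(log m) bits suffice.

    import random

    class AMSSketch:
        def __init__(self, m, seed=0):
            rng = random.Random(seed)
            # Random signs r_1..r_m in {-1,+1} (index 0 unused; coordinates are 1..m).
            self.r = [0] + [rng.choice((-1, 1)) for _ in range(m)]
            self.Z = 0              # Z = sum_i r_i * x_i

        def update(self, i, delta=1):
            # Increment (delta=+1) or decrement (delta=-1) coordinate x_i.
            self.Z += self.r[i] * delta

        def estimate(self):
            return self.Z ** 2      # Y = Z^2, with E[Y] = ||x||_2^2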

  16. Analysis • We will use Chebyshev inequality • Need expectation, variance • The expectation of Z^2 = (∑_i r_i x_i)^2 is equal to E[Z^2] = E[∑_{i,j} r_i x_i r_j x_j] = ∑_{i,j} x_i x_j E[r_i r_j] • We have • For i ≠ j, E[r_i r_j] = E[r_i] E[r_j] = 0 – term disappears • For i = j, E[r_i r_j] = 1 • Therefore E[Z^2] = ∑_i x_i^2 = ||x||_2^2 (unbiased estimator)

  17. Analysis, ctd. • The second moment of Z^2 = (∑_i r_i x_i)^2 is equal to the expectation of Z^4 = (∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i) • This can be decomposed into a sum of • ∑_i (r_i x_i)^4 → expectation = ∑_i x_i^4 • 6 ∑_{i<j} (r_i r_j x_i x_j)^2 → expectation = 6 ∑_{i<j} x_i^2 x_j^2 • Terms involving a single (unpaired) multiplier r_i x_i (e.g., r_1 x_1 r_2 x_2 r_3 x_3 r_4 x_4) → expectation = 0 • Total: ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 • The variance of Z^2 is equal to E[Z^4] - E^2[Z^2] = ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 - (∑_i x_i^2)^2 = ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 - ∑_i x_i^4 - 2 ∑_{i<j} x_i^2 x_j^2 = 4 ∑_{i<j} x_i^2 x_j^2 ≤ 2 (∑_i x_i^2)^2
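
A quick numerical illustration (a sketch of my own, not part of the slides) of the two facts just derived, E[Z^2] = ||x||_2^2 and Var[Z^2] ≤ 2(||x||_2^2)^2, using independent uniform signs.

    import random

    def check_ams_moments(x, trials=200000, seed=0):
        rng = random.Random(seed)
        sum_y = sum_y2 = 0.0
        for _ in range(trials):
            Z = sum(rng.choice((-1, 1)) * xi for xi in x)
            Y = Z * Z
            sum_y += Y
            sum_y2 += Y * Y
        mean_y = sum_y / trials                      # ~ E[Z^2]
        var_y = sum_y2 / trials - mean_y ** 2        # ~ Var[Z^2]
        norm_sq = sum(xi * xi for xi in x)           # ||x||_2^2
        return mean_y, norm_sq, var_y, 2 * norm_sq ** 2

    # Example: check_ams_moments([3, -1, 4, 1, 5]) gives mean_y close to 52
    # and var_y below the bound 2 * 52^2 = 5408.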

  18. Analysis, ctd. • We have an estimator Y = Z^2 • E[Y] = ∑_i x_i^2 • σ^2 = Var[Y] ≤ 2 (∑_i x_i^2)^2 • Chebyshev inequality: Pr[ |E[Y] - Y| ≥ cσ ] ≤ 1/c^2 • Algorithm II: • Maintain Z_1 … Z_k (and thus Y_1 … Y_k), define Y' = ∑_i Y_i / k • E[Y'] = k ∑_i x_i^2 / k = ∑_i x_i^2 • σ'^2 = Var[Y'] ≤ 2k (∑_i x_i^2)^2 / k^2 = 2 (∑_i x_i^2)^2 / k • Guarantee: Pr[ |Y' - ∑_i x_i^2| ≥ c (2/k)^{1/2} ∑_i x_i^2 ] ≤ 1/c^2 • Setting c to a constant and k = O(1/ε^2) gives a (1±ε)-approximation with const. probability
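
A sketch of Algorithm II in Python (names mine, reusing the AMSSketch class sketched above): average k = O(1/ε^2) independent estimators to get a (1±ε)-approximation of ||x||_2^2 with constant probability.

    import math

    class AMSAverage:
        def __init__(self, m, eps, seed=0):
            self.k = max(1, math.ceil(2.0 / eps ** 2))   # k = O(1/eps^2); constant illustrative
            self.sketches = [AMSSketch(m, seed=seed + j) for j in range(self.k)]

        def update(self, i, delta=1):
            for s in self.sketches:
                s.update(i, delta)

        def estimate(self):
            # Y' = (Y_1 + ... + Y_k) / k
            return sum(s.estimate() for s in self.sketches) / self.k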

  19. Comments • Only needed 4-wise independence of r_1 … r_m • I.e., for any subset S of {1…m}, |S| = 4, and any b ∈ {-1,1}^4, we have Pr[r_S = b] = 1/2^4 • Can generate such vars from O(log m) random bits • What we did: • Maintain a “linear sketch” vector Z = [Z_1 ... Z_k] = Rx • Estimator for ||x||_2^2: (Z_1^2 + ... + Z_k^2)/k = ||Rx||_2^2 / k • “Dimensionality reduction”: x → Rx … but the tail somewhat “heavy” • Reason: only used second moment of the estimator
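
One standard way (a hedged sketch, not stated on the slides) to generate nearly 4-wise independent signs from O(log m) random bits: evaluate a random degree-3 polynomial over a large prime field and map the value to a sign.

    import random

    _P = (1 << 61) - 1   # Mersenne prime, assumed larger than m

    def make_fourwise_signs(seed=0):
        rng = random.Random(seed)
        a = [rng.randrange(_P) for _ in range(4)]   # O(log m) random bits in total

        def r(i):
            # h(i) = a0 + a1*i + a2*i^2 + a3*i^3 mod P is 4-wise independent over Z_P;
            # taking the parity of h(i) as the sign is unbiased up to a negligible O(1/P) error.
            v = (a[0] + a[1] * i + a[2] * i * i + a[3] * i ** 3) % _P
            return 1 if v % 2 == 0 else -1

        return r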
