
Lower Bound Techniques for Data Structures






Presentation Transcript


1. Lower Bound Techniques for Data Structures
Mihai Pătrașcu
• Committee:
• Erik Demaine (advisor)
• Piotr Indyk
• Mikkel Thorup

2. Data Structures
I don’t study stacks, queues and binary search trees!
I do study data structure problems (a.k.a. Abstract Data Types).

predecessor search
• Preprocess T = { n numbers }
• pred(q): max { y ∈ T | y < q }

partial-sums problem
• Maintain an array A[n] under:
• update(i, Δ): A[i] = Δ
• sum(i): return A[0] + … + A[i]
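The two problems can be pinned down with naive reference implementations (a sketch of my own, not from the talk; linear time per operation, just to fix the interfaces):

```python
def pred(T, q):
    """Predecessor search: max { y in T | y < q }, or None if no such y."""
    below = [y for y in T if y < q]
    return max(below) if below else None

class PartialSums:
    """Partial sums: update(i, d) sets A[i] = d; sum(i) returns A[0] + ... + A[i]."""
    def __init__(self, n):
        self.A = [0] * n
    def update(self, i, d):
        self.A[i] = d
    def sum(self, i):
        return sum(self.A[:i + 1])
```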

3. Motivation?
• Support both:
  * list operations – concatenate, split, …
  * array operations – index
• packet forwarding

4. Binary Search Trees = Upper Bound
“Binary search trees solve predecessor search”
=> Complexity of predecessor ≤ O(lg n) / operation
“Augmented binary search trees solve partial sums”
=> Complexity of partial sums ≤ O(lg n) / operation
my work: matching each ≤ with a lower bound
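One concrete way to realize the O(lg n) upper bound for partial sums is a Fenwick tree; the talk credits augmented binary search trees, so the structure below is an equivalent substitute of my choosing. It handles the slide's assignment-style update by remembering current values and converting each assignment into a delta:

```python
class FenwickPartialSums:
    """O(lg n) per operation via a Fenwick (binary indexed) tree."""
    def __init__(self, n):
        self.n = n
        self.A = [0] * n           # current values, to turn A[i] = d into a delta
        self.tree = [0] * (n + 1)  # 1-indexed Fenwick array

    def _add(self, i, delta):
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)          # jump to the next responsible node

    def update(self, i, d):        # A[i] = d, as on the slide
        self._add(i, d - self.A[i])
        self.A[i] = d

    def sum(self, i):              # A[0] + ... + A[i]
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)          # strip the lowest set bit
        return s
```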

5. What kind of “lower bound”? Lower bounds you can trust.™
Model of computation ≈ real computers:
• memory = “array Mem[1..S] of w-bit words”, with w > lg n (pointers = words)
• random access to memory
• any operation on CPU registers (arithmetic, bitwise…); the CPU is a “black box”
Just prove a lower bound on # memory accesses.

6. Why Data Structures?
I want to understand computation.
Other settings:
• streaming L.B.: many, but not very “computational”, mostly storage / info theory
• space-bounded (P vs L) L.B.: a few, Ω(n √lg n); unnatural questions
• algebraic L.B.: some; cool, but not real computing…
• circuit L.B.: depth-3 circuits with mod-6 gates??
The gospel:
• efficient algorithms: circuit L.B. not forthcoming
• hard optimization: NP-completeness L.B., one per STOC/FOCS
• data structures L.B.: some; understand some nontrivial computational phenomena

7. Why Data Structures?
Weak as some of the lower bounds may be, it’s the area that has gotten farthest towards “understanding computation”.

8. History*
*Omitted: bounds for succinct data structures.
[Yao, FOCS’78] [Ajtai’88] -- predecessor (static)
[Fredman, Saks’89] -- partial sums, union find (dynamic)
Observations:
• huge influence
• 2nd paper: result wrong (better upper bound known); no journal version; many claims without proof

9. History*
*Omitted: bounds for succinct data structures.
[Yao, FOCS’78] [Ajtai’88] -- predecessor (static)
[Bing Xiao, Stanford’92] **
[Miltersen STOC’94]
[Miltersen, Nisan, Safra, Wigderson STOC’95]
[Beame, Fich STOC’99]
[Sen ICALP’01]
(1+ε)-nearest neighbor:
• [Chakrabarti, Chazelle, Gum, Lvov STOC’99]
• [Chakrabarti, Regev FOCS’04]
[Fredman, Saks’89] -- partial sums, union find (dynamic)
[Ben-Amram, Galil FOCS’91]
[Miltersen, Subramanian, Vitter, Tamassia’93]
[Husfeldt, Rauhe, Skyum’96]
[Fredman, Henzinger’98] planar connectivity
[Husfeldt, Rauhe ICALP’98] nondeterminism
[Alstrup, Husfeldt, Rauhe FOCS’98] marked ancestor
[Alstrup, Husfeldt, Rauhe SODA’01] dynamic 2D NN
[Alstrup, Ben-Amram, Rauhe STOC’99] union-find
=== richness lower bounds ===
[Borodin, Ostrovsky, Rabani STOC’99] p.m.
[Barkol, Rabani STOC’00] rand. NN
[Jayram, Khot, Kumar, Rabani STOC’03] p.m.
[Liu’04] det. ANN

10. Three Main Ideas
(the history of slide 9, organized by technique)
1. Epochs
2. Asymmetric Communication, Rectangles
3. Round Elimination


12. Review: Epoch Lower Bounds
update: mark/unmark node [tu]
query: # marked ancestors? [tq]
• epoch j: r^j updates
• epochs {0, .., j-1} write O(tu·w·r^(j-1)) bits
• pick r >> tu·w => most updates from epoch j are not known outside epoch j
=> a random query needs to read a cell from epoch j
tq = Ω(lg n / lg r) = Ω(lg n / lg(tu·w))
max {tq, tu} = Ω(lg n / lg lg n)

13. Review: Epoch Lower Bounds
• See also:
• [Fredman JACM’81]
• [Fredman JACM’82]
• [Yao SICOMP’85]
• [Fredman, Saks STOC’89]
• [Ben-Amram, Galil FOCS’91]
• [Hampapuram, Fredman FOCS’93]
• [Chazelle STOC’95]
• [Husfeldt, Rauhe, Skyum SWAT’96]
• [Husfeldt, Rauhe ICALP’98]
• [Alstrup, Husfeldt, Rauhe FOCS’98]
“Big Challenges” [Miltersen’99]:
• prove some ω(lg n / lg lg n) bound. Candidate: Ω(lg n) for the partial sums problem
• prove ω(lg n) in the bit-probe model
Partial sums: maintain an array A[n] under update(i, Δ): A[i] += Δ; sum(i): return A[0] + … + A[i]

14. Our contribution
[P., Demaine SODA’04] Ω(lg n) for partial sums
[P., Demaine STOC’04] Ω(lg n) for dynamic trees, etc.
  * very simple proof
  * not based on epochs
[P., Tarniță ICALP’05] Ω(lg n) via epoch argument!! (Best Student Paper)
=> Ω(lg²n / lg²lg n) in the bit-probe model

15. Ω(lg n) via Epoch Arguments?
Old: information about epoch j outside epoch j
  ≤ # cells written by epochs {0, .., j-1}
  ≤ O(tu·r^(j-1))

16. Ω(lg n) via Epoch Arguments?
New: information about epoch j outside epoch j
  ≤ # cells read by epochs {0, .., j-1} from epoch j
  still ≤ O(tu·r^(j-1)) in the worst case
• Foil the worst case by randomizing the epoch construction!

17. Ω(lg n) via Epoch Arguments?
# cells read by epochs {0, .., j-1} from epoch j ≤ O((tu / #epochs)·r^(j-1)) on average
=> max { tu, tq } = Ω(lg n)
• Foil the worst case by randomizing the epoch construction!

  18. The “Very Simple Ω(lg n) Proof”

19. The hard instance
Maintain an array A[n] under:
  update(i, Δ): A[i] = Δ
  sum(i): return A[0] + … + A[i]
π = random permutation; for t = 1 to n:
  query: sum(π(t))
  Δt = rand()
  update(π(t), Δt)
(Figure: the updates Δ1, …, Δ16 placed along the time axis according to π.)
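The hard instance is easy to generate (a sketch; the function name and the 32-bit range for the random values are my choices, the instance itself is the slide's):

```python
import random

def hard_instance(n, seed=0):
    """Slide 19's hard instance: a random permutation pi; at step t,
    query sum(pi(t)), then update position pi(t) with a fresh random value."""
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)
    ops = []
    for t in range(n):
        ops.append(("sum", pi[t]))
        ops.append(("update", pi[t], rng.randrange(2**32)))
    return ops
```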

20. How can Mac help PC run t = 9,…,12?
Communication ≈ # memory locations
  * written during t = 5,…,8
  * read during t = 9,…,12

21. How much information needs to be transferred?
At least Δ5, Δ5+Δ7, Δ5+Δ7+Δ8
=> at least 3 words (random values are incompressible)

22. The general principle
Lower bound = # down arrows (a cell written in the first k operations and read in the last k operations)
How many down arrows, in expectation?
(2k-1) · Pr[written in first half] · Pr[read in second half] = (2k-1) · ½ · ½ = Ω(k)

23. Recap
• Communication = # memory locations written during the yellow period and read during the pink period
• Communication between periods of k items = Ω(k)

24. Putting it all together
Build a balanced binary tree over the time axis; every load instruction is counted once, at lowest_common_ancestor(write time, read time).
Per level: Ω(n/2), then 2·Ω(n/4), then 4·Ω(n/8), …
=> total Ω(n lg n)
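The per-level accounting can be sanity-checked numerically (illustrative arithmetic only; `total_transfers` is my name, and the constant inside each Ω is taken to be 1):

```python
def total_transfers(n):
    """Sum the slide-24 counting over a balanced binary tree on n operations
    (n a power of two): each node spanning 2k operations forces >= k transfers."""
    total, blocks = 0, 1
    while blocks * 2 <= n:
        k = n // (2 * blocks)   # half-interval size at this level
        total += blocks * k     # `blocks` nodes on this level, k each
        blocks *= 2
    return total                # = (n/2) * lg n
```

Each level contributes exactly n/2, and there are lg n levels, matching the slide's Ω(n lg n) total.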

25. Q.E.D.
• Augmented binary search trees are optimal.
• First “Ω(lg n)” bound for any dynamic data structure.

26. Three Main Ideas
Next up: 2. Asymmetric Communication, Rectangles

  27. Review: Communication Complexity

28. Review: Communication Complexity
View a query as communication between the CPU (holding query(a,b,c)) and the memory (holding the database, space S): each probe sends an address of lg S bits and receives a word of w bits.
Traditional communication complexity: “total # bits communicated ≥ X”
=> tq·(lg S + w) ≥ X => tq = Ω(X/w)
But wait! X ≤ CPU input ≤ O(w)

29. Review: Communication Complexity
Asymmetric communication complexity: “either Alice sends A bits or Bob sends B bits”
=> either tq·lg S ≥ A or tq·w ≥ B
=> tq ≥ min { A/lg S, B/w }

30. Richness Lower Bounds
Prove: “either Alice sends A bits or Bob sends B bits”
Assume Alice sends o(A) and Bob sends o(B) => a big monochromatic rectangle: a 1/2^o(A) fraction of Alice’s inputs × a 1/2^o(B) fraction of Bob’s inputs, all with output 1
Show any big rectangle is bichromatic (standard idea in comm. complexity)
Example: Alice --> q ∈ {0,1}^d; Bob --> a set S of n points in {0,1}^d
Goal: find argmin_{x∈S} ||x-q||₂
[Barkol, Rabani] A = Ω(d), B = Ω(n^(1-ε)) => tq ≥ min { d/lg S, n^(1-ε)/w }
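The example can be stated as plain computation (brute force, with bit-vectors encoded as Python ints, an encoding of mine; the point of the slides is that no data structure beats this by much without large space):

```python
def nearest_neighbor(q, points):
    """Return the point of `points` minimizing Hamming distance to q.
    On {0,1}^d, Hamming distance equals squared l2 distance, so this is
    exactly the argmin of the slide."""
    return min(points, key=lambda x: bin(x ^ q).count("1"))
```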

31. Richness Lower Bounds
What does this really mean? An “optimal space lower bound for constant query time”:
lower bound: S = 2^Ω(d/tq)
upper bound ≈ either exponential space (S = 2^Θ(d), tq = Θ(d/lg n)) or near-linear query time (S = Θ(n), tq = n^(1-o(1)))
Also: an optimal lower bound for decision trees

32. Results
Partial match -- database of n strings in {0,1}^d, query ∈ {0,1,*}^d:
  [Borodin, Ostrovsky, Rabani STOC’99] [Jayram, Khot, Kumar, Rabani STOC’03] A = Ω(d/lg n)
  [P. FOCS’08] A = Ω(d)
Nearest Neighbor on the hypercube (ℓ1, ℓ2):
  deterministic γ-approximate: [Liu’04] A = Ω(d/γ²)
  randomized exact: [Barkol, Rabani STOC’00] A = Ω(d)
  randomized (1+ε)-approx: [Andoni, Indyk, P. FOCS’06] A = Ω(ε⁻²·lg n): “Johnson-Lindenstrauss space is optimal!”
Approximate Nearest Neighbor in ℓ∞:
  [Andoni, Croitoru, P. FOCS’08]: “[Indyk FOCS’98] is optimal!”
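A brute-force reference for the partial-match problem as defined on this slide (a sketch of mine; the lower bounds above say that substantially beating it requires large space):

```python
def partial_match(database, query):
    """database: strings over {0,1}; query: string over {0,1,*} of the same
    length, where '*' matches either bit. Report whether any string matches."""
    return any(
        all(qc in ("*", sc) for qc, sc in zip(query, s))
        for s in database
    )
```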

33. Limits of the Communication Approach
The implication of a richness lower bound is undervalued!
“Alice must send Ω(A) bits” => tq = Ω(A / lg S): no separation between S = O(n) and S = n^O(1)!
Strengthened implication: tq = Ω(A / lg(Sd/n)) => separation of Ω(lg n / lg lg n) between S = O(n) and S = n^O(1)! (also for branching programs)

34. Richness Gets You More
CPU(s) --> memory communication:
• one query: lg S bits
• k queries: lg C(S,k) = Θ(k·lg(S/k)) bits


36. Richness Gets You More
Direct Sum: any richness lower bound “Alice must send A or Bob must send B”
===> k·Alice must send k·A, or k·Bob must send k·B

37. Richness Gets You More
Combining the direct sum with the k-query address bound: tq = Ω(A / lg(S/k))
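The binomial estimate behind the k-query gain, lg C(S,k) = Θ(k·lg(S/k)), can be checked numerically (the concrete values of S and k below are arbitrary choices of mine):

```python
from math import comb, log2

# C(S,k) >= (S/k)^k always, and C(S,k) <= (e*S/k)^k, so
# lg C(S,k) sits between k*lg(S/k) and k*lg(S/k) + k*lg(e).
S, k = 1 << 16, 1 << 8
lhs = log2(comb(S, k))     # exact lg of the binomial coefficient
rhs = k * log2(S / k)      # the Theta(k lg(S/k)) comparison term
```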

38. Three Main Ideas
1. Epochs
2. Asymmetric Communication, Rectangles
3. Round Elimination
4. Range Queries

39. Open Hunting Season
Nice trick, but “Ω(lg n / lg lg n) with O(n·polylg n) space” is not an impressive argument for the curse of dimensionality.
But space n^(1+o(1)) is hugely important in data structures
=> open hunting season for range queries etc.
2D range counting:
SELECT count(*) FROM employees WHERE salary <= 70000 AND startdate <= 1998

40. Open Hunting Season
[P. STOC’07] Ω(lg n / lg lg n) with O(n·polylg n) space for 2D range counting. N.B.: tight!
• 1st bound beyond the semigroup model
• question from [Fredman JACM’82] [Chazelle FOCS’86]

41. The Power of Reductions
2D stabbing:
• Preprocess S = { n rectangles }
• stab(x,y): is (x,y) inside some R ∈ S?
Applications: routing ACLs, dispatching in some OO languages; related to 2D range counting.

42. The Power of Reductions
2D stabbing reduces to 2D range counting: replace each rectangle by +1/-1 weights at its corners, so that the number of rectangles stabbed by (x,y) becomes a weighted dominance count.
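A minimal sketch of this corner trick (function names and the coordinate encoding are mine): each rectangle [x1,x2] × [y1,y2] becomes four signed points, and the stabbing count at (x,y) equals the sum of weights over points (px,py) with px ≤ x and py ≤ y.

```python
def corner_points(rects):
    """rects: list of (x1, x2, y1, y2) with integer, inclusive coordinates."""
    pts = []
    for (x1, x2, y1, y2) in rects:
        pts += [(x1, y1, +1), (x1, y2 + 1, -1),
                (x2 + 1, y1, -1), (x2 + 1, y2 + 1, +1)]
    return pts

def stab_count(pts, x, y):
    """Naive dominance count; a real range-counting structure answers this
    query without scanning all points."""
    return sum(w for (px, py, w) in pts if px <= x and py <= y)
```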

43. The Power of Reductions
2D stabbing:
• Preprocess S = { n rectangles }
• stab(x,y): is (x,y) inside some R ∈ S?
reachability oracles in the butterfly graph:
• Preprocess G = subgraph of the butterfly
• reachable(x,y): is there a path x -> y?


45. The Power of Reductions
Lopsided Set Disjointness (LSD): Alice has a set S, Bob has a set T; “are S and T disjoint?”
Reduction to reachability oracles in the butterfly:
• T = { deleted edges }
• S = { one edge out of every node } => n queries from 1st to last level
• S disjoint from T => all queries answer “yes”

46. Reachability in Butterfly??
marked ancestor problem:
• update(node): (un)mark node
• query(leaf): any marked ancestor?

47. [P. FOCS’08]: lopsided set disjointness (LSD) → reachability oracles in the butterfly, which in turn yield lower bounds for:
partial match; (1+ε)-ANN in ℓ1, ℓ2; NN in ℓ1, ℓ2; 3-ANN in ℓ∞; dyn. marked ancestor; 2D stabbing; worst-case union-find; dyn. trees, graphs; 4D reporting; 2D counting; dyn. 1D stabbing; partial sums; dyn. 2D reporting; dyn. NN in 2D

48. Three Main Ideas
Next up: 3. Round Elimination

49. Packet Forwarding / Predecessor Search
[van Emde Boas FOCS’75] [Waldvogel, Varghese, Turner, Plattner SIGCOMM’97] [Degermark, Brodnik, Carlsson, Pink SIGCOMM’97] [Afek, Bremler-Barr, Har-Peled SIGCOMM’99]
Preprocess n prefixes of ≤ w bits:
  make a hash table H with all prefixes of the prefixes
  |H| = O(n·w), can be reduced to O(n)
Given a w-bit IP, find the longest matching prefix:
  binary search for the longest ℓ such that IP[0:ℓ] ∈ H
=> O(lg w) time
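A sketch of the slide's scheme (my code; per the slide's simplification it returns the longest entry of H matching the IP, and a real router needs extra bookkeeping to map that back to the best route). Because H contains every prefix of every stored prefix, membership of IP[0:ℓ] in H is monotone in ℓ, which is what makes binary search on the length valid:

```python
def build(prefixes):
    """Hash table H of all prefixes-of-prefixes; size O(n*w)."""
    H = set()
    for p in prefixes:                 # each prefix is a bit-string like "1011"
        for l in range(1, len(p) + 1):
            H.add(p[:l])
    return H

def longest_matching_prefix(H, ip):
    """Binary search on the prefix length: O(lg w) hash lookups."""
    lo, hi = 0, len(ip)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ip[:mid] in H:
            lo = mid                   # a match this long exists; try longer
        else:
            hi = mid - 1               # too long; try shorter
    return ip[:lo]
```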

50. Review: Round Elimination
Alice splits the query into halves hi and lo and sends hash(hi); Bob (“I want to talk to Alice”) replies with one bit 0/1†:
  † 0: continue searching for pred(hi)
  † 1: continue searching for pred(lo)
If Alice holds chunks 1, 2, …, k and her first message has o(k) bits, the message has negligible information about the typical chunk i => it can be eliminated for a fixed i.
