This document discusses stream-based geometric algorithms developed by Piotr Indyk at MIT. The algorithms process a stream of points in $R^d$ to compute geometric quantities and structures for problems such as Minimum Spanning Tree, Minimum Weight Matching, Facility Location, and K-median. Variations include the dynamic case, where points can be deleted, and the sliding-window case. Key results include approximations for high- and low-dimensional cases, obtained through probabilistic embeddings. Applications span clustering, server allocation, and measuring the "clusterability" of a data set; a possible improvement is removing the dependence on log factors.
Stream-based Geometric Algorithms Piotr Indyk MIT
Streaming Algorithms for Geometric Problems • Input: a stream $S = p_1 \dots p_n$ of points in $R^d$ • Goal: compute a certain geometric quantity and/or structure • Variations: • Dynamic case: points can be deleted • Sliding window: points disappear after some time t
Minimum Spanning Tree • The tree has representation size $\Omega(n)$ • We only estimate the cost of the MST
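For reference, the quantity being estimated can be computed exactly offline. The following is a minimal sketch using Prim's algorithm, not the streaming method from the talk; the function name `mst_cost` is hypothetical, and it assumes Euclidean distances and that all points fit in memory.

```python
import math

def mst_cost(points):
    """Exact Euclidean MST cost via Prim's algorithm, O(n^2) time.

    Offline baseline only: it stores every point, which a streaming
    algorithm cannot afford.
    """
    n = len(points)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Relax distances from the newly added point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total

print(mst_cost([(0.0, 0.0), (0.0, 1.0), (3.0, 1.0)]))
```

A baseline like this is useful mainly for checking how close a streaming estimate is on small inputs.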
Facility Location • Goal: choose a set F of facilities to minimize the sum of the distances to the nearest facility plus the number of facilities times f
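The objective above can be written down directly. This is a small sketch that only evaluates the cost of a given facility set; it does not choose the set. The name `facility_location_cost` is hypothetical and Euclidean distance is an assumption.

```python
import math

def facility_location_cost(points, facilities, f):
    """Sum of distances to the nearest facility, plus f per open facility."""
    if not facilities:
        raise ValueError("at least one facility is required")
    service = sum(min(math.dist(p, q) for q in facilities) for p in points)
    return service + f * len(facilities)

points = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
# One facility at the origin: service cost 0 + 3 + 4, opening cost 2.
print(facility_location_cost(points, [(0.0, 0.0)], f=2.0))
```

Opening a second facility lowers the service cost but adds another f, which is the trade-off the objective captures.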
K-median • K is given • Goal: choose K medians to minimize the sum of the distances to the nearest median
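To make the objective concrete, here is a brute-force sketch for tiny inputs. It assumes Euclidean distance and, as a simplification not stated on the slide, restricts medians to the input points. The names `k_median_cost` and `brute_force_k_median` are hypothetical, and the search is exponential in K, so it is only an offline illustration.

```python
import math
from itertools import combinations

def k_median_cost(points, medians):
    """Sum over all points of the distance to the nearest median."""
    return sum(min(math.dist(p, m) for m in medians) for p in points)

def brute_force_k_median(points, k):
    """Try every k-subset of the input points as the median set."""
    best = min(combinations(points, k),
               key=lambda ms: k_median_cost(points, ms))
    return best, k_median_cost(points, best)

points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
medians, cost = brute_force_k_median(points, k=2)
print(medians, cost)
```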
Known Results • Computing Lp norms of a stream (Graham’s talk) • Clustering of points in metric spaces • Charikar et al ’97, ’03; Guha et al’00: • K-center and K-median • (K) space, no deletions • Meyerson’02: • Facility location • (|F|) space, no deletions
More of Known Results • Approximate diameter etc • Indyk’03: high dimensions • Feigenbaum et al, Hershberger et al, Cormode et al’03: low dimensions • Convex hulls etc
Our Results *follows Charikar’02; also Varadarajan’02 and Indyk-Thaper’02
Applications • MST, MWM: ? • MWBM: similarity of low-dim data sets • Fac. Loc.: “clusterability” of a data set • K-median: allocation of servers to clients (Muthu’03) • log D might not be so bad in practice (1.1 in Indyk-Thaper’03)
Approach • Impose square grids $G_0 \dots G_k$, with side lengths $2^0, 2^1, \dots, 2^k$, shifted at random. • For each square cell c in $G_i$, let $n_P(c)$ be the number of points from P in c. • The algorithms maintain certain statistics over $n_P(\cdot)$, which allow them to approximately solve the problems
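The grid construction can be sketched as follows. This is an illustration under stated assumptions, not the talk's data structure: the class name `ShiftedGrids` is hypothetical, a single random shift is shared by all levels so that the grids nest, and the counts $n_P(c)$ are stored exactly in dictionaries, whereas the streaming algorithms keep only small-space statistics over them.

```python
import random
from collections import Counter

class ShiftedGrids:
    """Exact per-cell point counts for grids G_0..G_k with side 2^i."""

    def __init__(self, k, dim, seed=None):
        rng = random.Random(seed)
        self.k = k
        # Assumption: one random shift per coordinate, shared by all levels.
        self.shift = [rng.uniform(0, 2 ** k) for _ in range(dim)]
        self.counts = [Counter() for _ in range(k + 1)]

    def cell(self, i, p):
        """Index of the cell of G_i that contains point p."""
        side = 2 ** i
        return tuple(int((x + s) // side) for x, s in zip(p, self.shift))

    def update(self, p, delta=1):
        """delta=+1 inserts p; delta=-1 deletes it (the dynamic case)."""
        for i in range(self.k + 1):
            c = self.cell(i, p)
            self.counts[i][c] += delta
            if self.counts[i][c] == 0:
                del self.counts[i][c]  # keep only nonempty cells

grids = ShiftedGrids(k=3, dim=2, seed=0)
for p in [(0.5, 0.5), (0.5, 1.5), (6.5, 6.5)]:
    grids.update(p, +1)
grids.update((0.5, 0.5), -1)
print([sum(level.values()) for level in grids.counts])
```

Because every update only adds or subtracts 1 from one cell per level, insertions and deletions are handled symmetrically.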
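Estimators • MST: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) > 0]$ • MWM: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$ • MWBM: $\sum_i 2^i \sum_{c \in G_i} |n_G(c) - n_B(c)|$ • Fac. Loc.: $\sum_i 2^i \sum_{c \in G_i} \min[n_P(c), T_i]$ • K-median: $\sum_{c \in B_j} n_P(c)$ for $B_1 \dots B_l$ sampled from the $G_i$’s with density $1/K$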
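The first three sums can be evaluated directly from exact cell counts, as in the sketch below. Several things here are assumptions: the function names are hypothetical, the shift is passed in explicitly (fixed to zero in the example so the output is reproducible), counts are stored exactly instead of being summarized in small space, and the raw sums are returned without any normalization or approximation constants, which the slide does not give.

```python
from collections import Counter

def cell_counts(points, k, shift):
    """counts[i][c] = number of points in cell c of grid G_i (side 2^i)."""
    counts = []
    for i in range(k + 1):
        side = 2 ** i
        counts.append(Counter(
            tuple(int((x + s) // side) for x, s in zip(p, shift))
            for p in points
        ))
    return counts

def mst_estimator(counts):
    # sum_i 2^i * (number of nonempty cells in G_i)
    return sum(2 ** i * sum(1 for n in level.values() if n > 0)
               for i, level in enumerate(counts))

def mwm_estimator(counts):
    # sum_i 2^i * (number of cells in G_i holding an odd number of points)
    return sum(2 ** i * sum(1 for n in level.values() if n % 2 == 1)
               for i, level in enumerate(counts))

def mwbm_estimator(counts_g, counts_b):
    # sum_i 2^i * sum_c |n_G(c) - n_B(c)| over cells occupied by either set
    total = 0
    for i, (g, b) in enumerate(zip(counts_g, counts_b)):
        total += 2 ** i * sum(abs(g[c] - b[c]) for c in set(g) | set(b))
    return total

P = [(0.5, 0.5), (0.5, 1.5), (6.5, 6.5), (7.5, 6.5)]
shift = (0.0, 0.0)  # fixed for reproducibility; the approach uses a random shift
counts = cell_counts(P, k=3, shift=shift)
print(mst_estimator(counts))   # 1*4 + 2*2 + 4*2 + 8*1 = 24
print(mwm_estimator(counts))   # only the four level-0 cells are odd: 4
print(mwbm_estimator(cell_counts(P[:2], 3, shift),
                     cell_counts(P[2:], 3, shift)))  # 4 + 8 + 16 + 0 = 28
```

The facility-location and K-median estimators are left out because the thresholds $T_i$ and the sampling of $B_1 \dots B_l$ are not specified on the slide.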
Proofs • View the grids as a probabilistic embedding of P into a tree (an HST) • Show how to solve the problem in HSTs • Show how to express the solution using just the $n_P(c)$’s • First application of this kind of embedding to streaming
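One way to picture the tree view is sketched below, under assumptions the slides do not spell out: the grids share one shift so that cells nest, each point hangs under its level-0 cell, and the edge from a level-$i$ cell to its level-$(i+1)$ parent has weight $2^i$. The names `cell` and `tree_distance` are hypothetical.

```python
def cell(p, i, shift):
    """Index of the cell of G_i (side 2^i) that contains point p."""
    side = 2 ** i
    return tuple(int((x + s) // side) for x, s in zip(p, shift))

def tree_distance(p, q, k, shift):
    """Distance between p and q in the tree induced by the nested grids.

    With edge weight 2^i between levels i and i+1 (an assumption), two
    points that first share a cell at level j are at tree distance
    2 * (2^0 + ... + 2^(j-1)) = 2 * (2^j - 1).
    Returns None if they share no cell up to level k.
    """
    for j in range(k + 1):
        if cell(p, j, shift) == cell(q, j, shift):
            return 2 * (2 ** j - 1)
    return None

shift = (0.0, 0.0)  # fixed for reproducibility; the embedding uses a random shift
print(tree_distance((0.5, 0.5), (0.5, 1.5), 3, shift))  # meet at level 1: 2
print(tree_distance((0.5, 0.5), (6.5, 6.5), 3, shift))  # meet at level 3: 14
```

Nearby points usually meet at a low level and distant points at a high one, which is why sums over cell counts weighted by $2^i$ can track distance-based costs.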
Conclusions and Open Problems • Replace log D by O(1) • Other applications?