Norm Estimation with Applications: A Survey

David Woodruff, IBM Almaden

Outline • The streaming model • Norm estimation • Problems • Results • Upper bounds • Lower bounds • Open questions

Data Stream Model [FM, AMS] • Model • A large object x, modeled as a vector • Could be a graph, matrix, set of points, etc. • $x = (x_1, x_2, \ldots, x_n)$ starts off as $0^n$ • Stream of m updates $(j_1, v_1), \ldots, (j_m, v_m)$ • Update (j, v) causes change $x_j \leftarrow x_j + v$ • $v \in \{-M, -M+1, \ldots, M\}$ • Order and number of updates arbitrary
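The update rule above can be made concrete with a small sketch. It materializes x explicitly, which takes O(n) space and is exactly what a streaming algorithm tries to avoid; the function name `apply_stream`, the 0-based indexing, and the example stream are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

def apply_stream(n: int, updates: Iterable[Tuple[int, int]], M: int) -> List[int]:
    """Materialize x from a stream of (j, v) updates. Uses O(n) space."""
    x = [0] * n                      # x starts off as the all-zeros vector
    for j, v in updates:             # order and number of updates are arbitrary
        if not -M <= v <= M:
            raise ValueError("update value outside {-M, ..., M}")
        x[j] += v                    # update (j, v) causes x_j <- x_j + v
    return x

# Hypothetical stream over n = 4 coordinates (0-indexed here).
stream = [(0, 3), (2, -1), (0, -3), (3, 5), (2, 4)]
x = apply_stream(4, stream, M=5)
print(x)  # [0, 0, 3, 5]
```

Negative updates can cancel earlier ones, as coordinate 0 shows.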

Application – IP Session Data • AT&T collects 100+ GB of NetFlow data every day

Application – IP Session Data • AT&T needs to process a massive stream of network data • Traffic estimation • What fraction of network IP addresses are active? • Distinct elements computation • Traffic analysis • What are the 100 IP addresses with the most traffic? • Frequent items computation • Security/Denial of Service • Are there any IP addresses witnessing a spike in traffic? • Skewness computation

Algorithm Goals • Space Complexity: Minimize memory used by the streaming algorithm • n, m, and M are large • Pass Complexity: Minimize number of passes over the data • In many cases, only 1 pass is possible • Computation: Minimize the time spent per stream update • Ideally constant time

Vector Norm Estimation • Problem – lp-norms • Compute $|x|_p = (\sum_{j=1}^n |x_j|^p)^{1/p}$ • p = 0 is the number of non-zero entries of x • p = 1 is the Manhattan norm • p = 2 is the Euclidean norm • p = 3 is the skewness • p = 4 is the kurtosis • p = ∞ is the maximum norm • Applications • Estimating number of distinct elements • Query planning + optimization • Measuring distances between distributions • Embed other metrics into it (EMD, edit distance, etc.) • Geometric problems: clustering, nearest neighbor, etc. • Databases: self-join size • Testing distribution skewness. Easier than l1 norm • Denial of Service attacks • Long-Term Capital Management hedge fund bailed out in the late 90s because it underestimated kurtosis • Use high accuracy for estimating $|x|_4$ • Ely Porat: “I know that Google is interested in compressed sensing with lp guarantees for p > 2” • Finding most frequent items
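As a reference point for the definitions on this slide, here is a minimal exact (non-streaming) computation of $|x|_p$. The function name `lp_norm` and the example vector are illustrative assumptions. The cases p = 0 and p = ∞ are handled separately because the general formula does not apply to them directly.

```python
import math
from typing import Sequence

def lp_norm(x: Sequence[float], p: float) -> float:
    """Exact l_p norm of x, computed with full access to x (not a streaming algorithm)."""
    if p == 0:
        return float(sum(1 for v in x if v != 0))   # number of non-zero entries
    if math.isinf(p):
        return float(max(abs(v) for v in x))        # maximum norm
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3, 0, -4, 0]
print(lp_norm(x, 0))         # 2.0
print(lp_norm(x, 1))         # 7.0
print(lp_norm(x, 2))         # 5.0
print(lp_norm(x, math.inf))  # 4.0
```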

Other Applications of lp-Norms • lp for $p \in (0,1)$ • Entropy estimation [HNO] • Entropy = $\sum_j q_j \log(1/q_j)$, where $q_j = |x_j|/|x|_1$ • Estimate $|x|_p$ for $p \in (0,1)$ • lp for $p \in [1, \infty)$ • Regression: $\min_x |Ax-b|_p$ • $b_i = A_i x + \mathrm{Noise}_i$ • p = 1 is used to ignore outliers! • p = ∞ is used to find outliers! • General p allows tuning • private norm estimation [FIMNSW, IW, MM, W]
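The entropy formula on this slide can be evaluated exactly when x is available in full. The sketch below does that as a baseline and is not the streaming estimator of [HNO]; the name `empirical_entropy` and the choice of natural logarithm are assumptions, since the slide does not fix the base.

```python
import math
from typing import Sequence

def empirical_entropy(x: Sequence[float]) -> float:
    """Entropy sum_j q_j log(1/q_j) with q_j = |x_j| / |x|_1 (natural log assumed)."""
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        raise ValueError("entropy is undefined for the all-zeros vector")
    # Zero coordinates contribute nothing, so they are skipped.
    return sum((abs(v) / l1) * math.log(l1 / abs(v)) for v in x if v != 0)

print(empirical_entropy([1, 1, 1, 1]))  # log(4), the uniform case
print(empirical_entropy([5, 0]))        # 0.0, all mass on one coordinate
```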

Matrix Norms • Operator norms of n x d matrix A • Compute $|A|_p = \max_{x \neq 0} |Ax|_p/|x|_p$ • p = 1 is maximum l1-norm of a column • p = 2 is the spectral norm • p = ∞ is maximum l1-norm of a row • Entrywise norms • Compute $|A|_p = (\sum_{i,j} |A_{ij}|^p)^{1/p}$ • p = 2 is the Frobenius norm, also denoted $|A|_F$ • Schatten norms • p = 1 is the nuclear norm • Applications • Numerical linear algebra: • Approximate matrix product • Low-rank approximation • Optimization: • Minimize rank(X) subject to A(X) = B
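Three of the norms listed above have simple closed forms, sketched here in plain Python for a matrix stored as a list of rows. The function names are illustrative assumptions. The spectral and Schatten norms are omitted because they need singular values.

```python
import math
from typing import List

Matrix = List[List[float]]

def operator_norm_1(A: Matrix) -> float:
    """Operator 1-norm: the maximum l_1-norm of a column."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def operator_norm_inf(A: Matrix) -> float:
    """Operator infinity-norm: the maximum l_1-norm of a row."""
    return max(sum(abs(v) for v in row) for row in A)

def frobenius_norm(A: Matrix) -> float:
    """Entrywise 2-norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in A for v in row))

A = [[1, -2], [3, 4]]
print(operator_norm_1(A))    # 6
print(operator_norm_inf(A))  # 7
print(frobenius_norm(A))     # sqrt(30)
```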

Mixed Norms • Mixed norm of n x d matrix A • Compute $l_p(l_q(A)) = (\sum_{i=1}^n |A_i|_q^p)^{1/p}$ • Sum-norm • $l_p(X(A)) = (\sum_{i=1}^n |A_i|_X^p)^{1/p}$ • $l_p(l_0(A))$ useful for multigraphs [CM] • $l_p(l_2(A))$ is used in k-median, k-means, and generalizations • Applications • Earthmover distance [ABIW] • l1-regression [SW]
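A direct evaluation of the mixed norm may help fix the order of operations: take the l_q norm of each row first, then the l_p norm of the resulting vector of row norms. The names `row_norm` and `mixed_norm` are assumptions for illustration, and q = 0 is read as counting the non-zero entries of a row, matching the vector case.

```python
from typing import List

def row_norm(row: List[float], q: float) -> float:
    """l_q norm of one row; q = 0 counts the non-zero entries."""
    if q == 0:
        return float(sum(1 for v in row if v != 0))
    return sum(abs(v) ** q for v in row) ** (1.0 / q)

def mixed_norm(A: List[List[float]], p: float, q: float) -> float:
    """l_p(l_q(A)) for p > 0: the l_p norm of the vector of row-wise l_q norms."""
    return sum(row_norm(row, q) ** p for row in A) ** (1.0 / p)

A = [[3, 4], [0, 0], [6, 8]]
print(mixed_norm(A, 1, 2))  # 15.0, since the row l_2 norms are 5, 0, 10
print(mixed_norm(A, 1, 0))  # 4.0, the total number of non-zero entries
print(mixed_norm(A, 2, 2))  # sqrt(125), which equals the Frobenius norm
```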

Initial Observations • Any deterministic computation • of a vector norm requires Ω(n) space • of a matrix norm requires Ω(nd) space • How do we cope? Allow randomness and a small probability δ of error • Any exact computation • of a vector norm requires Ω(n) space • of a matrix norm requires Ω(nd) space • How do we cope? Output estimate Φ with $|x|_p \le \Phi \le (1+\epsilon)|x|_p$

Vector Norm Estimation - Use O*(f) to denote $f \cdot \mathrm{poly}(\log(n/\delta)/\epsilon)$ - Assume n, m, M are polynomially related - Rough bounds (figure not reproduced): [I] [IW, SS, BJKS] - Algorithms are 1-pass. Lower bounds are for O*(1)-pass algorithms

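Vector Norm Estimation • Refined bounds for δ = 1/100: • p = 0: $O(\epsilon^{-2} \log(n) (\log(1/\epsilon) + \log\log(n)))$ space, O(1) time; $\Omega(\epsilon^{-2} \log(n))$ space lower bound [KNW] • $p \in (0,2)$: $O(\epsilon^{-2} \log(n))$ space, $O(\log^2(1/\epsilon) \log\log(1/\epsilon))$ time; $\Omega(\epsilon^{-2} \log(n))$ space lower bound [KNPW] • p = 2: $O(\epsilon^{-2} \log(n))$ space, O(1) time; $\Omega(\epsilon^{-2} \log(n))$ space lower bound [AMS, KNW, TZ] • p > 2: $O(\epsilon^{-2} n^{1-2/p} \log^2 n / \min(\log n, \epsilon^{4/p-2}))$ space, O(log n) time; $\Omega(n^{1-2/p} \log n + \epsilon^{-2} + n^{1-2/p} \epsilon^{-2/p})$ space lower bound [G, JW, BJKS] • For general δ, bounds in space get multiplied by log 1/δ [JW]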

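Mixed Norms [CM, JW, AKO, BIKW, MW] • [Figure: complexity of estimating $l_p(l_q(A))$ for n x d matrix A, drawn as regions of the (q, p) plane with axis marks at 1 and 2; region labels include $n^{1-1/p}$, $n^{1-q/p}$, $n^{1-2/p} d^{1-2/q}$, $d^{1-2/q}$, 1, and "easy"]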

Matrix Norms • Operator norms • $|A|_1$ in Θ*(d) space • $|A|_2$ in O*($d^2$) space, 1-pass • $|A|_\infty$ in Θ*(n) space • Entrywise norms • Space same as for vectors, e.g., $|A|_F$ in O*(1) space • Schatten norms • $|A|_p = (\sum_{i=1}^n \sigma_i^p)^{1/p}$ doable in Θ*(d) space if n = d and A is Laplacian of a graph and no negative values occur in the stream [KL]

Vector Norm Estimation • Can estimate lp-norm for every p ≥ 0 with the same data structure (with different parameters)! [IW] • Optimal in space and time up to O*(1) factors • More generally: obtain entire histogram of the values

Histogramming • Let $S_i = \{j \text{ such that } (1+\epsilon)^i \le |x_j| < (1+\epsilon)^{i+1}\}$ • The $|S_i|$ summarize the coordinate values of x • Small histogram: only O(log(n)/ε) different i • Many, many applications • $|x|_p^p \approx \sum_i |S_i| \cdot (1+\epsilon)^{ip}$, accurate up to a factor of $(1+\epsilon)^p$ • Find a data structure for estimating the $|S_i|$
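The level sets can be computed exactly when x is fully available, which shows why the histogram alone is enough to approximate $|x|_p^p$. This is an offline sketch and not the streaming estimator; the names `level_sizes` and `histogram_estimate` and the example vector are assumptions. Rounding every $|x_j|$ down to $(1+\epsilon)^i$ underestimates each term by a factor of at most $(1+\epsilon)^p$.

```python
import math
from collections import Counter
from typing import Dict, Sequence

def level_sizes(x: Sequence[float], eps: float) -> Dict[int, int]:
    """Return |S_i| for each level i with (1+eps)^i <= |x_j| < (1+eps)^(i+1)."""
    sizes: Counter = Counter()
    for v in x:
        if v != 0:  # zero coordinates belong to no level
            i = math.floor(math.log(abs(v)) / math.log(1.0 + eps))
            sizes[i] += 1
    return dict(sizes)

def histogram_estimate(sizes: Dict[int, int], eps: float, p: float) -> float:
    """Approximate |x|_p^p by sum_i |S_i| * (1+eps)^(i*p)."""
    return sum(count * (1.0 + eps) ** (i * p) for i, count in sizes.items())

x = [1, 2, 2, 3, 10, 0, -7]
eps, p = 0.1, 2
exact = sum(abs(v) ** p for v in x)                  # 167
estimate = histogram_estimate(level_sizes(x, eps), eps, p)
print(exact, estimate)
```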

Three Ideas • Sign vector $\sigma \in \{-1,1\}^n$ • For any fixed x, $|\langle \sigma, x \rangle| \approx |x|_2$ • Bucketing • Given r buckets $b_1, \ldots, b_r$, randomly hash the coordinates of x into each bucket • Let $x(b_k)$ be the restriction of x to bucket k • $E[|x(b_k)|_2^2] = |x|_2^2 / r$ • Subsampling • For j = 1, 2, …, log n • Randomly sample a set $T_j$ of $2^j$ coordinates of x • Let $x(T_j)$ be the restriction of x to coordinates in $T_j$
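The first two ideas rest on simple expectation identities that can be checked by brute force on a tiny vector: averaging $\langle \sigma, x \rangle^2$ over all sign vectors gives $|x|_2^2$, and averaging the squared mass of one bucket over all hash assignments gives $|x|_2^2 / r$. The function names below are illustrative assumptions, and exhaustive enumeration is only feasible for very small n.

```python
import itertools
from typing import Sequence

def mean_square_over_all_signs(x: Sequence[int]) -> float:
    """Average of <sigma, x>^2 over all 2^n sign vectors sigma."""
    total = 0
    for sigma in itertools.product((-1, 1), repeat=len(x)):
        total += sum(s * v for s, v in zip(sigma, x)) ** 2
    return total / 2 ** len(x)

def mean_bucket_mass(x: Sequence[int], r: int, bucket: int = 0) -> float:
    """Average of |x(b)|_2^2 for one bucket b over all r^n hash assignments."""
    total = 0
    for h in itertools.product(range(r), repeat=len(x)):
        total += sum(v * v for v, b in zip(x, h) if b == bucket)
    return total / r ** len(x)

x = [3, -1, 2]                        # |x|_2^2 = 14
print(mean_square_over_all_signs(x))  # 14.0
print(mean_bucket_mass(x, r=2))       # 7.0
```

A single sign vector only matches $|x|_2^2$ in expectation, so practical estimators combine several independent repetitions.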

The Data Structure • For j = 1, …, log n • Choose a random set $T_j$ of $2^j$ coordinates of x • Randomly hash the coordinates of $x(T_j)$ into r buckets • For each bucket $b_k$, maintain $\langle \sigma_j, x(T_j)(b_k) \rangle$, where $\sigma_j \in \{-1, 1\}^n$ • Space ≈ r • Time ≈ 1 • That’s all folks!
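A toy version of this structure is sketched below to show how the three ideas compose and that every counter is a linear function of x. It stores the sets, hash values, and signs in explicit tables, which costs more than n words and so defeats the space bound; a real implementation would presumably use compact hash functions instead. The class name `SubsampledSketch` and all of its fields are assumptions made for illustration.

```python
import math
import random
from typing import List

class SubsampledSketch:
    """Toy sketch: subsample at log n levels, hash into r buckets, keep signed sums."""

    def __init__(self, n: int, r: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n, self.r = n, r
        self.levels = max(1, math.ceil(math.log2(n)))
        # Level j (1-based) keeps a random set T_j of min(2^j, n) coordinates.
        self.T = [set(rng.sample(range(n), min(2 ** j, n)))
                  for j in range(1, self.levels + 1)]
        # Explicit per-level bucket and sign tables (illustration only).
        self.bucket = [[rng.randrange(r) for _ in range(n)] for _ in range(self.levels)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(self.levels)]
        self.counters: List[List[int]] = [[0] * r for _ in range(self.levels)]

    def update(self, i: int, v: int) -> None:
        """Process stream update (i, v), i.e. x_i <- x_i + v."""
        for lvl in range(self.levels):
            if i in self.T[lvl]:
                self.counters[lvl][self.bucket[lvl][i]] += self.sign[lvl][i] * v

sketch = SubsampledSketch(n=16, r=4, seed=1)
for i, v in [(3, 5), (7, -2), (3, -1), (15, 4)]:
    sketch.update(i, v)
print(sketch.counters)
```

Because each update touches one counter per level, updates can arrive in any order and deletions cancel insertions.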

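Why it Works • Suppose $|S_i| (1+\epsilon)^{ip} \ge \epsilon^2 |x|_p^p / \log n$ • If not, then $S_i$ contributes less than an $\epsilon^2/\log n$ fraction of $|x|_p^p$ • Consider j so that $2^j |S_i| / n = 1$ • $|x(T_j)|_p^p \approx 2^j |x|_p^p / n$ • If $k \in S_i \cap T_j$, then $|x_k|^p \ge \epsilon^2 |x(T_j)|_p^p / \log n$, or $|x_k| \ge \epsilon^{2/p} |x(T_j)|_p / \log^{1/p} n$ • For p ≤ 2, since $|z|_p \ge |z|_2$, • $|x_k| \ge \epsilon^{2/p} |x(T_j)|_2 / \log^{1/p} n$ • For p > 2, since $|z|_p \ge |z|_2 / n^{1/2-1/p}$, • $|x_k| \ge \epsilon^{2/p} |x(T_j)|_2 / (n^{1/2-1/p} \log^{1/p} n)$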

Wrapping Up • For each $S_i$, look at the appropriate level j of sub-sampling to find $S_i \cap T_j$ • $E[|S_i \cap T_j|] = |S_i| 2^j / n$ • Scale by $n/2^j$ to estimate $|S_i|$ • Output $\sum_i |S_i| \cdot (1+\epsilon)^{ip}$
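The scaling step can be checked exhaustively on a small instance: averaged over every possible sample $T_j$ of size $2^j$, the scaled count $|S_i \cap T_j| \cdot n / 2^j$ equals $|S_i|$. The function name `mean_scaled_count` and the example set are assumptions, and the check says nothing about variance, which is what the choice of level j controls.

```python
import itertools
from typing import Set

def mean_scaled_count(n: int, k: int, S: Set[int]) -> float:
    """Average of |S ∩ T| * n / k over every size-k subset T of {0, ..., n-1}."""
    total = 0
    num_sets = 0
    for T in itertools.combinations(range(n), k):
        total += len(S.intersection(T))
        num_sets += 1
    return (total / num_sets) * (n / k)

S_i = {0, 1, 2}                           # hypothetical level set with |S_i| = 3
print(mean_scaled_count(8, 4, S_i))       # 3.0
```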

An Aside • We obtain samples from each $S_i$ for which $|S_i| \cdot (1+\epsilon)^{ip} \ge \epsilon^2 |x|_p^p / \log n$ • Sampling algorithm • Choose $S_i$ with probability $|S_i| \cdot (1+\epsilon)^{ip} / |x|_p^p$ • Output a sample from $S_i$ • Chooses a $k \in [n]$ with probability $\approx |x_k|^p / |x|_p^p$ (almost) • known as lp-sampling [MW] • useful in sublinear-time algorithms for minimum enclosing ball and classification [CHW]
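For comparison, the target distribution of lp-sampling is easy to realize when x is stored in full. The sketch below does only that, as an offline baseline rather than the streaming sampler of [MW]; the name `lp_sample` and the example vector are assumptions.

```python
import random
from typing import Sequence

def lp_sample(x: Sequence[float], p: float, rng: random.Random) -> int:
    """Return index k with probability |x_k|^p / |x|_p^p, given x explicitly."""
    # Zero entries get weight 0 explicitly, because 0 ** 0 evaluates to 1 in Python.
    weights = [abs(v) ** p if v != 0 else 0.0 for v in x]
    return rng.choices(range(len(x)), weights=weights, k=1)[0]

rng = random.Random(0)
x = [0, 10, 0, -1, 0]
samples = [lp_sample(x, 2, rng) for _ in range(1000)]
# With p = 2, index 1 has weight 100 and index 3 has weight 1.
print(samples.count(1), samples.count(3))
```

Larger p concentrates the samples on the heaviest coordinates, while p = 0 samples uniformly from the non-zero coordinates.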

Mixed Norms [JW] • $l_p^p(l_q(A)) = \sum_j (\sum_k |A_{jk}|^q)^{p/q}$ • $S_i = \{j \text{ such that } (1+\epsilon)^i \le \sum_k |A_{jk}|^q < (1+\epsilon)^{i+1}\}$ • Algorithm • lq-sample from A, treated as a vector • Use row identities of samples to estimate $|S_i|$

Matrix Norms • Spectral norm of n x d matrix A • $|A|_2 = \max_{\text{unit } x} |Ax|_2$ • Compute $S \cdot A$, where S is an O*(d) x n matrix of random signs • $|SAx|_2 \approx |Ax|_2$ for all x • Output $\max_{\text{unit } x} |SAx|_2$ • Can do faster [AC]

1-Round Communication Complexity • Alice holds x, Bob holds y. What is f(x,y)? • Alice sends a single message M(x) to Bob • Bob outputs a function of M(x), y • Bob’s output should equal f(x,y) with constant probability (over randomness of the protocol) • Communication cost CC(f) is |M(x)|, maximized over x and random bits

Reduction to Streaming • Alice runs streaming algorithm A on stream s(x) and sends the state S of A to Bob • Bob continues running A from state S on stream s(y) • If you can solve f(x,y) from $A(s(x) \circ s(y))$, then space of A is at least CC(f)

Canonical Indexing Problem • Alice holds $x \in \{0,1\}^n$, Bob holds $i \in \{1, 2, \ldots, n\}$ • What is $x_i$? • CC(Indexing) = Ω(n)

Ω(1/ε²) Bound • What is $|x-y|_p$? • Alice holds $x \in \{-\epsilon, \epsilon\}^{1/\epsilon^2}$, Bob holds $y = e_i$ • $|x-y|_p^p = (1/\epsilon^2 - 1)\epsilon^p + (1 - x_i)^p$ • Solves Indexing for p ≥ 2, so Ω(1/ε²) bound • For p < 2, see Amit’s talk
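The identity on this slide, and the way the sign of $x_i$ can be read off from the distance, can be checked numerically. The sketch below uses exact distances; the lower bound itself relies on an approximation accurate enough to separate the two cases, which is not modeled here. The helper names, the threshold rule in `decode_sign`, and the choice ε = 0.25 are assumptions for illustration.

```python
import math
from typing import List

def dist_pp(x: List[float], y: List[float], p: float) -> float:
    """|x - y|_p^p computed directly from the definition."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))

def closed_form(x_i: float, eps: float, n: int, p: float) -> float:
    """(n - 1) * eps^p + (1 - x_i)^p, with n = 1/eps^2 coordinates."""
    return (n - 1) * eps ** p + (1 - x_i) ** p

def decode_sign(d_pp: float, eps: float, n: int, p: float) -> float:
    """Recover x_i: (1 - eps)^p < 1 < (1 + eps)^p, so threshold at (n-1) eps^p + 1."""
    return eps if d_pp < (n - 1) * eps ** p + 1 else -eps

eps = 0.25
n = round(1 / eps ** 2)                                # 16 coordinates
x = [eps if k % 3 == 0 else -eps for k in range(n)]    # Alice's input
for p in (2, 3):
    for i in range(n):
        y = [0.0] * n
        y[i] = 1.0                                     # Bob's input e_i
        d = dist_pp(x, y, p)
        assert math.isclose(d, closed_form(x[i], eps, n, p))
        assert decode_sign(d, eps, n, p) == x[i]
print("identity and decoding verified for p = 2 and p = 3")
```

For p = 2 the two possible values simplify to 2 - 2ε and 2 + 2ε, so the gap between the cases is of relative order ε.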

Ω($n^{1-2/p}$) Bound for p ≥ 2 [SS, BJKS] • What is $|x-y|_p$? • $x \in \{1, 2, \ldots, n\}^n$, $y \in \{1, 2, \ldots, n\}^n$ • Promise: either all i satisfy $x_i - y_i \in \{0,1\}$ or there is a j for which $x_j - y_j \ge n^{1/p}$ • Communication is Ω($n^{1-2/p}$) • Proof bounds information that message reveals about input • For every block of $n^{2/p}$ coordinates, reveal 1 bit of information

lp-Norms in Other Models - sliding window, time-decayed, out-of-order - read/write streams, annotations - distributed functional monitoring - compressed sensing

A Universal Data Structure • For j = 1, …, log n • Choose a random set $T_j$ of $2^j$ coordinates of x • Randomly hash the coordinates of $x(T_j)$ into r buckets • For each bucket $b_k$, maintain $\langle \sigma_j, x(T_j)(b_k) \rangle$, where $\sigma_j \in \{-1, 1\}^n$ • In what sense is this data structure optimal for all functions of the form $\sum_i f(x_i)$? • Good progress on this [BO], but still open

Other Norms • Earthmover distance (EMD) • Given n green and n blue points in O(1) dimensions • Output (1+ε)-approximation to min-cost perfect matching • O(n) space upper bound, Ω(log n) lower bound • Some progress [ABIW] • [Figure: example pair of point sets with EMD = 6 + 3√2]

The Future • We’ve made progress • Improving ε and log n factors important in practice • Future themes? • more complicated norms and problems from optimization • emphasis on sketching for improving time

Bibliography • [ABIW] Andoni, Do Ba, Indyk, W, FOCS, 2009. • [AC] Ailon, Chazelle, STOC, 2006. • [AMS] Alon, Matias, Szegedy, STOC, 1996. • [AKO] Andoni, Krauthgamer, Onak, preprint. • [BJKS] Bar-Yossef et al., FOCS, 2002. • [BO] Braverman, Ostrovsky, STOC, 2010. • [CHW] Clarkson, Hazan, W, FOCS, 2010. • [CM] Cormode, Muthukrishnan, PODS, 2005. • [FIMNSW] Feigenbaum et al., ICALP, 2001. • [FM] Flajolet, Martin, FOCS, 1983. • [G] Ganguly, preprint. • [HNO] Harvey, Nelson, Onak, FOCS, 2008. • [I] Indyk, FOCS, 2000. • [IW] Indyk, W, STOC, 2005. • [JW] Jayram, W, FOCS, 2009. • [JW] Jayram, W, SODA, 2011. • [KNW] Kane, Nelson, W, SODA, 2010. • [KNPW] Kane, Nelson, Porat, W, STOC, 2011. • [MM] Madeira, Muthukrishnan, FSTTCS, 2009. • [MW] Monemizadeh, W, SODA, 2010. • [SS] Saks, Sun, STOC, 2002. • [SW] Sohler, W, STOC, 2011. • [W] W, STOC, 2011.