Property Testing: Sublinear-Time Approximate Decisions

Oded Goldreich, Weizmann Institute of Science
Talk at CTW, July 2013

My Aim: Promote Research in Property Testing

I think this area is still under-explored. This holds in particular with respect to testing graph properties (in the various models).

Property Testing (super-fast approximate decision): an illustration

[Figure: a Gothic cathedral, annotated with question marks.]

Deciding by inspecting few locations in the object.
One motivation: real objects are far apart. Other motivations: approximation per se, or as a preliminary step.
Focus: sublinear-time algorithms, i.e., performing the task by inspecting the object at few locations.

Property Testing: informal definition

A relaxation of a decision problem: for a fixed property $P$ and any object $O$, determine whether $O$ has property $P$ or is far from having property $P$ (i.e., $O$ is far from every object that has $P$). Objects are viewed as functions, and inspecting an object means querying the function (i.e., the oracle).

Property Testing: the standard (one-sided error) definition
• A property $P = \bigcup_n P_n$, where $P_n$ is a set of functions with domain $D_n$.
• The tester $T$ gets explicit inputs $n$ and $\epsilon$,
• and oracle access to a function $f$ with domain $D_n$.
• If $f \in P_n$ then $\Pr[T^f(n,\epsilon) \text{ accepts}] = 1$ (or $> 2/3$, in the two-sided error version).
• If $f$ is $\epsilon$-far from $P_n$ then $\Pr[T^f(n,\epsilon) \text{ rejects}] > 2/3$. (Distance is defined as the fraction of disagreements.)
Focus: query complexity $q(n,\epsilon) \ll |D_n|$.
Special focus: $q(n,\epsilon) = q(\epsilon)$, independent of $n$.
Terminology: $\epsilon$ is called the proximity parameter.
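As a concrete reading of "distance = fraction of disagreements", here is a minimal Python sketch. It assumes, for illustration only, that a function over a small finite domain is stored as the list of its values and that a property $P_n$ is given as an explicit list of such functions; the helper names are hypothetical. Brute-forcing the minimum like this is exactly what a tester avoids, so it is only meant for toy checks.

```python
from fractions import Fraction
from typing import Iterable, Sequence


def relative_distance(f: Sequence, g: Sequence) -> Fraction:
    """Fraction of domain points on which f and g disagree."""
    if len(f) != len(g):
        raise ValueError("functions must share a domain")
    return Fraction(sum(a != b for a, b in zip(f, g)), len(f))


def distance_to_property(f: Sequence, prop: Iterable[Sequence]) -> Fraction:
    """Distance from f to the closest member of an explicitly listed P_n."""
    return min(relative_distance(f, g) for g in prop)


# Toy property over the domain {0,1,2,3}: the constant 0/1-valued functions.
constants = [[0, 0, 0, 0], [1, 1, 1, 1]]
print(distance_to_property([0, 1, 1, 1], constants))  # 1/4
```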

Example 1: Testing Linearity
Given (access to) a function $f: G \to H$, determine whether it is linear* (or far from any linear function).
• The BLR Tester (repeated $O(1/\epsilon)$ times when given proximity parameter $\epsilon$):
• Select uniformly and independently $x, y \in G$.
• Accept if and only if $f(x) + f(y) = f(x+y)$.
*) $G$ and $H$ are groups, and $f$ is linear (i.e., a group homomorphism) if $f(x+y) = f(x) + f(y)$ holds for every $x, y \in G$, where the first "+" is the operation of $G$ and the second is that of $H$.
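A minimal Python sketch of the BLR tester for the special case $G = (\mathbb{Z}_2)^n$ and $H = \mathbb{Z}_2$, assuming elements of $G$ are encoded as $n$-bit integers (so the group operation is XOR) and that $f$ is given as a Python callable acting as the oracle. The repetition count $\lceil 3/\epsilon \rceil$ is an illustrative choice for the $O(1/\epsilon)$ bound, not a constant taken from the talk.

```python
import math
import random
from typing import Callable


def blr_test(f: Callable[[int], int], n: int, eps: float, rng: random.Random) -> bool:
    """One-sided BLR linearity tester for f: (Z_2)^n -> Z_2.

    Elements of (Z_2)^n are n-bit integers, so addition in G is XOR,
    and addition in H = Z_2 is also XOR. The constant 3 in ceil(3/eps)
    is an illustrative stand-in for the O(1/eps) repetition count.
    """
    for _ in range(math.ceil(3 / eps)):
        x = rng.getrandbits(n)  # uniform element of G
        y = rng.getrandbits(n)  # independent uniform element of G
        if (f(x) ^ f(y)) != f(x ^ y):
            return False  # found a violated linearity constraint: reject
    return True


def parity_function(mask: int) -> Callable[[int], int]:
    """The linear function x -> <mask, x> mod 2 (a hypothetical example input)."""
    return lambda x: bin(x & mask).count("1") % 2


rng = random.Random(0)
print(blr_test(parity_function(0b1011), n=8, eps=0.1, rng=rng))  # True: linear f always passes
```

Because the tester only rejects after seeing a violated constraint, a linear $f$ is accepted with probability 1, matching the one-sided error definition.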

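Linearity Testing cont’d (recall $f: G \to H$)
• Select uniformly and independently $x, y \in G$.
• Accept if and only if $f(x) + f(y) = f(x+y)$.
Analysis*: Clearly, if $f$ is linear, then the test accepts with probability 1. Suppose that $f$ is $\delta$-far from being linear (i.e., $f$ disagrees with each linear function on at least a $\delta_{\rm lin}(f) \ge \delta$ fraction of the domain, where $\delta_{\rm lin}(f)$ denotes the distance of $f$ from the set of linear functions). Let $h$ be a linear function closest to $f$. Then
$$\begin{aligned}
\Pr[\text{Test rejects } f] &= \Pr_{x,y}[f(x)+f(y) \neq f(x+y)] \\
&\ge 3 \cdot \Pr_{x,y}[f(x) \neq h(x) \wedge f(y) = h(y) \wedge f(x+y) = h(x+y)] \\
&\ge 3 \cdot \Pr_{x}[f(x) \neq h(x)] \cdot \bigl(1 - 2 \cdot \Pr_{x,y}[f(y) \neq h(y) \mid f(x) \neq h(x)]\bigr) \\
&= 3\,\delta_{\rm lin}(f) \cdot (1 - 2\,\delta_{\rm lin}(f)).
\end{aligned}$$
(The first inequality holds because the test rejects whenever exactly one of $f(x), f(y), f(x+y)$ disagrees with the linear function $h$, and the three such events are disjoint and equally likely. The last step uses the independence of $x$ and $y$.)
So, assuming that the rejection probability increases with the distance, we are done. But does this natural assumption hold?
*) The analysis refers to a single iteration of the test.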
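The inequality above can be checked numerically on a toy domain. This Python sketch, again assuming $G = (\mathbb{Z}_2)^n$ with XOR as addition and $H = \mathbb{Z}_2$, enumerates all $2^n$ linear functions to compute $\delta_{\rm lin}(f)$ exactly and enumerates all pairs $(x, y)$ to get the exact single-iteration rejection probability; it then asserts the bound $3\delta_{\rm lin}(f) - 6\delta_{\rm lin}(f)^2$ on random functions. Exhaustive enumeration is only feasible for small $n$.

```python
import random
from fractions import Fraction


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def linear_functions(n: int):
    """All linear f: (Z_2)^n -> Z_2, i.e., x -> <a, x> mod 2 for each a."""
    size = 2 ** n
    return [[parity(a & x) for x in range(size)] for a in range(size)]


def distance_to_linear(f, n: int) -> Fraction:
    """delta_lin(f): fraction of disagreements with the closest linear function."""
    size = 2 ** n
    return min(Fraction(sum(fx != hx for fx, hx in zip(f, h)), size)
               for h in linear_functions(n))


def blr_rejection_probability(f, n: int) -> Fraction:
    """Exact single-iteration rejection probability, by enumerating all (x, y)."""
    size = 2 ** n
    bad = sum((f[x] ^ f[y]) != f[x ^ y] for x in range(size) for y in range(size))
    return Fraction(bad, size * size)


rng = random.Random(1)
n = 4
for _ in range(20):
    f = [rng.randrange(2) for _ in range(2 ** n)]
    d = distance_to_linear(f, n)
    assert blr_rejection_probability(f, n) >= 3 * d - 6 * d * d
```

For instance, flipping one value of a linear function on $(\mathbb{Z}_2)^4$ gives $\delta_{\rm lin} = 1/16$ and a rejection probability of exactly $42/256 = 3/16 - 6/256$, so the bound is attained there.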

Linearity Testing cont’d (2) (recall $f: G \to H$)
• Select uniformly and independently $x, y \in G$.
• Accept if and only if $f(x) + f(y) = f(x+y)$.
Does the rejection probability increase with the distance? Surprisingly, the answer is no, at least for $G = (\mathbb{Z}_2)^n$ and $H = \mathbb{Z}_2$!
[Plot: rejection probability versus distance $\delta$; the curve $3\delta - 6\delta^2$ reaches 3/8 at $\delta = 1/4$ and falls to 45/128 at $\delta = 5/16$.]
Notes: The maximum distance is 1/2. The lower bound $3\delta - 6\delta^2$ is tight in $[0, 5/16]$. Additional lower bounds are $\delta$, and $45/128$ for $\delta \ge 5/16$. Indeed, strange... BTW, an alternative simple proof yields the bound $\min(1/6, \delta/2)$.
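To see the non-monotonicity in numbers, the following Python sketch takes the pointwise maximum of the lower bounds listed in the notes. It reads "$\delta$" as a bound valid for all $\delta$; on $[0, 5/16]$ this makes no difference, because $3\delta - 6\delta^2 \ge \delta$ whenever $\delta \le 1/3$. Exact fractions avoid rounding.

```python
from fractions import Fraction


def blr_lower_bound(delta: Fraction) -> Fraction:
    """Pointwise max of the lower bounds quoted in the notes (G=(Z_2)^n, H=Z_2)."""
    bounds = [3 * delta - 6 * delta ** 2, delta]
    if delta >= Fraction(5, 16):
        bounds.append(Fraction(45, 128))
    return max(bounds)


for d in [Fraction(1, 8), Fraction(1, 4), Fraction(5, 16), Fraction(3, 8), Fraction(1, 2)]:
    print(d, blr_lower_bound(d))
```

The resulting bound is $3/8$ at $\delta = 1/4$ but only $45/128$ at $\delta = 5/16$, and it climbs back to $3/8$ at $\delta = 3/8$ via the bound $\delta$.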

Example 2: Testing Bipartiteness in the "Dense Graphs Model"
A graph $G = ([N], E)$ is represented by a function $g: [N] \times [N] \to \{0,1\}$ (i.e., $g(u,v) = 1$ iff $(u,v)$ is an edge in $G$).
• This representation determines:
• The type of queries: adjacency queries.
• The distance measure: (number of differences)$/N^2$.
Note: The representation affects both the type of queries and the distance measure.
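A small Python sketch of the dense-model representation, assuming for illustration an undirected graph given as a set of edges on vertices $0, \ldots, N-1$ rather than $[N]$. The returned adjacency function plays the role of $g$, and the distance counts differing entries of $g$ over all $N^2$ ordered pairs, so adding or removing one edge changes two entries.

```python
from fractions import Fraction
from typing import Callable, Set, Tuple

Edge = Tuple[int, int]


def adjacency_oracle(edges: Set[Edge]) -> Callable[[int, int], int]:
    """g(u, v) = 1 iff {u, v} is an edge (vertices are 0..N-1 here)."""
    sym = edges | {(v, u) for u, v in edges}
    return lambda u, v: 1 if (u, v) in sym else 0


def dense_distance(g1, g2, n: int) -> Fraction:
    """Fraction of the N^2 ordered vertex pairs on which g1 and g2 differ."""
    diff = sum(g1(u, v) != g2(u, v) for u in range(n) for v in range(n))
    return Fraction(diff, n * n)


triangle = adjacency_oracle({(0, 1), (1, 2), (0, 2)})
path = adjacency_oracle({(0, 1), (1, 2)})
print(dense_distance(triangle, path, 3))  # 2/9: edge {0,2} differs at (0,2) and (2,0)
```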

Testing Bipartiteness in the Dense Graphs Model
• The GGR Tester (input graph $G$, adjacency queries, proximity parameter $\epsilon$):
• Select uniformly a subset of $\tilde{O}(1/\epsilon^2)$ vertices in $G$.
• Accept if and only if the subgraph induced by this set is bipartite.
Analysis: Clearly, if $G$ is bipartite, then the test accepts with probability 1. Suppose that $G = ([N], E)$ is $\epsilon$-far from being bipartite (i.e., at least $\epsilon N^2$ edges must be omitted from $G$ to make it bipartite). Partition the sample into two (non-equal) parts: a subset $U$ of size $\tilde{O}(1/\epsilon)$ and a subset $S$ of size $\tilde{O}(1/\epsilon^2)$. Consider all 2-way partitions of $U$: for each partition $(U_1, U_2)$, consider the partition induced on all (graph) vertices such that all neighbors of $U_i$ are on the side opposite to it. To be cont’d.

Testing Bipartiteness in the Dense Graphs Model, cont.
• Analysis: Suppose that $G = ([N], E)$ is $\epsilon$-far from being bipartite. Partition the sample into two (non-equal) parts: a subset $U$ of size $\tilde{O}(1/\epsilon)$ and a subset $S$ of size $\tilde{O}(1/\epsilon^2)$. Consider all 2-way partitions of $U$: for each partition $(U_1, U_2)$, consider the partition induced on all graph vertices such that all neighbors of $U_i$ are on the side opposite to it.
• Vertices that neighbor both $U_i$'s "witness" the badness of $(U_1, U_2)$.
• W.h.p., almost all high-degree vertices have neighbors in $U$ (i.e., "high degree" = degree at least $\epsilon N/4$, and "almost all" = all but at most $\epsilon N/4$).
• There are many violating edges between vertices assigned to the same side (i.e., "many" = at least $\epsilon N^2/2$ edges). [Since $U$ "dominates" almost all edges.]
• A vertex pair selected at random hits such an edge with probability at least $\epsilon/2$.
• Thus, each potential partition is "rejected" (i.e., we find a violating edge with respect to it) with probability at least $1 - (1-\epsilon/2)^{|S|/2} \ge 1 - \exp(-\epsilon|S|/4)$, which is $1 - \exp(-\Omega(|U|))$ when $|S| = \Theta(|U|/\epsilon)$. With a suitable constant, a union bound over the $2^{|U|}$ partitions of $U$ implies that, with probability at least 2/3, the subgraph induced by $U \cup S$ is not bipartite.
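The tester itself is short; the work is in the analysis. Below is a minimal Python sketch of the GGR tester, assuming adjacency queries are answered by a callable `adj(u, v)` on vertices $0, \ldots, N-1$. The sample size $\lceil c \log(2/\epsilon)/\epsilon^2 \rceil$ with $c = 4$ is a hypothetical stand-in for $\tilde{O}(1/\epsilon^2)$, and bipartiteness of the induced subgraph is decided by BFS 2-coloring.

```python
import math
import random
from collections import deque
from typing import Callable, List


def is_bipartite(vertices: List[int], adj: Callable[[int, int], int]) -> bool:
    """2-color the subgraph induced by `vertices` via BFS, querying adj(u, v)."""
    color = {}
    for s in vertices:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v != u and adj(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle inside the sample
    return True


def ggr_bipartite_test(n: int, adj: Callable[[int, int], int], eps: float,
                       rng: random.Random, c: float = 4.0) -> bool:
    """Accept iff the subgraph induced by a uniform vertex sample is bipartite.

    The sample size c*log(2/eps)/eps^2 is an illustrative stand-in for O~(1/eps^2).
    """
    k = min(n, math.ceil(c * math.log(2 / eps) / eps ** 2))
    sample = rng.sample(range(n), k)  # uniform subset, without repetition
    return is_bipartite(sample, adj)
```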

Example 3: Testing Triangle-Freeness in the "Dense Graphs Model"
Task: Given proximity parameter $\epsilon$ and (adjacency) query access to $G$, determine whether $G$ is triangle-free or $\epsilon N^2$ edges must be omitted to eliminate all triangles.
("Clearly") The following tester will do: Select a sample of $M(\epsilon)$ vertex triples, and accept if and only if none of these triples induces a triangle in $G$. (We query the relevant pairs...)
How large should $M(\epsilon)$ be? Please guess; to be cont’d...

Testing Triangle-Freeness in the "Dense Graphs Model", cont.
Task: Given proximity parameter $\epsilon$ and (adjacency) query access to $G$, determine whether $G$ is triangle-free or $\epsilon$-far from being triangle-free.
The candidate tester: Select a sample of $M(\epsilon)$ vertex triples, and accept if and only if none of these triples induces a triangle in $G$.
How large should $M(\epsilon)$ be?
Guess #1: $M(\epsilon) = O(1/\epsilon^3)$. Wrong!
Guess #2: $M(\epsilon) = \mathrm{poly}(1/\epsilon)$. Wrong!
Well, I don’t know the answer... Still, it is known that it is at least super-polynomial in $1/\epsilon$ and at most a tower of $\mathrm{poly}(1/\epsilon)$ many exponents.
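A minimal Python sketch of the candidate tester, assuming adjacency queries via a callable `adj(u, v)` on vertices $0, \ldots, N-1$. Since the right $M(\epsilon)$ is not known, the number of sampled triples `m` is left to the caller. The tester has one-sided error because it rejects only after actually seeing a triangle.

```python
import random
from typing import Callable


def triangle_free_test(n: int, adj: Callable[[int, int], int], m: int,
                       rng: random.Random) -> bool:
    """Sample m vertex triples and accept iff none of them induces a triangle.

    m plays the role of M(eps), which is left to the caller. Each triple costs
    at most three adjacency queries (`and` stops at the first missing edge).
    """
    for _ in range(m):
        u, v, w = rng.sample(range(n), 3)  # three distinct vertices
        if adj(u, v) and adj(v, w) and adj(u, w):
            return False  # found a triangle: reject
    return True
```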

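In General: Testing Graph Properties in the Dense Model
[Figure: a hierarchy of classes of graph properties, each containing the next:]
• Testable in $F(\epsilon)$ queries, e.g., triangle-freeness [A]. (Characterization by [AFNS, BCLSSV].)
• Testable in $\mathrm{poly}(1/\epsilon)$ queries, e.g., bipartiteness [GGR, BT].
• Testable in $\tilde{O}(1/\epsilon)$ queries, e.g., CC and BCC [GR08].
• Testable in $\tilde{O}(1/\epsilon)$ non-adaptive queries.
Also: $q$ adaptive queries can be simulated by $O(q^2)$ non-adaptive queries [GT].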

Testing Graph Properties in the Dense Model: The "lowest" complexity level
Testable in $\tilde{O}(1/\epsilon)$ non-adaptive queries: $BL(H)$ for any fixed graph $H$ [AG].
$BL(H)$ = the set of graphs obtained by a (not necessarily balanced) blow-up of the graph $H$. The special case of $H$ being a clique was done in [GR08].
[Figure: a blow-up of a 5-cycle.]
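To make the definition of $BL(H)$ concrete, the Python sketch below checks by brute force whether a tiny graph is a blow-up of $H$: each vertex of $H$ becomes an independent set (a "cloud", of arbitrary size), and two vertices are adjacent exactly when their clouds correspond to an edge of $H$. Whether empty clouds are allowed is exposed as a flag, since this sketch does not fix that convention. The search is exponential and is not the [AG] tester.

```python
from itertools import product
from typing import Callable, Set, Tuple


def is_blowup_of(n: int, adj: Callable[[int, int], int],
                 h_edges: Set[Tuple[int, int]], h_size: int,
                 allow_empty_clouds: bool = True) -> bool:
    """Brute-force check whether the n-vertex graph `adj` is a blow-up of H.

    H has vertices 0..h_size-1 and undirected edges h_edges. Exponential in n,
    so only suitable for toy graphs.
    """
    h_adj = h_edges | {(j, i) for i, j in h_edges}
    for phi in product(range(h_size), repeat=n):  # phi[v] = cloud of vertex v
        if not allow_empty_clouds and len(set(phi)) < h_size:
            continue
        # Same cloud => non-adjacent (H has no self-loops); otherwise follow H.
        if all(bool(adj(u, v)) == ((phi[u], phi[v]) in h_adj)
               for u in range(n) for v in range(u + 1, n)):
            return True
    return False


c5 = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}
triangle = lambda u, v: int(u != v)
print(is_blowup_of(3, triangle, c5, 5))  # False: C5 has no triangle
```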

Example 4: Testing Bipartiteness in the "Bounded-Degree Graphs Model"
A graph $G = ([N], E)$ of maximum degree $d$ is represented by a function $g: [N] \times [d] \to [N] \cup \{0\}$ (i.e., $g(u,i) = v$ iff $v$ is the $i$-th neighbor of $u$ in $G$, and $g(u,i) = 0$ if $u$ has fewer than $i$ neighbors).
• This representation determines:
• The type of queries: incidence queries.
• The distance measure: (number of differences)$/dN$.
Note: The representation affects both the type of queries and the distance measure.
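A Python sketch of the incidence oracle, assuming the graph is given as adjacency lists on vertices $1, \ldots, N$ so that $0$ can mean "no such neighbor". The hypothetical `neighbors` helper shows how $u$'s neighborhood is recovered with at most $d$ incidence queries, under the assumption that the neighbors occupy a prefix of the $d$ slots.

```python
from typing import Callable, Dict, List


def incidence_oracle(adj_lists: Dict[int, List[int]], d: int) -> Callable[[int, int], int]:
    """g(u, i) = the i-th neighbor of u (1-based i), or 0 if u has fewer than i neighbors."""
    for u, nbrs in adj_lists.items():
        if len(nbrs) > d:
            raise ValueError(f"vertex {u} exceeds the degree bound d={d}")

    def g(u: int, i: int) -> int:
        if not 1 <= i <= d:
            raise ValueError("i must lie in [d]")
        nbrs = adj_lists.get(u, [])
        return nbrs[i - 1] if i <= len(nbrs) else 0

    return g


def neighbors(g: Callable[[int, int], int], u: int, d: int) -> List[int]:
    """Recover u's neighbors with at most d incidence queries."""
    out = []
    for i in range(1, d + 1):
        v = g(u, i)
        if v == 0:
            break  # assumes neighbors fill a prefix of the d slots
        out.append(v)
    return out


g = incidence_oracle({1: [2], 2: [1, 3], 3: [2]}, d=3)  # the path 1-2-3
print(neighbors(g, 2, 3))  # [1, 3]
```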

Testing Bipartiteness in the Bounded-Degree Graphs Model
Lower bound: $\Omega(N^{1/2})$ queries (in contrast to the dense graphs model).
• The GR Tester (input graph $G$, incidence queries, proximity parameter $\epsilon$):
• Select uniformly $O(1/\epsilon)$ start vertices.
• For each start vertex $s$, take $m = \tilde{O}(N^{1/2}/\mathrm{poly}(\epsilon))$ random walks, each of length $\ell = \mathrm{poly}((\log N)/\epsilon)$.
• Accept if and only if the subgraph explored is bipartite.
Analysis: Clearly, if $G$ is bipartite, then the test accepts with probability 1. Suppose that $G = ([N], E)$ is $\epsilon$-far from being bipartite (i.e., at least $\epsilon dN$ edges must be omitted from $G$ to make it bipartite).
Simplifying assumption (unjustified!): $G$ is an expander.
Let $p_v(\sigma)$ = the probability that a lazy random $\ell$-walk starting at vertex $s$ reaches $v$ such that the induced path has length of parity $\sigma$. Consider a 2-partition placing $v$ according to $p_v(0)$ vs. $p_v(1)$. To be cont’d.

Testing Bipartiteness in the Bounded-Degree Model (cont.)
• The GR Tester (input graph $G$, incidence queries, proximity parameter $\epsilon$):
• Select uniformly $O(1/\epsilon)$ start vertices.
• For each start vertex $s$, take $m = \tilde{O}(N^{1/2}/\mathrm{poly}(\epsilon))$ random walks, each of length $\ell = \mathrm{poly}((\log N)/\epsilon)$.
• Accept if and only if the subgraph explored is bipartite.
Analysis: Simplifying assumption (unjustified!): $G$ is an expander.
Let $p_v(\sigma)$ = the probability that a lazy random $\ell$-walk starting at vertex $s$ reaches $v$ such that the induced path has length of parity $\sigma$. Consider a 2-partition placing $v$ according to $p_v(0)$ vs. $p_v(1)$.
If $\sum_{\sigma \in \{0,1\}} \sum_{(u,v) \in E} p_u(\sigma)\, p_v(\sigma) < \epsilon/N$, then this partition has at most $\epsilon dN$ violating edges; otherwise (i.e., if the sum is larger), the tester rejects with probability at least 2/3. (Each of these claims requires a proof.) Thus, if $G$ is not $\epsilon$-close to bipartite, the tester rejects with probability greater than 2/3.
Lazy random walk = in each step, stay in place with probability 1/2.
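A minimal Python sketch of the tester's exploration, under several stated assumptions. The graph is given as adjacency lists (each neighbor lookup stands in for incidence queries). A lazy step stays put with probability 1/2 and otherwise moves to a uniformly random neighbor; this is a simplification, and the walk analyzed by GR may be defined differently for non-regular graphs. The parameters `num_starts`, `walks_per_start`, and `walk_length` are supplied by the caller in place of $O(1/\epsilon)$, $m$, and $\ell$. The test rejects only if the explored subgraph contains an odd cycle, so it has one-sided error.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple


def lazy_walk_edges(adj: Dict[int, List[int]], s: int, length: int,
                    rng: random.Random) -> List[Tuple[int, int]]:
    """Edges traversed by one lazy random walk of `length` steps from s."""
    edges, u = [], s
    for _ in range(length):
        if rng.random() < 0.5 or not adj[u]:
            continue  # lazy step: stay in place
        v = rng.choice(adj[u])
        edges.append((u, v))
        u = v
    return edges


def explored_is_bipartite(edges: Set[Tuple[int, int]]) -> bool:
    """BFS 2-coloring of the subgraph formed by the explored edges."""
    nbrs: Dict[int, List[int]] = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    color: Dict[int, int] = {}
    for s in nbrs:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # odd cycle in the explored subgraph
    return True


def gr_bipartite_test(adj: Dict[int, List[int]], num_starts: int,
                      walks_per_start: int, walk_length: int,
                      rng: random.Random) -> bool:
    explored: Set[Tuple[int, int]] = set()
    vertices = list(adj)
    for _ in range(num_starts):
        s = rng.choice(vertices)  # uniformly selected start vertex
        for _ in range(walks_per_start):
            explored.update(lazy_walk_edges(adj, s, walk_length, rng))
    return explored_is_bipartite(explored)
```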

Example 5: Testing Cycle-Freeness in the "Bounded-Degree Graphs Model"
Observation: A graph is cycle-free iff the number of edges in it equals the number of vertices minus the number of connected components. We shall approximate both.
Auxiliary observation: The number of large connected components (CCs) is negligible; hence, the number of small CCs approximates the total number of CCs.
• The GR Tester (input graph $G$, incidence queries, proximity parameter $\epsilon$):
• Select uniformly $s = O(1/\epsilon^2)$ start vertices.
• For each start vertex, explore (*) till visiting $O(1/\epsilon)$ vertices.
• If a cycle was found in any of these explorations, reject.
• Otherwise, let $n$ denote the number of start vertices that reside in "large" components (i.e., CCs that were not fully explored) and $m$ be half the sum of their degrees. Accept iff $|n - m| < \epsilon s/2$.
*) The exploration is rather arbitrary.

Testing Cycle-Freeness in the "Bounded-Degree Graphs Model", cont.
• The GR Tester (input graph $G$, incidence queries, proximity parameter $\epsilon$):
• Select uniformly $s = O(1/\epsilon^2)$ start vertices.
• For each start vertex, explore till visiting $O(1/\epsilon)$ vertices.
• If a cycle was found in any of these explorations, reject.
• Otherwise, let $n$ denote the number of start vertices that reside in "large" components (i.e., CCs that were not fully explored) and $m$ be half the sum of their degrees. Accept iff $|n - m| < \epsilon s/2$.
The tester approximates the number of edges and the number of connected components. Hence, it has two-sided error. For constant $\epsilon$, this tester makes $O(1)$ queries!
N.B.: The tester does not try to find cycles. In contrast, a one-sided error tester may only reject when seeing a cycle.
THM: Cycle-freeness has no one-sided error tester of $o(N^{1/2})$ query complexity, but does have a one-sided error tester of $\tilde{O}(N^{1/2})$ queries.
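A minimal Python sketch of this two-sided tester, assuming adjacency lists instead of incidence queries, BFS as the (arbitrary) exploration, and a hypothetical constant $c = 1$ in both $s = \lceil c/\epsilon^2 \rceil$ and the exploration budget $\lceil c/\epsilon \rceil$. A fully explored component in which BFS finds no cycle is a tree, so it satisfies "edges = vertices − 1" exactly; hence only start vertices in large components contribute to $n$ and $m$.

```python
import math
import random
from collections import deque
from typing import Dict, List, Tuple


def explore(adj: Dict[int, List[int]], s: int, budget: int) -> Tuple[bool, bool]:
    """BFS from s visiting at most `budget` vertices.

    Returns (found_cycle, fully_explored). Assumes a simple graph, so any
    visited neighbor other than the BFS parent closes a cycle.
    """
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == parent[u]:
                continue
            if v in parent:
                return True, False  # non-tree edge: a cycle
            if len(parent) >= budget:
                return False, False  # budget exhausted: a "large" component
            parent[v] = u
            queue.append(v)
    return False, True


def gr_cycle_free_test(adj: Dict[int, List[int]], eps: float,
                       rng: random.Random, c: float = 1.0) -> bool:
    vertices = list(adj)
    s = math.ceil(c / eps ** 2)
    budget = math.ceil(c / eps)
    n_large, half_deg = 0, 0.0
    for _ in range(s):
        v = rng.choice(vertices)
        cycle, full = explore(adj, v, budget)
        if cycle:
            return False
        if not full:
            n_large += 1
            half_deg += len(adj[v]) / 2
    return abs(n_large - half_deg) < eps * s / 2
```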

In General: Testing Graph Properties in the Bounded-Degree Model
• Questions (with respect to a constant proximity parameter):
• Testability in sublinear query complexity.
• Testability in constant query complexity.
• One-sided vs. two-sided error probability.
E.g., cycle-freeness has a constant-query tester of two-sided error, no one-sided error tester of $o(N^{1/2})$ query complexity, but does have a one-sided error tester of $\tilde{O}(N^{1/2})$ queries.

End
The slides of this talk are available at http://www.wisdom.weizmann.ac.il/~oded/T/pt-intro.ppt
A survey on testing graph properties is available at http://www.wisdom.weizmann.ac.il/~oded/p_tgp.html
Other surveys are available at http://www.wisdom.weizmann.ac.il/~oded/surveys.html

Example 6: Testing Connectivity in the "Bounded-Degree Graphs Model"
Observation: A graph is far from being connected if and only if it has many (small) connected components.
• The GR Tester (input graph $G$, incidence queries, proximity parameter $\epsilon$):
• Select uniformly $O(1/\epsilon)$ start vertices.
• For each start vertex, explore (*) till visiting $\tilde{O}(1/\epsilon)$ vertices.
• Accept if and only if no small connected component is seen.
*) The exploration is rather arbitrary.
• In a more efficient tester, the first two steps are replaced by selecting, for each $i = 1, \ldots, \log(1/\epsilon)$, $2^i$ start vertices and exploring from each of these vertices till visiting $O(2^{-i}/\epsilon)$ vertices.
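A minimal Python sketch of the basic (less efficient) version, assuming adjacency lists instead of incidence queries, BFS as the exploration, and a hypothetical constant $c = 2$ for both the number of start vertices and the exploration budget. A component that is fully explored within the budget but does not contain all $N$ vertices is a small connected component and causes rejection; a connected graph is therefore always accepted.

```python
import math
import random
from collections import deque
from typing import Dict, List, Optional


def small_component_size(adj: Dict[int, List[int]], s: int, budget: int) -> Optional[int]:
    """Size of s's connected component if it has at most `budget` vertices, else None."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                if len(seen) >= budget:
                    return None  # component is larger than the budget
                seen.add(v)
                queue.append(v)
    return len(seen)


def gr_connectivity_test(adj: Dict[int, List[int]], eps: float,
                         rng: random.Random, c: float = 2.0) -> bool:
    vertices = list(adj)
    n = len(vertices)
    starts = math.ceil(c / eps)
    budget = math.ceil(c / eps)  # illustrative stand-in for O~(1/eps)
    for _ in range(starts):
        size = small_component_size(adj, rng.choice(vertices), budget)
        if size is not None and size < n:
            return False  # a small component that is not the whole graph
    return True
```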
