Finite Model Theory, Lecture 18: Extended 0/1 Laws, or “Getting Real”
Outline • A better probabilistic model • Probabilities of conjunctive queries • Probabilities for FO • Based on work done with N. Dalvi and G. Miklau, and on papers by Lynch, Shelah and Spencer
Anomalies of 0/1 Laws Database schema: Employee(name, city, occupation). We are not given the instance. • Any person belongs to Employee with $\mu = 1/2$! • The expected size is $E[|\mathrm{Employee}|] = n^3/2 \to \infty$!! • In practice we need conditional probabilities $\mu(\varphi \mid \psi)$, but they often don’t exist [why?]
A Better Model • Postulate that for each $R \in \sigma$, $E[|R|] = c_R$ (a constant) • This leads to: for each tuple $t$, $\Pr[t \in R] = c_R / n^a$, where $a = \mathrm{arity}(R)$
A Better Model No more anomalies: • For a given person, the probability that it belongs to Employee is $\to 0$ • The expected size is $E[|R|] = c_R$ • Asymptotic conditional probabilities always exist for conjunctive queries
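A minimal Python sketch (not from the lecture) contrasting the two models: `expected_size` computes $E[|R|] = n^a \cdot p$ for a per-tuple probability $p$, and the hypothetical helper `sample_relation` draws one random instance in which each tuple is present with probability $c_R/n^a$. Independence between tuples is an assumption of this sketch.

```python
import itertools
import random

def expected_size(n, arity, p):
    """E[|R|] when each of the n**arity candidate tuples is present
    independently with probability p."""
    return n ** arity * p

def sample_relation(n, arity, c, rng):
    """Draw one random instance of R over the domain {0, ..., n-1}.
    Each tuple is kept independently with probability c / n**arity
    (independence is assumed here, matching the per-tuple probability)."""
    p = c / n ** arity
    return {t for t in itertools.product(range(n), repeat=arity)
            if rng.random() < p}

n = 20
# Uniform model (mu = 1/2 per tuple): Employee has arity 3, so E = n^3 / 2.
print(expected_size(n, 3, 0.5))          # grows without bound as n grows
# Better model: E[|R|] = c_R regardless of n.
print(expected_size(n, 3, 4 / n ** 3))   # stays at (about) 4
# Empirical check of the better model for a binary relation with c_R = 5.
rng = random.Random(0)
sizes = [len(sample_relation(n, 2, 5, rng)) for _ in range(2000)]
print(sum(sizes) / len(sizes))           # close to 5
```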
Conjunctive Queries • Have the form: $\exists x_1 \ldots \exists x_k.(C_1 \wedge \cdots \wedge C_m)$ • where each $C_i$ is $R(\ldots)$, $x_i = x_j$, or $x_i \neq x_j$. Example: Employee(x, Seattle, -), Employee(x, y, Clerk), Employee(-, y, Lawyer)
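To make the syntax concrete, here is a small brute-force evaluator for Boolean conjunctive queries, written for this note rather than taken from the lecture. The encoding is an assumption: a database maps relation names to sets of tuples, terms beginning with `?` are variables, `-` is an anonymous variable, and any other string is a constant. Equalities $x_i = x_j$ are assumed to have been removed by substitution; inequalities are passed separately.

```python
def holds(db, atoms, neq=()):
    """Return True iff the Boolean conjunctive query holds in db.

    db    : dict mapping relation name -> set of tuples
    atoms : list of (relation, args); '?x' is a variable, '-' an
            anonymous variable, anything else a constant
    neq   : pairs (s, t) of terms that must denote different values
            (every variable used here is assumed to occur in some atom)
    """
    def value(term, binding):
        return binding[term] if term.startswith('?') else term

    def search(i, binding):
        if i == len(atoms):
            # All atoms matched; now check the inequality predicates.
            return all(value(s, binding) != value(t, binding) for s, t in neq)
        rel, args = atoms[i]
        for tup in db.get(rel, ()):
            if len(tup) != len(args):
                continue
            new = dict(binding)
            ok = True
            for term, val in zip(args, tup):
                if term == '-':
                    continue  # anonymous variable matches anything
                if term.startswith('?'):
                    if new.setdefault(term, val) != val:
                        ok = False  # variable already bound to another value
                        break
                elif term != val:
                    ok = False  # constant does not match
                    break
            if ok and search(i + 1, new):
                return True
        return False

    return search(0, {})

db = {'Employee': {('Ann', 'Seattle', 'Clerk'), ('Bob', 'Seattle', 'Lawyer')}}
q = [('Employee', ('?x', 'Seattle', '-')),
     ('Employee', ('?x', '?y', 'Clerk')),
     ('Employee', ('-', '?y', 'Lawyer'))]
print(holds(db, q))  # True, witnessed by x = Ann, y = Seattle
```

Backtracking over atoms is the simplest correct strategy; it is exponential in the number of atoms, which is fine for illustrating the definition.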
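Conjunctive Queries Theorem: For every $Q$ there are numbers $E, C$ such that $\Pr[Q] = C/n^E + O(1/n^{E+1})$. Corollary: $\Pr[Q_1 \mid Q_2]$ always has a limit, since $Q_1 \wedge Q_2$ is again a conjunctive query and the theorem applies to both $\Pr[Q_1 \wedge Q_2]$ and $\Pr[Q_2]$. • We show next how to compute $C$ and $E$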
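The corollary follows by dividing the two asymptotic expansions. A small Python sketch of that arithmetic, assuming the pairs $(C, E)$ for $Q_1 \wedge Q_2$ and for $Q_2$ are already known and that $Q_2$ has a positive leading coefficient. Because $\Pr[Q_1 \wedge Q_2] \le \Pr[Q_2]$, the exponent of the conjunction cannot be smaller, so the ratio tends either to $C_{12}/C_2$ or to $0$. The function name is illustrative.

```python
from fractions import Fraction

def conditional_limit(c12, e12, c2, e2):
    """lim_n Pr[Q1 | Q2], given Pr[Q1 and Q2] ~ c12 / n**e12 and
    Pr[Q2] ~ c2 / n**e2 with c2 > 0 (both as in the theorem)."""
    if c2 <= 0:
        raise ValueError("Pr[Q2] must have a positive leading coefficient")
    if c12 == 0 or e12 > e2:
        return Fraction(0)          # numerator vanishes faster than denominator
    if e12 < e2:
        raise ValueError("impossible: Pr[Q1 and Q2] <= Pr[Q2] forces e12 >= e2")
    return Fraction(c12) / Fraction(c2)   # same exponent: ratio of coefficients

# Worked case, relying on the later slide's claim Pr[Q2] ~ (c + c^2)/n^2:
# Q2 = R(a,x), R(y,b) and Q1 = R(a,b); Q1 and Q2 together is equivalent to R(a,b).
c = 2
print(conditional_limit(c, 2, c + c ** 2, 2))   # 1/(1 + c)
```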
Subgraph Properties • Consider a binary relation $R(x,y)$ • For every edge, $\Pr(R(u,v)) = c/n^2$ • Given $Q$, let $H = Q^{\neq}$ be obtained by adding all predicates of the form $x_i \neq x_j$ • $H$ checks for the presence of a subgraph
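A Python sketch of the construction $Q \mapsto Q^{\neq}$, using a hypothetical encoding of atoms as `(relation, terms)` tuples. Following the second example below, inequalities between a variable and a constant are added too; pairs of constants are skipped on the assumption that distinct constants denote distinct elements.

```python
from itertools import combinations

def injective_version(atoms, constants=frozenset()):
    """Build H = Q^{!=}: keep the atoms of Q and add s != t for every pair
    of distinct terms, skipping constant/constant pairs (distinct
    constants are assumed to denote distinct elements)."""
    terms = sorted({t for _, args in atoms for t in args})
    neq = [(s, t) for s, t in combinations(terms, 2)
           if not (s in constants and t in constants)]
    return list(atoms), neq

triangle = [('R', ('x', 'y')), ('R', ('y', 'z')), ('R', ('z', 'x'))]
print(injective_version(triangle)[1])  # [('x', 'y'), ('x', 'z'), ('y', 'z')]
```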
Subgraph Properties Example 1: • $Q = R(x,y), R(y,z), R(z,x)$; $H = Q^{\neq} = R(x,y), R(y,z), R(z,x), x \neq y, y \neq z, z \neq x$, i.e. $H$ is the directed triangle $x \to y \to z \to x$
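Subgraph Properties $\Pr(H) = \Pr\big(\bigvee_{u,v,w} H(u,v,w)\big) \le \frac{1}{3}\sum_{u,v,w} \Pr(H(u,v,w)) = n(n-1)(n-2) \cdot \frac{1}{3} \cdot \frac{c^3}{n^6} = \frac{1}{3}\frac{c^3}{n^3} + O(1/n^4)$, where the sum ranges over distinct $u, v, w$ and the factor $\frac{1}{3}$ appears because each triangle is counted once per rotation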
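The bound can be evaluated exactly with rational arithmetic to watch the normalised value approach $1/3$. A Python sketch; the function name and the choice $c = 2$ are illustrative.

```python
from fractions import Fraction

def triangle_union_bound(n, c):
    """Union bound on Pr(H) for the directed triangle when every edge is
    present with probability c / n**2.  There are n(n-1)(n-2) ordered
    triples of distinct nodes, and each triangle is counted once per
    rotation (3 times), hence the division by 3."""
    p = Fraction(c, n * n)
    return Fraction(n * (n - 1) * (n - 2), 3) * p ** 3

for n in (10, 100, 1000):
    bound = triangle_union_bound(n, c=2)
    # Normalise by c^3 / n^3; the ratio should approach 1/3.
    print(n, float(bound * n ** 3 / 2 ** 3))
```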
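Subgraph Properties Example 2: $Q = R(x,y), R(y,a), R(b,x)$, where $a, b$ are constants; $H = Q^{\neq} = R(x,y), R(y,a), R(b,x), x \neq y, x \neq a, x \neq b, y \neq a, y \neq b$, i.e. $H$ is the path $b \to x \to y \to a$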
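Subgraph Properties $\Pr(H) = \Pr\big(\bigvee_{u,v} H(u,v)\big) \le \sum_{u,v} \Pr(H(u,v)) = n(n-1) \cdot \frac{1}{1} \cdot \frac{c^3}{n^6} = \frac{c^3}{n^4} + O(1/n^5)$ (the factor is $\frac{1}{1}$ because the path has no nontrivial automorphism)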
Subgraph Properties Let $Q = G_1, G_2, \ldots, G_m$. Lemma: $\Pr(Q) \le \frac{C}{H} \cdot \frac{1}{n^E}$, where • $V$ = number of variables in $Q$ • $A = \mathrm{arity}(Q) = \mathrm{arity}(G_1) + \cdots + \mathrm{arity}(G_m)$ • $E = A - V$ = “the exponent of $Q$” • $H$ = number of automorphisms $Q \to Q$ • $C = c_1 \cdot c_2 \cdots c_m$ = “the coefficient of $Q$”
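The quantities in the lemma can be computed mechanically. This Python sketch (not from the lecture) represents a query as a list of atoms `(relation, terms)`, treats every term not listed in `constants` as a variable, and counts automorphisms by brute force over permutations of the variables that fix the constants. `coeffs` is an assumed dictionary giving $c_R$ for each relation. Brute force is exponential in $V$, which is fine for the small patterns on these slides.

```python
from fractions import Fraction
from itertools import permutations

def query_stats(atoms, constants=frozenset()):
    """Return (V, A, E, number of automorphisms) for a query given as a
    list of atoms (relation, (term, ...)); non-constants are variables."""
    atom_set = frozenset(atoms)
    variables = sorted({t for _, args in atom_set for t in args
                        if t not in constants})
    V = len(variables)
    A = sum(len(args) for _, args in atom_set)   # total arity
    automorphisms = 0
    for image in permutations(variables):
        rename = dict(zip(variables, image))     # constants stay fixed
        mapped = frozenset((rel, tuple(rename.get(t, t) for t in args))
                           for rel, args in atom_set)
        if mapped == atom_set:
            automorphisms += 1
    return V, A, A - V, automorphisms

def lemma_bound(atoms, coeffs, n, constants=frozenset()):
    """Upper bound C / (H * n**E) from the lemma."""
    _, _, E, aut = query_stats(atoms, constants)
    C = 1
    for rel, _ in frozenset(atoms):
        C *= coeffs[rel]
    return Fraction(C, aut * n ** E)

triangle = [('R', ('x', 'y')), ('R', ('y', 'z')), ('R', ('z', 'x'))]
path = [('R', ('x', 'y')), ('R', ('y', 'a')), ('R', ('b', 'x'))]
print(query_stats(triangle))          # (V, A, E, H) for Example 1
print(query_stats(path, {'a', 'b'}))  # (V, A, E, H) for Example 2
```

The two calls reproduce the exponents and the factors $\frac{1}{3}$ and $\frac{1}{1}$ of Examples 1 and 2.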
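Subgraph Properties Lower bound, for the triangle: $\Pr(H) = \Pr\big(\bigvee_{u,v,w} H(u,v,w)\big) \ge \sum \Pr(H(u,v,w)) - \sum \Pr(H(u,v,w) \wedge H(u',v',w')) = \frac{1}{3}\frac{c^3}{n^3} + O(1/n^4) - \sum \Pr(HH)$, where the first sum ranges over distinct triangles and the second over pairs of distinct triangles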
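Subgraph Properties • What is $\sum \Pr(HH)$? Each term is the probability of the union of two distinct triangles, which falls into one of the following cases: two disjoint triangles, $E = 12 - 6 = 6$; two triangles sharing one node, $E = 12 - 5 = 7$; two triangles sharing one edge, $E = 10 - 4 = 6$; a few others… But all have $E > 3$! Hence $\sum \Pr(HH) = O(1/n^4)$ is negligible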
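The case analysis can be checked by brute force on a small domain. In this Python sketch (an illustration, not the lecture's method), a directed triangle is a set of three edges, and for each pair of distinct triangles the exponent of their union is $E = A - V = 2\cdot|\text{edges}| - |\text{nodes}|$. Six nodes are enough to realise every overlap pattern, including two disjoint triangles.

```python
from itertools import combinations, permutations

def directed_triangles(n):
    """All directed 3-cycles on nodes 0..n-1, each as a frozenset of edges
    (so the three rotations of a cycle give the same object)."""
    tris = set()
    for u, v, w in permutations(range(n), 3):
        tris.add(frozenset({(u, v), (v, w), (w, u)}))
    return list(tris)

def overlap_cases(n):
    """For every pair of distinct directed triangles, record
    (edges, nodes, E) of their union, with E = 2*edges - nodes because
    each edge is a binary atom."""
    cases = set()
    for t1, t2 in combinations(directed_triangles(n), 2):
        edges = t1 | t2
        nodes = {x for e in edges for x in e}
        cases.add((len(edges), len(nodes), 2 * len(edges) - len(nodes)))
    return sorted(cases)

for edges, nodes, e in overlap_cases(6):
    print(f"edges={edges} nodes={nodes} E={e}")
```

Besides the three cases on the slide, the enumeration finds two triangles on the same nodes with opposite orientation ($E = 9$) and two triangles sharing two nodes but no edge ($E = 8$); all exponents exceed 3.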
Subgraph Properties • Hence, for the triangle: $\Pr(H) \approx \frac{1}{3} c^3/n^3$ • This generalizes easily to any subgraph property
Subgraphs with $E = 0$ • $H = R(x,y)$: $E = 2 - 2 = 0$; what is $\Pr(H)$? • $H = R(x,y), R(u,v)$: $E = 4 - 4 = 0$; what is $\Pr(H)$? • $H = R(x,y), R(y,z), R(z,x), R(u,v)$: $E(H) = E(\text{triangle})$ • The exponent in the theorem is always correct, but the coefficient needs to be adjusted
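For the first case the probability can be written down exactly. With $x \neq y$ there are $n(n-1)$ candidate edges, each present with probability $c/n^2$ (independence assumed, as before), so $\Pr(H) = 1 - (1 - c/n^2)^{n(n-1)}$, which tends to $1 - e^{-c}$ rather than to the lemma's $C/H = c$. A short numeric check in Python; the helper name is illustrative.

```python
import math

def pr_some_edge(n, c):
    """Exact Pr(exists x != y with R(x, y)) when each of the n(n-1)
    off-diagonal tuples is present independently with prob. c / n**2."""
    p = c / n ** 2
    return 1 - (1 - p) ** (n * (n - 1))

c = 2.0
for n in (10, 100, 1000):
    print(n, pr_some_edge(n, c))
print("limit 1 - e^-c =", 1 - math.exp(-c), "  lemma's C/H =", c)
```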
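Conjunctive Queries • Consider the query $Q = R(x,y), R(y,z), R(z,x)$ • Any of the variables $x, y, z$ may be equal, which results in the following subgraphs (up to renaming): $H_1 = R(x,y), R(y,z), R(z,x)$, $E = 6 - 3 = 3$; $H_2 = R(x,x), R(x,z), R(z,x)$, $E = 6 - 2 = 4$; $H_3 = R(x,x), R(x,x), R(x,x) = R(x,x)$, $E = 2 - 1 = 1$ • Hence $\Pr(Q) \approx \Pr(H_3) \approx c_R/n$, since $H_3$ has the smallest exponent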
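Conjunctive Queries • Now consider $Q = R(a,x), R(y,b)$, where $a, b$ are constants • Two subgraphs have the smallest exponent: $H_1 = R(a,x), R(y,b)$, $E = 4 - 2 = 2$; $H_2 = R(a,b)$, $E = 2 - 0 = 2$ • One can prove: $\Pr(Q) \approx \Pr(H_1) + \Pr(H_2) \approx (c + c^2)/n^2$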
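Both examples amount to enumerating every way of identifying terms of $Q$ and keeping the collapsed patterns with the smallest exponent $E = A - V$. A Python sketch under stated assumptions: atoms are `(relation, terms)` tuples, a block of identified terms may contain at most one constant (distinct constants are assumed distinct), and a block is renamed to its constant or else to its alphabetically smallest variable. It reports exponents only; coefficients still come from the lemma, and isomorphic collapsed patterns are not merged.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                     # first in its own block
        for i in range(len(part)):                 # or joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def collapsed_subgraphs(atoms, constants=frozenset()):
    """Yield (E, atoms) for every way of identifying terms of Q."""
    terms = sorted({t for _, args in atoms for t in args})
    for part in set_partitions(terms):
        if any(sum(t in constants for t in block) > 1 for block in part):
            continue                               # two constants merged
        rename = {}
        for block in part:
            consts = [t for t in block if t in constants]
            rep = consts[0] if consts else min(block)
            for t in block:
                rename[t] = rep
        h = frozenset((rel, tuple(rename[t] for t in args))
                      for rel, args in atoms)      # duplicate atoms collapse
        A = sum(len(args) for _, args in h)        # total arity
        V = len({t for _, args in h for t in args if t not in constants})
        yield A - V, h

def leading_subgraphs(atoms, constants=frozenset()):
    """Smallest exponent E and the collapsed subgraphs that attain it."""
    results = list(collapsed_subgraphs(atoms, constants))
    best = min(e for e, _ in results)
    return best, {h for e, h in results if e == best}

triangle = [('R', ('x', 'y')), ('R', ('y', 'z')), ('R', ('z', 'x'))]
print(leading_subgraphs(triangle))
two_constants = [('R', ('a', 'x')), ('R', ('y', 'b'))]
print(leading_subgraphs(two_constants, constants={'a', 'b'}))
```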
More General Distributions [Shelah & Spencer, Lynch] • $\Pr(\text{tuple}) = b/n^\alpha$ • Example: $H$ = triangle • $\Pr(H) \approx n^3 \cdot \frac{1}{3} \cdot b^3/n^{3\alpha} = C/n^E$ • Simply redefine $E(H)$ to use $\alpha$; for the triangle, $E = 3\alpha - 3$
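More General Distributions • But there is a problem here; let $\alpha = 3/2$: E(triangle) $= 3\alpha - 3 = 3/2$; E(triangle plus a disjoint edge) $= 3\alpha - 3 + \alpha - 2 = 1$. Hence the more complex graph is more likely, which is impossible since it contains the triangle! Solution: adjust $E(H)$ to be the max of $E(H_0)$ for $H_0 \subseteq H$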
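A sketch of the fix in Python, for graphs given as edge lists: `raw_exponent` is the redefined $E(H) = \alpha\,|\text{edges}(H)| - |\text{nodes}(H)|$, and `adjusted_exponent` takes the maximum over all non-empty subgraphs, enumerated through edge subsets (adding isolated nodes only lowers $E$). Exact fractions avoid rounding in $\alpha$. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def raw_exponent(edges, alpha):
    """E(H) = alpha * |edges| - |nodes| for Pr(edge) = b / n**alpha."""
    nodes = {x for e in edges for x in e}
    return alpha * len(edges) - len(nodes)

def adjusted_exponent(edges, alpha):
    """Max of E(H0) over all non-empty edge subsets H0 of H."""
    edges = list(edges)
    return max(raw_exponent(sub, alpha)
               for k in range(1, len(edges) + 1)
               for sub in combinations(edges, k))

alpha = Fraction(3, 2)
triangle = [(1, 2), (2, 3), (3, 1)]
triangle_plus_edge = triangle + [(4, 5)]       # triangle and a disjoint edge
print(raw_exponent(triangle, alpha), raw_exponent(triangle_plus_edge, alpha))            # 3/2 1
print(adjusted_exponent(triangle, alpha), adjusted_exponent(triangle_plus_edge, alpha))  # 3/2 3/2
```

After the adjustment the larger graph is no longer assigned a smaller exponent than the triangle it contains.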
Threshold Functions for Subgraphs [Erdős and Rényi] Edge probability $\Pr(t) = p(n)$, some function of $n$. Main theorem of random graphs: for any monotone property $C$ there exists a threshold function $t(n)$ such that • If $p(n) \ll t(n)$ then $\lim_n \Pr(C) = 0$ • If $p(n) \gg t(n)$ then $\lim_n \Pr(C) = 1$
Threshold Functions [Erdős and Rényi] The threshold function for the subgraph property $H$ is the following: let $a = \min_{H_0 \subseteq H} |\mathrm{nodes}(H_0)| / |\mathrm{edges}(H_0)|$, taken over subgraphs with at least one edge; then $t(n) = 1/n^a$. It can be derived from the exponent [shown in class]
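A brute-force Python sketch of this definition (exponential in the number of edges, so only for small patterns; the names are illustrative). It also hints at the link with the exponent: with $p(n) = n^{-\alpha}$, a subgraph $H_0$ has $E(H_0) = \alpha\,|\text{edges}(H_0)| - |\text{nodes}(H_0)|$, which is positive for some $H_0$ exactly when $\alpha > a$.

```python
from fractions import Fraction
from itertools import combinations

def threshold_exponent(edges):
    """a = min over non-empty subgraphs H0 of |nodes(H0)| / |edges(H0)|;
    the threshold for containing H is then t(n) = n**(-a).  Only
    edge-induced subgraphs are enumerated: extra isolated nodes only
    increase the ratio."""
    edges = list(edges)
    best = None
    for k in range(1, len(edges) + 1):
        for sub in combinations(edges, k):
            nodes = {x for e in sub for x in e}
            ratio = Fraction(len(nodes), len(sub))
            if best is None or ratio < best:
                best = ratio
    return best

triangle = [(1, 2), (2, 3), (3, 1)]
k4 = list(combinations(range(4), 2))            # complete graph on 4 nodes
triangle_with_pendant = triangle + [(3, 4)]     # triangle plus one pendant edge
print(threshold_exponent(triangle), threshold_exponent(k4),
      threshold_exponent(triangle_with_pendant))   # 1 2/3 1
```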
Extended 0/1 Laws • Shelah and Spencer, and Lynch, consider the following general case: • $\Pr(t) = b/n^\alpha$, for $\alpha > 0$ • Lynch: a logic admits an extended 0/1 law if for each $\varphi$ one of the following holds: $\Pr(\varphi) \approx C/n^E$, or $\Pr(\varphi) < 1/n^E$ for every $E > 0$