
Inferring Causal Graphs




  1. Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002

  2. Applications of Bayes Nets (I) • Microsoft Office “Paper Clip.” • Bill Gates: “The competitive advantage of Microsoft lies in our expertise in Bayes Nets.” • UBC Intelligent Tutoring System (ASI X-change).

  3. Applications of Bayes Nets (II) • University Drop-outs: The search program Tetrad II says that higher SAT scores would lead to a lower drop-out rate. Carnegie Mellon uses this to reduce its drop-out rates. • Tetrad II recalibrates a mass spectrometer on an earth satellite. • Tetrad II predicts the relation between corn exports and exchange rates.

  4. Bayes Nets: Basic Definitions • Defn: A and B are independent iff P(A and B) = P(A) x P(B). • Exercise: Prove that, when P(B) > 0, A and B are independent iff P(A|B) = P(A). • Thus independence implies irrelevance.
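A sketch of the exercise (assuming P(B) > 0): if A and B are independent, then P(A|B) = P(A and B)/P(B) = P(A) x P(B)/P(B) = P(A). Conversely, if P(A|B) = P(A), then P(A and B) = P(A|B) x P(B) = P(A) x P(B), which is the definition of independence.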

  5. Independence Among Variables • Let X, Y, Z be random variables. • X is independent of Y iff P(X=x | Y=y) = P(X=x) for all x, y s.t. P(Y=y) > 0. • X is independent of Y given Z iff P(X=x | Y=y, Z=z) = P(X=x | Z=z) for all x, y, z s.t. P(Y=y and Z=z) > 0. • Notation: (X ⊥ Y | Z). • Intuitively: given information Z, Y is irrelevant to X.
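The definition can be checked directly on a small joint distribution. Below is a minimal Python sketch, not from the slides; the function and variable names are illustrative. It enumerates value combinations and tests whether P(X=x | Y=y, Z=z) = P(X=x | Z=z) whenever the conditioning events have positive probability.

    from itertools import product

    def prob(joint, **fixed):
        # Marginal probability of the event fixing some of x, y, z.
        return sum(p for (x, y, z), p in joint.items()
                   if all(dict(x=x, y=y, z=z)[k] == v for k, v in fixed.items()))

    def cond_indep(joint):
        # True iff P(X=x | Y=y, Z=z) = P(X=x | Z=z) whenever defined.
        xs = {x for (x, _, _) in joint}
        ys = {y for (_, y, _) in joint}
        zs = {z for (_, _, z) in joint}
        for x, y, z in product(xs, ys, zs):
            p_yz, p_z = prob(joint, y=y, z=z), prob(joint, z=z)
            if p_yz > 0 and p_z > 0:
                lhs = prob(joint, x=x, y=y, z=z) / p_yz
                rhs = prob(joint, x=x, z=z) / p_z
                if abs(lhs - rhs) > 1e-12:
                    return False
        return True

    # Example: X and Y are each generated from a common cause Z, so (X ⊥ Y | Z) holds.
    joint = {}
    for z in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                p_x = 0.9 if x == z else 0.1   # P(X = x | Z = z)
                p_y = 0.8 if y == z else 0.2   # P(Y = y | Z = z)
                joint[(x, y, z)] = 0.5 * p_x * p_y
    print(cond_indep(joint))   # True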

  6. Axioms for Informational Relevance • Pearl (2000), p. 11. It’s possible to read the ⊥ symbol as “irrelevant”. Then we can consider a number of axioms for ⊥ as axiomatizations of relevance, for example: • Symmetry: if (X ⊥ Y | Z) then (Y ⊥ X | Z). • Decomposition: if (X ⊥ YW | Z) then (X ⊥ Y | Z).

  7. Markovian Parents • In constructing a Bayes net, we look for “direct causes” – variables that “immediately determine” the value of another variable. Such direct causes “screen off” other variables. • Formally: Let an ordering of variables X1, …, Xn be given. Consider Xj. Let PA be any subset of X1, …, Xj-1. Suppose that P(Xj | PA) = P(Xj | X1, …, Xj-1) and that no proper subset of PA has this property. Then PA forms the Markovian parents of Xj.
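As a sketch of the definition (assuming a conditional-independence oracle indep(x, ys, zs) that answers whether x is independent of the variables in ys given the set zs; in practice this would be a statistical test), one can search the predecessors of each variable for a smallest screening-off set. A smallest such set is automatically minimal in the sense above.

    from itertools import combinations

    def markovian_parents(ordering, indep):
        # For each Xj, find a smallest subset PA of its predecessors with
        # P(Xj | PA) = P(Xj | X1, ..., Xj-1), i.e. Xj independent of the
        # remaining predecessors given PA.
        parents = {}
        for j, xj in enumerate(ordering):
            pred = ordering[:j]
            found = None
            for k in range(len(pred) + 1):          # try smallest sets first
                for pa in combinations(pred, k):
                    rest = [v for v in pred if v not in pa]
                    if not rest or indep(xj, rest, set(pa)):
                        found = set(pa)
                        break
                if found is not None:
                    break
            parents[xj] = found
        return parents

Exhaustive search over subsets is exponential; practical algorithms (such as the IC procedure later in these slides) avoid it with smarter search and statistical tests.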

  8. Markovian Parents and Bayes Nets • Given an ordering of variables, we can construct a causal graph by drawing arrows from Markovian parents to their children. Note that graphs are well suited to drawing the distinction between “direct” and “intermediate” causes. • Exercise: For the variables in figure 1.2, construct a Bayes net in the given ordering. • Exercise: Construct a Bayes net along the ordering (X5, X1, X3, X2, X4).

  9. Independence in Bayes Nets • Note how useful irrelevance information is – think of a Prolog-style logical database. • A typical problem: Given some information Z, and a query about X, is Y relevant to X? • For Bayes nets, the d-separation criterion is a powerful answer.

  10. d-separation • In principle, information can flow along any path between two variables X and Y. • Provisos: A path is blocked by any collider on it. • Conditioning on a node reverses its status: • Conditioning on a non-collider makes it block the path. • Conditioning on a collider, or on one of its descendants, makes it unblock the path.
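A minimal sketch of a d-separation test, not from the slides: it uses the standard equivalent criterion via the moralized ancestral graph. Restrict the DAG to the ancestors of X, Y and Z, “marry” the parents of each node, drop edge directions, delete Z, and check whether X and Y are still connected. The dict encoding and node names are illustrative; the example graph is the sprinkler network of figure 1.2.

    def ancestors(dag, nodes):
        # The given nodes together with all of their ancestors.
        seen, stack = set(), list(nodes)
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(dag[v])            # dag maps node -> list of parents
        return seen

    def d_separated(dag, xs, ys, zs):
        # True iff every path between xs and ys is blocked given zs.
        keep = ancestors(dag, set(xs) | set(ys) | set(zs))
        # Moralize: link each node to its parents, marry co-parents, drop directions.
        undirected = {v: set() for v in keep}
        for v in keep:
            ps = dag[v]
            for p in ps:
                undirected[v].add(p)
                undirected[p].add(v)
            for i, p in enumerate(ps):
                for q in ps[i + 1:]:
                    undirected[p].add(q)
                    undirected[q].add(p)
        # Delete the conditioning set and test connectivity by search.
        blocked, reached = set(zs), set()
        stack = [v for v in xs if v not in blocked]
        while stack:
            v = stack.pop()
            if v in reached or v in blocked:
                continue
            reached.add(v)
            stack.extend(undirected[v])
        return not (reached & set(ys))

    # Sprinkler network (figure 1.2): X1 -> X2, X1 -> X3, X2 -> X4, X3 -> X4, X4 -> X5.
    sprinkler = {"X1": [], "X2": ["X1"], "X3": ["X1"],
                 "X4": ["X2", "X3"], "X5": ["X4"]}
    print(d_separated(sprinkler, {"X2"}, {"X3"}, {"X1"}))         # True
    print(d_separated(sprinkler, {"X2"}, {"X3"}, {"X1", "X4"}))   # False: X4 is a collider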

  11. d-separation characterizes independence • If X, Y are d-separated by Z in a DAG G, then (X ⊥ Y | Z) holds in all probability distributions compatible with G. • If X, Y are not d-separated by Z in G, then (X ⊥ Y | Z) fails in at least one probability distribution compatible with G.

  12. Observational Equivalence • Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer, etc.). • How does the probability distribution constrain the graph? • Two causal graphs G1, G2 are compatible with the same probability distributions iff • G1 has the same adjacencies as G2 and the same v-structures (colliders whose two parents are not adjacent).
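A minimal sketch of the criterion (names illustrative, DAGs again given as node-to-parents dicts): compare two graphs on their adjacencies and on their v-structures. Reversing the X1 -> X2 edge of the sprinkler network changes neither, which is the first example on the next slide.

    from itertools import combinations

    def adjacencies(dag):
        return {frozenset((child, parent)) for child in dag for parent in dag[child]}

    def v_structures(dag):
        skel = adjacencies(dag)
        return {(frozenset((p, q)), child)
                for child in dag
                for p, q in combinations(sorted(dag[child]), 2)
                if frozenset((p, q)) not in skel}    # unshielded collider p -> child <- q

    def observationally_equivalent(g1, g2):
        return adjacencies(g1) == adjacencies(g2) and v_structures(g1) == v_structures(g2)

    # Sprinkler network, and the same graph with X1 -> X2 reversed to X2 -> X1.
    g1 = {"X1": [], "X2": ["X1"], "X3": ["X1"], "X4": ["X2", "X3"], "X5": ["X4"]}
    g2 = {"X1": ["X2"], "X2": [], "X3": ["X1"], "X4": ["X2", "X3"], "X5": ["X4"]}
    print(observationally_equivalent(g1, g2))   # True: same adjacencies, same v-structures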

  13. Observational Equivalence: Examples (I) • In the sprinkler network, we cannot tell whether X1 -> X2 or vice versa. But we can tell that X2 -> X4 and X4 -> X5. • General note: In machine learning you cannot always tell what the correct hypothesis is, even if you have all possible data -> you need more assumptions or other kinds of data.

  14. Observational Equivalence: Examples (II) • Vancouver Sun, March 29, 2002: “Adolescents … are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the ‘chicken and egg’ question: Does watching television cause aggression, or do people prone to aggression watch more television?” [Science, Dr. Johnson, Columbia U.]

  15. Two Models of Aggressive Behaviour • [Figure: two causal graphs over the variables disposition to aggression, TV watching, and violent behaviour, differing in how the arrows are drawn.] • Are these two graphs observationally distinguishable?

  16. Minimal Graphs • A graph G is minimal for a probability distribution P iff • G is compatible with P, and • no proper subgraph of G is compatible with P. • Example: a graph with edges from A to each of B, C, D is not minimal if (A ⊥ {B, C, D}).

  17. Note on minimality • Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B. • In statistical tests, dependence is observable but independence is not. • So minimality amounts to “assume independence until dependence is observed”. • That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).

  18. Stable Distributions • A distribution P is stable iff there is a graph G such that (X ⊥ Y | Z) holds in P iff X and Y are d-separated given Z in G. • Intuitively, stability rules out “exact counterbalance”: two forces both having a causal effect but cancelling each other out exactly in every circumstance.

  19. Inferring Causal Structure: The IC Algorithm • Assume a stable probability distribution P. • Find a minimal graph for P with as many edges directed as possible. • General idea: First find variables that are “directly causally related”. Connect those. Add arrows as far as possible.

  20. Inferring Causal Structure: The IC Algorithm • For each pair of variables X and Y, look for a “screening-off” set S(X,Y) s.t. (X ⊥ Y | S(X,Y)) holds. If there is no such set, add an undirected edge between X and Y. • For each non-adjacent pair X, Y with a common neighbour Z, check whether Z is part of the screening-off set S(X,Y). If not, make Z a common consequence of X and Y: orient X -> Z <- Y. • Orient the remaining edges as far as possible without creating cycles or new v-structures.
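A minimal sketch of the first two steps (not the full algorithm; the oracle indep(x, y, s) stands in for a test of X ⊥ Y | S, and could for instance be the d_separated function sketched after slide 10, applied to a known graph):

    from itertools import combinations

    def ic_skeleton_and_colliders(variables, indep):
        # Step 1: for each pair, search for a screening-off set S(X, Y);
        # connect X and Y only if none exists.
        sepset, edges = {}, set()
        for x, y in combinations(variables, 2):
            rest = [v for v in variables if v not in (x, y)]
            found = None
            for k in range(len(rest) + 1):
                for s in combinations(rest, k):
                    if indep(x, y, set(s)):
                        found = set(s)
                        break
                if found is not None:
                    break
            if found is None:
                edges.add(frozenset((x, y)))
            else:
                sepset[frozenset((x, y))] = found
        # Step 2: for each non-adjacent pair with a common neighbour Z,
        # orient X -> Z <- Y whenever Z is not in the screening-off set.
        arrows = set()
        for x, y in combinations(variables, 2):
            if frozenset((x, y)) in edges:
                continue
            for z in (v for v in variables if v not in (x, y)):
                if (frozenset((x, z)) in edges and frozenset((z, y)) in edges
                        and z not in sepset[frozenset((x, y))]):
                    arrows.add((x, z))
                    arrows.add((y, z))
        edges -= {frozenset(a) for a in arrows}   # oriented edges are no longer undirected
        return edges, arrows

    # With the d-separation oracle on the sprinkler network this recovers the
    # skeleton plus the single collider X2 -> X4 <- X3, leaving X1-X2, X1-X3
    # and X4-X5 undirected:
    #   oracle = lambda x, y, s: d_separated(sprinkler, {x}, {y}, s)
    #   edges, arrows = ic_skeleton_and_colliders(["X1", "X2", "X3", "X4", "X5"], oracle)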

  21. Rules for Orientation • Rule 1: Given a -> b and b – c, add b -> c if a, c are not linked (no new collider). • Rule 2: Given a -> c -> b and a – b, add a -> b (no cycle). • Rule 3: Given a – c -> d, c -> d -> b, and a – b, add a -> b if c, b are not linked (no cycle + no new collider). • Rule 4: Given a – c -> b, a – d -> b, and a – b, add a -> b if c, d are not linked (no cycle + no new collider).
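A minimal sketch of repeatedly applying the first two rules (illustrative encoding: edges holds the still-undirected pairs as frozensets, arrows the directed pairs as (tail, head) tuples), which can be run on the output of the IC sketch above:

    def orient(edges, arrows):
        # Apply rules 1 and 2 until no undirected edge can be oriented further.
        edges, arrows = set(edges), set(arrows)

        def adjacent(u, v):
            return (frozenset((u, v)) in edges
                    or (u, v) in arrows or (v, u) in arrows)

        changed = True
        while changed:
            changed = False
            for e in list(edges):
                b, c = tuple(e)
                for x, y in ((b, c), (c, b)):
                    # Rule 1: a -> x, x - y, and a, y not linked  =>  x -> y.
                    rule1 = any(head == x and tail != y and not adjacent(tail, y)
                                for tail, head in arrows)
                    # Rule 2: x -> z -> y and x - y  =>  x -> y.
                    rule2 = any(tail == x and (head, y) in arrows
                                for tail, head in arrows)
                    if rule1 or rule2:
                        edges.discard(e)
                        arrows.add((x, y))
                        changed = True
                        break
        return edges, arrows

    # Starting from the IC output for the sprinkler network (undirected X1-X2,
    # X1-X3, X4-X5 plus the collider X2 -> X4 <- X3), rule 1 fires on X4 - X5
    # because X2 -> X4 and X2, X5 are not linked, giving X4 -> X5; X1-X2 and
    # X1-X3 stay undirected, matching slide 13.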
