
ASSESSING CAUSAL QUANTITIES FROM EXPERIMENTAL AND NONEXPERIMENTAL DATA




Presentation Transcript


  1. ASSESSING CAUSAL QUANTITIES FROM EXPERIMENTAL AND NONEXPERIMENTAL DATA Judea Pearl Computer Science and Statistics UCLA www.cs.ucla.edu/~judea/

  2. OUTLINE
  • From probability to causal analysis:
    • The differences
    • The barriers
    • The benefits
  • Assessing the effects of actions and policies
  • Determining the causes of effects
  • Distinguishing direct from indirect effects
  • Generating explanations

  3. FROM PROBABILITY TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
  Probability and statistics deal with static relations:
    Data → Probability / Statistics → joint distribution → predictions from passive observations
  Causal analysis deals with changes (dynamics):
    Data + Causal assumptions (Experiments) → Causal Model →
    • Effects of interventions
    • Causes of effects
    • Explanations

  4. FROM PROBABILITY TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)
  • Causal and statistical concepts do not mix.
    CAUSAL: Spurious correlation, Randomization, Confounding / Effect, Instrument, Holding constant, Explanatory variables
    STATISTICAL: Regression, Association / Independence, "Controlling for" / Conditioning, Odds and risk ratios, Collapsibility
  • No causes in – no causes out (Cartwright, 1989):
    data + statistical assumptions alone do not yield causal conclusions;
    causal assumptions are needed for causal conclusions.
  • Causal assumptions cannot be expressed in the mathematical language of standard statistics.
  • Non-standard mathematics:
    • Structural equation models (SEM)
    • Counterfactuals (Neyman–Rubin)
    • Causal diagrams (Wright, 1920)

  5. FROM PROBABILITY TO CAUSAL ANALYSIS: 2. THE MENTAL BARRIERS
  • Reliance on non-probabilistic assumptions (all conclusions are conditional).
  • The use of new mathematical languages:
    2.1 Structural equation models (SEM)
    2.2 Counterfactuals (Neyman–Rubin framework)
    2.3 Causal diagrams (Wright, 1920)

  6. WHAT'S IN A CAUSAL MODEL?
  Oracle that assigns truth values to causal sentences:
  • Action sentences: B if we do A.
  • Counterfactuals: B would be different if A were true.
  • Explanation: B occurred because of A.
  • Optional: with what probability?

  7. CAUSAL MODELS: WHY THEY ARE NEEDED
  [Diagram: variables X, Y, Z, with designated INPUT and OUTPUT]

  8. STRUCTURAL EQUATION MODELS (HAAVELMO 1943; DUNCAN 1970)
  CAUSAL MODEL / PATH DIAGRAMS (WRIGHT 1920):
  [Path diagram: I (income), W (wage), Q (demand), P (price), with disturbances U1, U2]

  9. CAUSAL MODELS AND COUNTERFACTUALS
  Definition: A causal model is a 3-tuple M = ⟨V, U, F⟩ with a mutilation operator do(x): M → Mx, where:
  (i) V = {V1, …, Vn} endogenous variables,
  (ii) U = {U1, …, Um} background variables,
  (iii) F = set of n functions, fi: V \ Vi × U → Vi,
       vi = fi(pai, ui), PAi ⊆ V \ Vi, Ui ⊆ U,
  (iv) Mx = ⟨U, V, Fx⟩, X ⊆ V, x ∈ X, where Fx = {fi: Vi ∉ X} ∪ {X = x}
       (replace all functions fi corresponding to X with the constant functions X = x).
  Definition (Probabilistic Causal Model): ⟨M, P(u)⟩, where P(u) is a probability assignment to the variables in U.
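The definition above can be exercised directly in code. The following is a minimal sketch, not from the slides: each fi is an ordinary function of the other variables, and the mutilation operator do(x) replaces the mechanisms of the intervened variables with constant functions. The two-variable example and all names are hypothetical.

```python
def simulate(F, u, order):
    """Solve a recursive structural model: evaluate each endogenous
    variable in causal order, given background values u."""
    v = dict(u)
    for name in order:
        v[name] = F[name](v)
    return v

def do(F, intervention):
    """Mutilation operator do(x): replace the mechanism of each
    intervened variable with a constant function."""
    Fx = dict(F)
    for name, value in intervention.items():
        Fx[name] = (lambda val: lambda v: val)(value)
    return Fx

# Hypothetical two-variable model: X = u_x, Y = X XOR u_y.
F = {"X": lambda v: v["u_x"], "Y": lambda v: v["X"] ^ v["u_y"]}
u = {"u_x": 1, "u_y": 1}
print(simulate(F, u, ["X", "Y"]))                # observed world: X=1, Y=0
print(simulate(do(F, {"X": 0}), u, ["X", "Y"]))  # under do(X=0): Y=1
```

Note that do() leaves the background values u untouched; only the mechanism of X is replaced, exactly as in clause (iv).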

  10. COMMON QUESTIONS ON CAUSAL DIAGRAMS
  [Diagram: variables A–E (Genetic factors, weight, Blood pressure, Cardiovascular disease)]
  • What do causal diagrams represent?
    Qualitative assumptions about absence of effects.
  • Where do these assumptions come from?
    Commonsense, substantive knowledge, other studies.
  • What if we can't make any causal assumptions?
    Quit.

  11. CAUSAL INFERENCE MADE EASY (1985–2000)
  • Inference with nonparametric structural equations made possible through graphical analysis.
  • Mathematical underpinning of counterfactuals through nonparametric structural equations.
  • Graphical–counterfactual symbiosis.

  12. NON-PARAMETRIC STRUCTURAL MODELS
  Given P(x,y,z), should we ban smoking?
  [Diagram: Smoking (X) → Tar in Lungs (Z) → Cancer (Y), with disturbances U1, U2, U3; U1 affects both X and Y]
  Linear analysis:
    x = u1,  z = αx + u2,  y = βz + γu1 + u3.
    Find: αβ
  Nonparametric analysis:
    x = f1(u1),  z = f2(x, u2),  y = f3(z, u1, u3).
    Find: P(y | do(x))

  13. NON-PARAMETRIC STRUCTURAL MODELS (CONT)
  Given P(x,y,z), should we ban smoking?
  [Diagram: the same model after intervention, with X set to a constant and f1 removed]
  Linear analysis:
    x = u1,  z = αx + u2,  y = βz + γu1 + u3.
    Find: αβ
  Nonparametric analysis (post-intervention):
    x = const.,  z = f2(x, u2),  y = f3(z, u1, u3).
    Find: P(y | do(x)) = P(Y = y) in the new model

  14. CAUSAL MODELS AND CAUSAL EFFECTS
  (Definition of a causal model M = ⟨V, U, F⟩ with mutilation operator do(x), repeated from slide 9.)

  15. CAUSAL MODELS AND CAUSAL EFFECTS
  (Definition of a causal model M = ⟨V, U, F⟩ with mutilation operator do(x), repeated from slide 9.)
  Definition (Causal Effect P(y | do(x))): The causal effect of X on Y is given by the probability of Y = y induced by the submodel Mx together with P(u).

  16. IDENTIFIABILITY
  Definition: Let Q(M) be any quantity defined on a causal model M, and let A be a set of assumptions.
  Q is identifiable relative to A iff
    PM1(v) = PM2(v) ⇒ Q(M1) = Q(M2)
  for all M1, M2 that satisfy A.
  In other words, Q can be determined uniquely from the probability distribution P(v) of the endogenous variables V and the assumptions A.

  17. THE FUNDAMENTAL THEOREM OF CAUSAL INFERENCE
  Causal Markov Theorem: Any distribution generated by a Markovian structural model M (recursive, with independent disturbances) can be factorized as
    P(v1, …, vn) = ∏i P(vi | pai)
  where pai are the (values of the) parents of Vi in the causal diagram associated with M.
  Corollary (Truncated factorization, Manipulation Theorem): The distribution generated by an intervention do(X = x) (in a Markovian model M) is given by the truncated factorization
    P(v1, …, vn | do(x)) = ∏{i: Vi ∉ X} P(vi | pai), evaluated at X = x.
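The truncated factorization is easy to exercise on a small Markovian model. A sketch on a binary chain X → Z → Y (the conditional tables below are hypothetical): the factor for the intervened variable X is dropped, X is pinned to its intervened value, and the remaining factors are summed out.

```python
# Hypothetical conditional tables for a binary chain X -> Z -> Y.
P_x = {0: 0.4, 1: 0.6}
P_z_given_x = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (z, x)
P_y_given_z = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (y, z)

def p_do(y, x):
    """P(y | do(X=x)) by truncated factorization: drop the factor
    P(x), keep the remaining factors, and sum out z."""
    return sum(P_z_given_x[(z, x)] * P_y_given_z[(y, z)] for z in (0, 1))

print(round(p_do(1, 1), 3))  # 0.2*0.1 + 0.8*0.6 = 0.5
```

In a chain the intervened factor P(x) carries no back-door information, so do(x) here coincides with conditioning; the factorization becomes interesting precisely when X also has parents, as in the smoking example.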

  18. RAMIFICATIONS OF THE FUNDAMENTAL THEOREM
  Given P(x,y,z), should we ban smoking?
  [Diagrams: pre-intervention model with U (unobserved) affecting X and Y, and X → Z → Y; post-intervention model with X set to x]
  To compute P(y,z | do(x)), we must eliminate u.

  19. THE BACK-DOOR CRITERION
  Graphical test of identification: P(y | do(x)) is identifiable in G if there is a set Z of variables such that Z d-separates X from Y in GX̲ (the subgraph of G with all arrows emanating from X removed).
  [Diagrams: G and GX̲ over variables Z1–Z6, X, Y]
  Moreover,
    P(y | do(x)) = Σz P(y | x, z) P(z)   ("adjusting" for Z)
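The adjustment formula can be checked against simulated data. A sketch assuming a hypothetical confounded model (Z → X, Z → Y, and X → Y, with made-up probabilities): the unobserved truth is P(y | do(x=1)) = 0.5·0.7 + 0.5·0.9 = 0.8, and the back-door estimate Σz P(y | x, z) P(z) recovers it from purely observational samples, where naive conditioning P(y | x) would not.

```python
import random
random.seed(0)

# Hypothetical confounded model: Z -> X, Z -> Y, and X -> Y.
def draw():
    z = 1 if random.random() < 0.5 else 0
    x = 1 if random.random() < (0.8 if z else 0.2) else 0
    y = 1 if random.random() < (0.7 if x else 0.3) + (0.2 if z else 0.0) else 0
    return z, x, y

data = [draw() for _ in range(100_000)]

def backdoor(x):
    """Estimate P(y=1 | do(X=x)) = sum_z P(y=1 | x, z) P(z)."""
    total = 0.0
    for z in (0, 1):
        cell = [y for (zz, xx, y) in data if zz == z and xx == x]
        pz = sum(1 for (zz, _, _) in data if zz == z) / len(data)
        total += (sum(cell) / len(cell)) * pz
    return total

print(round(backdoor(1), 2))  # close to the true value 0.8
```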

  20. RULES OF CAUSAL CALCULUS
  • Rule 1 (Ignoring observations):
    P(y | do{x}, z, w) = P(y | do{x}, w)   if (Y ⊥ Z | X, W) in GX̄
  • Rule 2 (Action/observation exchange):
    P(y | do{x}, do{z}, w) = P(y | do{x}, z, w)   if (Y ⊥ Z | X, W) in GX̄Z̲
  • Rule 3 (Ignoring actions):
    P(y | do{x}, do{z}, w) = P(y | do{x}, w)   if (Y ⊥ Z | X, W) in GX̄Z̄(W)
  Here GX̄ is G with all arrows pointing into X removed; GX̄Z̲ additionally removes the arrows emanating from Z; and GX̄Z̄(W) removes the arrows into Z(W), the Z-nodes that are not ancestors of any W-node in GX̄.

  21. DERIVATION IN CAUSAL CALCULUS
  [Diagram: Genotype (unobserved) confounds Smoking and Cancer; Smoking → Tar → Cancer]
  P(c | do{s})
    = Σt P(c | do{s}, t) P(t | do{s})                  (probability axioms)
    = Σt P(c | do{s}, do{t}) P(t | do{s})              (Rule 2)
    = Σt P(c | do{s}, do{t}) P(t | s)                  (Rule 2)
    = Σt P(c | do{t}) P(t | s)                         (Rule 3)
    = Σs′ Σt P(c | do{t}, s′) P(s′ | do{t}) P(t | s)   (probability axioms)
    = Σs′ Σt P(c | t, s′) P(s′ | do{t}) P(t | s)       (Rule 2)
    = Σs′ Σt P(c | t, s′) P(s′) P(t | s)               (Rule 3)
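The end product of this derivation is the front-door adjustment, Σt P(t | s) Σs′ P(c | t, s′) P(s′), and it can be tested on simulated data. A sketch with a hypothetical smoking–tar–cancer model (all probabilities invented): the unobserved confounder u is used only to generate data, never by the estimator, yet the estimate converges to the true interventional value, which works out analytically to 0.75 for do(s=1) in this model.

```python
import random
random.seed(1)

# Hypothetical model: u (unobserved) -> s and c; s -> t -> c.
def draw():
    u = 1 if random.random() < 0.5 else 0
    s = 1 if random.random() < (0.8 if u else 0.2) else 0
    t = 1 if random.random() < (0.9 if s else 0.1) else 0
    c = 1 if random.random() < 0.2 + 0.5 * t + 0.2 * u else 0
    return s, t, c

data = [draw() for _ in range(100_000)]

def p(pred):
    """Empirical probability of a predicate over (s, t, c) rows."""
    return sum(1 for row in data if pred(row)) / len(data)

def front_door(s):
    """P(c=1 | do(S=s)) = sum_t P(t|s) sum_s' P(c|t,s') P(s')."""
    total = 0.0
    for t in (0, 1):
        pt_s = p(lambda r: r[1] == t and r[0] == s) / p(lambda r: r[0] == s)
        inner = 0.0
        for s2 in (0, 1):
            pc_ts = (p(lambda r: r[2] == 1 and r[1] == t and r[0] == s2)
                     / p(lambda r: r[1] == t and r[0] == s2))
            inner += pc_ts * p(lambda r: r[0] == s2)
        total += pt_s * inner
    return total

print(round(front_door(1), 2))  # close to the true value 0.75
```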

  22. THE INTERPRETATION OF STRUCTURAL MODELS
  • The meaning of y = a + bx + ε is …
  • The meaning of b is …
  • The meaning of ε is …
  Meaning of SEM is not expressible in standard probability syntax.

  23. OUTLINE
  • From probability to causal analysis:
    • The differences
    • The barriers
    • The benefits
  • Assessing the effects of actions and policies
  • Determining the causes of effects

  24. DETERMINING THE CAUSES OF EFFECTS (The Attribution Problem)
  • "Your Honor! My client (Mr. A) died BECAUSE he used that drug."
  • Court to decide if it is MORE PROBABLE THAN NOT that A would be alive BUT FOR the drug:
    P(? | A is dead, took the drug) > 0.50

  25. THE PROBLEM
  Theoretical problems:
  • What is the meaning of PN(x,y), the "probability that event y would not have occurred if it were not for event x, given that x and y did in fact occur"?
  • Under what conditions can PN(x,y) be learned from statistical data, i.e., observational, experimental, and combined?

  26. COUNTERFACTUALS FROM MUTILATED MODELS
  Example: 2-man firing squad.
  Would the prisoner die (Y = y) if rifleman X were to shoot (X = x) in situation U = u?
  [Diagram: U (court order) → W (captain) → X, Z (riflemen) → Y (death); Yx(u) = y]

  27. 3-STEPS TO COMPUTING COUNTERFACTUALS
  S5. If the prisoner is dead, he would still be dead if A were not to have shot: D ⇒ D¬A.
  Step 1 (Abduction): infer U from the evidence (U = TRUE).
  Step 2 (Action): set A = FALSE.
  Step 3 (Prediction): D = TRUE.
  [Diagrams: the model U → C → {A, B} → D at each of the three steps]

  28. AXIOMS OF CAUSAL COUNTERFACTUALS
  • Definiteness: ∃ x ∈ X s.t. Xy(u) = x
  • Uniqueness: (Xy(u) = x) & (Xy(u) = x′) ⇒ x = x′
  • Effectiveness: Xxw(u) = x
  • Composition: Wx(u) = w ⇒ Yxw(u) = Yx(u)
  • Reversibility: (Yxw(u) = y) & (Wxy(u) = w) ⇒ Yx(u) = y

  29. PROBABILITIES OF COUNTERFACTUALS
  Probabilistic causal model: ⟨M, P(u)⟩; P(u) induces a unique distribution P(v).
  The probability of the counterfactual Yx = y is the probability of Y = y induced by the submodel Mx:
    P(Yx = y) = Σ{u: Yx(u) = y} P(u)

  30. COMPUTING PROBABILITIES OF COUNTERFACTUALS
  P(S5): The prisoner is dead. How likely is it that he would be dead had A not shot? P(D¬A | D) = ?
  Step 1 (Abduction): update P(u) to P(u | D).
  Step 2 (Action): set A = FALSE.
  Step 3 (Prediction): compute P(D¬A | D) in the mutilated model.
  [Diagrams: the model at each of the three steps]
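The three steps can be mechanized for the firing-squad example. A minimal sketch: the structure U → C → {A, B} → D follows slide 27, but the prior P(U = 1) = 0.6 is a hypothetical number added here for illustration. Abduction conditions P(u) on the evidence D = 1; action sets A = 0; prediction recomputes D.

```python
# Hypothetical firing-squad model: U (court order) -> C (captain)
# -> A, B (riflemen) -> D (death = A or B), with assumed P(U=1) = 0.6.
P_U = {0: 0.4, 1: 0.6}

def world(u, do_A=None):
    """Solve the deterministic model, optionally under do(A = do_A)."""
    c = u
    a = do_A if do_A is not None else c
    b = c
    return 1 if (a or b) else 0

# Step 1 (abduction): update P(u) given the evidence D = 1.
posterior = {u: P_U[u] for u in P_U if world(u) == 1}
norm = sum(posterior.values())
posterior = {u: p / norm for u, p in posterior.items()}

# Steps 2-3 (action + prediction): set A = 0 and recompute D.
p_cf = sum(p for u, p in posterior.items() if world(u, do_A=0) == 1)
print(p_cf)  # P(D_{not-A} | D) = 1: he would still be dead
```

The answer is 1 because the evidence D = 1 implies U = 1, and under U = 1 rifleman B shoots regardless of A, mirroring sentence S5 on slide 27.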

  31. CAN FREQUENCY DATA DECIDE LEGAL RESPONSIBILITY?

                    Experimental        Nonexperimental
                    do(x)    do(x′)     x        x′
  Deaths (y)           16       14       2       28
  Survivals (y′)      984      986     998      972
                    1,000    1,000   1,000    1,000

  • Nonexperimental data: drug usage predicts longer life.
  • Experimental data: drug has negligible effect on survival.
  • Plaintiff: Mr. A is special.
    • He actually died.
    • He used the drug by choice.
  • Court to decide (given both data sets): Is it more probable than not that A would be alive but for the drug?

  32. IDENTIFIABILITY FROM EXPERIMENTS
  Definition: Let Q(M) be any quantity defined on a causal model M, let Mexp be a modification of M induced by some experiment exp, and let Y be a set of variables observed under exp.
  Q is identifiable from experiment exp, given A, iff
    PM1exp(y) = PM2exp(y) ⇒ Q(M1) = Q(M2)
  for all M1, M2 that satisfy A.
  In other words, Q can be determined uniquely from the probability distribution that the observed variables Y attain under the experimental conditions created by exp.

  33. TYPICAL RESULTS
  • Bounds given combined nonexperimental and experimental data
  • Identifiability under exogeneity and monotonicity
  • Bounds under exogeneity
  • Identifiability under monotonicity (combined data)

  34. BOUNDING BY LINEAR PROGRAMMING
  [Diagram: the U-space collapsed into four response types, characterized by the pair (Yx, Yx′)]
  Notation: pijk = P(Yx = i, Yx′ = j, X = k), with i, j, k equal to 1 for the event and 0 for its complement; e.g., p100 = P(yx, y′x′, x′).
  • Maximize (minimize): the counterfactual quantity of interest,
  • subject to the nonexperimental constraints:
      p111 + p101 = P(x, y)
      p011 + p001 = P(x, y′)
      p110 + p010 = P(x′, y)
  • and the experimental constraints:
      P(yx) = p111 + p110 + p101 + p100
      P(yx′) = p111 + p110 + p011 + p010

  35. TYPICAL THEOREMS (Tian and Pearl, 2000)
  • Identifiability under monotonicity (combined data): PN equals a corrected excess risk ratio.
  • Bounds given combined nonexperimental and experimental data.
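Assuming the standard form of the Tian–Pearl bounds, max{0, [P(y) − P(yx′)] / P(x, y)} ≤ PN ≤ min{1, [P(y′x′) − P(x, y′)] / P(x, y)} (the slide's formulas did not survive transcription, so this form is supplied from memory and should be checked against the original), they can be applied directly to the data of slide 31:

```python
# Combined data of slide 31 (1,000 subjects per group):
#   experimental:    16 deaths under do(x),   14 under do(x')
#   nonexperimental:  2 deaths among x-takers, 28 among x'-takers
P_yx_   = 14 / 1000    # P(y_{x'}): death rate under do(no drug)
P_y     = 30 / 2000    # P(y): observational death rate
P_xy    = 2 / 2000     # P(x, y): took the drug and died
P_xy_   = 998 / 2000   # P(x, y'): took the drug and survived
P_surv_ = 1 - P_yx_    # P(y'_{x'}): survival rate under do(no drug)

lower = max(0.0, (P_y - P_yx_) / P_xy)
upper = min(1.0, (P_surv_ - P_xy_) / P_xy)
print(lower, upper)  # lower ~ 1, upper = 1
```

Both bounds collapse to 1, reproducing the claim of slide 36: combined experimental and nonexperimental data pin PN down exactly, even though neither study alone could.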

  36. SOLUTION TO THE ATTRIBUTION PROBLEM (Cont)
  WITH PROBABILITY ONE, P(y′x′ | x, y) = 1.
  • From population data to individual case.
  • Combined data tell more than each study alone.

  37. OUTLINE
  • From probability to causal analysis:
    • The differences
    • The barriers
    • The benefits
  • Assessing the effects of actions and policies
  • Determining the causes of effects
  • Distinguishing direct from indirect effects

  38. QUESTIONS ADDRESSED
  • The direct/indirect distinction is ubiquitous in our language. Why?
  • What is the semantics of direct and indirect effects?
  • Can we estimate them from data? Experimental data?
  • What can we do with these estimates?

  39. WHY DECOMPOSE EFFECTS?
  • Direct (or indirect) effects may be more transportable.
  • Indirect effects may be prevented or controlled.
    (Example: Pill → Pregnancy → Thrombosis, alongside a direct Pill → Thrombosis effect.)
  • Direct (or indirect) effects may be forbidden.
    (Example: Gender → Qualification → Hiring, where a direct Gender → Hiring effect is forbidden.)

  40. TOTAL, DIRECT, AND INDIRECT EFFECTS HAVE SIMPLE SEMANTICS IN LINEAR MODELS
  Model: z = bx + ε1,  y = ax + cz + ε2
  (X → Z with coefficient b, X → Y with coefficient a, Z → Y with coefficient c)
  Total effect: a + bc
  Direct effect: a
  Indirect effect: bc
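The linear decomposition can be verified by direct intervention in the model. A sketch with arbitrary coefficient values (and noise terms set to zero for clarity): the total effect is obtained by changing x, the direct effect by changing x while holding z fixed, and their difference is the indirect effect bc.

```python
# Linear model of slide 40: z = b*x + e1, y = a*x + c*z + e2,
# with hypothetical coefficients and noise set to zero.
a, b, c = 0.5, 2.0, 0.25

def y_do(x, z=None):
    z = b * x if z is None else z   # do(z) overrides the z-mechanism
    return a * x + c * z

total    = y_do(1) - y_do(0)            # a + b*c = 1.0
direct   = y_do(1, z=0) - y_do(0, z=0)  # a = 0.5 (z held fixed)
indirect = total - direct               # b*c = 0.5
print(total, direct, indirect)
```

In a linear model it does not matter at which value z is held fixed, nor whether the indirect effect is computed as a difference; both conveniences fail in the nonlinear case discussed on the next slide.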

  41. SEMANTICS BECOMES NONTRIVIAL IN NONLINEAR MODELS (even when the model is completely specified)
  Model: z = f(x, ε1),  y = g(x, z, ε2)
  Direct effect: dependent on z? Void of operational meaning?

  42. NEED OF FORMALIZATION
  What is the direct effect of X on Y?
  [Diagram: X → Z, and X, Z → Y through an AND gate]
  Indirect effect?

  43. NEED OF FORMALIZATION (Who cares?)
  What is the direct effect of X on Y?
  What is the effect of the drug on recovery if the drug-induced headache is eliminated?
  [Diagram: Drug (X) → Aspirin (Z), and X, Z → Recovery (Y) through an AND gate]
  Indirect effect?

  44. NEED OF FORMALIZATION (Who cares?)
  What is the direct effect of X on Y?
  What is the effect of the drug on recovery if the drug-induced headache is eliminated?
  [Diagram: Drug (X) → Aspirin (Z), and X, Z → Recovery (Y) through an AND gate]

  45. NEED OF FORMALIZATION (Who cares?)
  What is the indirect effect of X on Y?
  The effect of Gender on Hiring if sex discrimination is eliminated.
  [Diagram: Gender (X) → Qualification (Z) → Hiring (Y), with the direct X → Y path ignored]

  46. TWO CONCEPTIONS OF DIRECT AND INDIRECT EFFECTS: Controlled vs. Natural
  [Diagram: X → Z, and X, Z → Y through an AND gate]
  Starting from X = 0 (and Z = 0 and Y = 0):
  • Total effect: change X from 0 to 1, and test the change in Y.
  • Controlled DE: keep Z constant at Z = 0 or Z = 1, and change X from 0 to 1.
  • Controlled IE: none.
  • Natural DE: keep Z constant at its current value, and change X to 1.
  • Natural IE: keep X at 0, but set Z to what it would be if X were 1.

  47. LEGAL DEFINITIONS TAKE THE NATURAL CONCEPTION (FORMALIZING DISCRIMINATION)
  "The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of a different race (age, sex, religion, national origin, etc.) and everything else had been the same."
  [Carson versus Bethlehem Steel Corp. (70 FEP Cases 921, 7th Cir. (1996))]
  x = male, x′ = female; y = hire, y′ = not hire; z = applicant's qualifications
  NO DIRECT EFFECT: Yx′Zx = Yx,  YxZx′ = Yx′

  48. SEMANTICS OF NESTED COUNTERFACTUALS
  Can the quantity Q = E(YxZx*) be estimated from data?
  Given u, Zx*(u) is the solution for Z in Mx*; call it z. Then YxZx*(u) = Yxz(u) is the solution for Y in Mxz.
  Given ⟨M, P(u)⟩, Q is well defined.

  49. TWO CONCEPTIONS OF AVERAGE DIRECT AND INDIRECT EFFECTS: POPULATION-LEVEL DEFINITIONS
  Probabilistic causal model ⟨M, P(u)⟩: y = f(x, z, u)
  [Diagram: X → Z → Y and X → Y, with background variables u1, u2, u3 (all other parents of Y)]
  Starting from X = x* (and Z = Zx*(u) and Y = Yx*(u)):
  • Total effect: TE(x, x*; Y) = E(Yx) − E(Yx*)
  • Controlled DE: CDEz(x, x*; Y) = E(Yxz) − E(Yx*z)
  • Controlled IE: none.
  • Natural DE: NDE(x, x*; Y) = E(YxZx*) − E(Yx*)
  • Natural IE: NIE(x, x*; Y) = E(Yx*Zx) − E(Yx*)
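These population-level definitions can be evaluated exactly in a small discrete model by enumerating the background variable. A sketch with a hypothetical nonlinear model (z = x OR u1, y = x AND z, u1 uniform on {0, 1}; none of this is from the slides): note that TE ≠ NDE + NIE here, unlike the linear case of slide 40.

```python
# Hypothetical nonlinear model: z = x OR u1, y = x AND z,
# with background variable u1 uniform on {0, 1}.
P_u = 0.5  # P(u1 = 1)

def Z(x, u1): return x | u1
def Y(x, z): return x & z

def E_Y(x_for_y, x_for_z):
    """E[Y_{x, Z_{x*}}]: Y evaluated at x_for_y, with Z set to the
    value it would attain under x_for_z, averaged over u1."""
    return sum(p * Y(x_for_y, Z(x_for_z, u1))
               for u1, p in [(0, 1 - P_u), (1, P_u)])

x, x_star = 1, 0
TE  = E_Y(x, x) - E_Y(x_star, x_star)
NDE = E_Y(x, x_star) - E_Y(x_star, x_star)
NIE = E_Y(x_star, x) - E_Y(x_star, x_star)
print(TE, NDE, NIE)  # TE != NDE + NIE in nonlinear models
```

Here TE = 1 but NDE = 0.5 and NIE = 0: with Y an AND gate, the direct path only fires for the units in which Z would already be 1 under x*, so the natural decomposition is not additive.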

  50. GRAPHICAL CONDITION FOR EXPERIMENTAL IDENTIFICATION OF AVERAGE NATURAL DIRECT EFFECTS
  Theorem: If there exists a set W such that (Y ⊥ Z | W) in GXZ and W ⊆ ND(X ∪ Z) (the non-descendants of X ∪ Z), then
    NDE(x, x*; Y) = Σw,z [E(Yxz | w) − E(Yx*z | w)] P(Zx* = z | w) P(w)
  [Example: graphs G and GXZ over X, W, Z, Y with disturbances U1–U4]
