A Strategy for Making Predictions under Manipulation
Presentation Transcript

  1. A Strategy for Making Predictions under Manipulation Ioannis Tsamardinos Assistant Professor Computer Science Department, University of Crete ICS, Foundation for Research and Technology - Hellas Laura E. Brown Ph.D. Candidate Dept. Biomedical Inf., Vanderbilt Univ.

  2. Selecting a Formulation of Causality • Causal Bayesian Networks • Cross-sectional data • No explicit notion of time • No feedback cycles allowed • Edges express causal relations • Distribution factorizes as P(V) = ∏i P(Vi | Pa(Vi)) [Figure: example causal network over V1–V6 with target T] I. Tsamardinos, CSD, University of Crete
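The factorization on this slide can be sketched in a few lines of code; the tiny chain network and its CPT values below are illustrative, not from the talk:

```python
# A minimal sketch of the Bayesian-network factorization
# P(V) = prod_i P(Vi | Pa(Vi)); variables are binary, CPTs are illustrative.

def joint_prob(assignment, parents, cpt):
    """Probability of a full assignment under the factorization."""
    p = 1.0
    for v, val in assignment.items():
        pa_vals = tuple(assignment[pa] for pa in parents[v])
        p *= cpt[v][pa_vals][val]
    return p

# Toy chain V1 -> T (hypothetical numbers)
parents = {"V1": [], "T": ["V1"]}
cpt = {
    "V1": {(): {0: 0.5, 1: 0.5}},
    "T": {(0,): {0: 0.8, 1: 0.2}, (1,): {0: 0.1, 1: 0.9}},
}
print(joint_prob({"V1": 1, "T": 1}, parents, cpt))  # 0.5 * 0.9 = 0.45
```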

  3. Effect of Manipulation • Manipulate V1, V5 [Figure: the causal network over V1–V6 with target T, before manipulation]

  4. Effect of Manipulation • Manipulate V1, V5 • An external manipulator E sets the values of the manipulated variables [Figure: the network before and after manipulation, with external manipulator node E]

  5. Effect of Manipulation • Manipulate V1, V5 • The other parents of the manipulated variables are removed [Figure: the manipulated network, with the original parents of V1 and V5 disconnected]

  6. Effect of Manipulation • M: the set of manipulated variables [Figure: the manipulated network with manipulator node E] J. Pearl. Causality: Models, Reasoning, and Inference, 2000.
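The graph-surgery view of manipulation can be sketched directly: each manipulated variable loses its incoming edges (its distribution is then set by the external manipulator E, omitted here). The edge structure below is illustrative, not the exact network from the slides:

```python
# Sketch of manipulation as graph surgery (Pearl 2000): manipulated
# variables lose their incoming edges; everything else keeps its mechanism.

def manipulate(parents, manipulated):
    """Parent sets of the mutilated graph after manipulating the set M."""
    return {v: ([] if v in manipulated else list(pa))
            for v, pa in parents.items()}

# Illustrative graph over V1..V6 with target T
parents = {"V1": [], "V2": [], "V3": ["V2"], "V6": [],
           "T": ["V1", "V3"], "V4": ["T", "V6"], "V5": ["V4"]}
post = manipulate(parents, {"V1", "V5"})
print(post["V5"], post["T"])  # [] ['V1', 'V3']  (T's mechanism is untouched)
```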

  7. Types of Predictive Tasks • No manipulations • Known set of manipulated variables M • From data following P(V), predict data following PM(V) • The way manipulations are performed is unknown, i.e., the PM(Vi | E) are unknown • Unknown M

  8. The Markov Blanket of T • The set of direct causes, direct effects, and direct causes of direct effects [Figure: the network over V1–V6 with MB(T) highlighted]
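The definition above can be read off a DAG mechanically: parents, children, and the children's other parents ("spouses"). A sketch on an illustrative graph:

```python
# Sketch: MB(T) = direct causes ∪ direct effects ∪ direct causes of
# direct effects, computed from a DAG given as parent sets.

def markov_blanket(parents, t):
    children = {v for v, pa in parents.items() if t in pa}
    spouses = {pa for c in children for pa in parents[c]} - {t}
    return set(parents[t]) | children | spouses

# Illustrative graph: V1, V3 -> T -> V4; V6 -> V4 (a spouse of T)
parents = {"V1": [], "V2": [], "V3": ["V2"], "V6": [],
           "T": ["V1", "V3"], "V4": ["T", "V6"], "V5": ["V4"]}
print(sorted(markov_blanket(parents, "T")))  # ['V1', 'V3', 'V4', 'V6']
```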

  9. The Manipulated Markov Blanket of T • The set of direct causes, direct effects, and direct causes of direct effects in the manipulated distribution • E.g., after manipulating V1 and V5 [Figure: the manipulated network with MBM(T) highlighted]

  10. Properties of MB(T) • The smallest-size, most-predictive subset of variables • All and only the variables we need for building optimal predictive models I. Tsamardinos and C. F. Aliferis. Towards principled feature selection: Relevancy, Filters and Wrappers. AI & Statistics, 2003.

  11. A. No Manipulations • Find the MB(T) • Fit a model from training data for P(T | MB(T)), using only the variables of MB(T)

  12. B. Known M • Find the MBM(T) • Fit a model from training data, using only the variables of MBM(T) • Proposition: PM(T | MBM(T)) = P(T | MBM(T)), provided no manipulated spouse of T is a descendant of T in the unmanipulated distribution
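One way to sketch the first step, on an illustrative graph: perform the graph surgery, then read MB(T) off the mutilated graph. Note how manipulating a direct effect of T removes it, and its other parents, from the blanket:

```python
# Sketch: MB_M(T) is the Markov blanket of T in the mutilated graph
# obtained by cutting the incoming edges of every manipulated variable.

def markov_blanket(parents, t):
    children = {v for v, pa in parents.items() if t in pa}
    spouses = {pa for c in children for pa in parents[c]} - {t}
    return set(parents[t]) | children | spouses

def manipulated_mb(parents, t, manipulated):
    mutilated = {v: ([] if v in manipulated else list(pa))
                 for v, pa in parents.items()}
    return markov_blanket(mutilated, t)

# Illustrative graph: V1, V3 -> T -> V4; V6 -> V4
parents = {"V1": [], "V2": [], "V3": ["V2"], "V6": [],
           "T": ["V1", "V3"], "V4": ["T", "V6"], "V5": ["V4"]}
# Manipulating the direct effect V4 drops both V4 and its spouse V6:
print(sorted(manipulated_mb(parents, "T", {"V4"})))  # ['V1', 'V3']
```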

  13. Can Be Fit From Unmanipulated Data • M = {V1, V5} • PM(T | MBM(T)) = P(T | MBM(T)) [Figure: the network over V1–V6 with V1 and V5 manipulated]

  14. Cannot Be Fit From Unmanipulated Data • M = {V1, V4} • PM(T | MBM(T)) ≠ P(T | MBM(T)) [Figure: the network over V1–V6 with V1 and V4 manipulated]

  15. Unknown Manipulations M • Find the direct causes of T • Fit a model from training data, using only the variables that are direct causes of T • Only the direct causes remain in MBM(T) under any manipulation
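The claim that only the direct causes survive every manipulation can be checked by brute force on a small illustrative graph: intersect MBM(T) over all manipulation sets M that do not contain T itself:

```python
# Sketch: intersecting MB_M(T) over every manipulation set M (T excluded)
# leaves exactly the direct causes of T. Graph is illustrative.
from itertools import chain, combinations

def markov_blanket(parents, t):
    children = {v for v, pa in parents.items() if t in pa}
    spouses = {pa for c in children for pa in parents[c]} - {t}
    return set(parents[t]) | children | spouses

def manipulated_mb(parents, t, manipulated):
    mutilated = {v: ([] if v in manipulated else list(pa))
                 for v, pa in parents.items()}
    return markov_blanket(mutilated, t)

parents = {"V1": [], "V2": [], "V3": ["V2"], "V6": [],
           "T": ["V1", "V3"], "V4": ["T", "V6"], "V5": ["V4"]}
others = [v for v in parents if v != "T"]
all_M = chain.from_iterable(combinations(others, r)
                            for r in range(len(others) + 1))
survivors = set.intersection(*(manipulated_mb(parents, "T", set(m))
                               for m in all_M))
print(sorted(survivors))  # ['V1', 'V3'] -- exactly the direct causes of T
```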

  16. Learning Bayesian Networks • Many algorithms that can learn the network exist • Discrete data: MMHC [1] • Mixed: Bach [2] • Find the graph, find the MBM(T), fit a model and you are done • … or are you? 1. I. Tsamardinos, L.E. Brown, and C.F. Aliferis. Machine Learning, 65(1):31, 2006. 2. F.R. Bach and M.I. Jordan. NIPS-02.

  17. Faithfulness and Parity Functions • All BN methods assume Faithfulness • Causes and effects have detectable conditional pairwise associations with T • T = V1 XOR V3 • No pairwise association between T and V1 [Figure: V1 and V3 as parents of T]

  18. Parity Functions in Feature Space • T = V1 XOR V2 • No pairwise association between T and V1 • Construct the new feature V1·V2 • Pairwise associations become apparent [Figure: truth table over V1, V2, V1·V2, T]
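The parity example can be verified numerically: over the four equally likely (V1, V2) combinations, T = V1 XOR V2 has exactly zero correlation with V1, while the constructed product feature V1·V2 has a clearly nonzero one. A self-contained sketch:

```python
# Sketch: T = V1 XOR V2 is pairwise-uncorrelated with V1 alone, but the
# constructed feature V1*V2 carries a detectable association with T.
from itertools import product

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rows = [(a, b, a ^ b, a * b) for a, b in product((0, 1), repeat=2)]
v1   = [r[0] for r in rows]
t    = [r[2] for r in rows]
v1v2 = [r[3] for r in rows]
print(corr(v1, t))    # 0.0 -- no pairwise association with T
print(corr(v1v2, t))  # about -0.577 -- association becomes apparent
```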

  19. Feature Space Markov Blanket • Map Data to Feature Space • Learn the Markov Blanket in Feature Space

  20. Feature Space Markov Blanket • Map Data to Feature Space • Brute force is inefficient • Indirectly map to feature space using an SVM • Assume: a low SVM weight for a feature implies low association of the feature with T • Produce only the top-weighted features! (recently developed heuristic method) • Learn the Markov Blanket in Feature Space • Run HITON [1] 1. C. F. Aliferis, I. Tsamardinos, and A. Statnikov. AMIA 2003.
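The weight-ranking idea can be illustrated without an SVM library: a minimal sketch using an exact linear fit (standing in for the SVM, which is an assumption of this example) on the XOR data with features [1, V1, V2, V1·V2]. The solved weights give the constructed feature V1·V2 the largest magnitude, so it survives a top-weight cut:

```python
# Sketch of the heuristic: rank features by the magnitude of a linear
# model's weights. An exact least-squares/interpolating fit stands in
# for the SVM here; the data is the 4-row XOR truth table.

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Features: [bias, V1, V2, V1*V2]; target: T = V1 XOR V2
X = [[1, a, b, a * b] for a in (0, 1) for b in (0, 1)]
t = [a ^ b for a in (0, 1) for b in (0, 1)]
w = solve(X, t)
print([round(x, 6) for x in w])  # [0.0, 1.0, 1.0, -2.0]
# The constructed feature V1*V2 (index 3) gets the dominant weight.
```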

  21. Inducing the MB(T) • Run MMMB [1], RFE [2], FSMB [3], and no feature selection • Build predictive models • If there is a large discrepancy in predictive performance, consult FSMB • If there are "parity"-like variables, add the corresponding constructed features to the data before learning the network 1. I. Tsamardinos, C.F. Aliferis, and A. Statnikov. KDD 2003. 2. I. Guyon, et al. Machine Learning, 46(1-3):389-422, 2002. 3. Submitted for publication.

  22. Hidden Variables and Confounding • H1, H2: hidden variables • Dashed edges appear in the marginal network • Marginal MB(T) shown in green [Figure: the network over V1–V6 with hidden variables H1, H2]

  23. Hidden Variables and Confounding • H1, H2: hidden variables • Dashed edges appear in the marginal network • Reddish edges are "removed" by manipulations • Manipulations of V5, V3 lead to errors in estimating MBM(T) (bluish nodes) [Figure: the manipulated marginal network]

  24. Finding Non-Confounded Edges • Proposition: Let V = O ∪ H, where O are observable variables and H are not, and let P(V) be faithful to a Causal Bayesian Network. If • ∀S ⊆ O: ¬I(V1; T | S) • ∀S ⊆ O: ¬I(V3; T | S) • ∀S ⊆ O: ¬I(V5; T | S) • ∃Z1 ⊆ O s.t. I(V1; V3 | Z1) • ∃Z2 ⊆ O s.t. I(V1; V5 | Z2) • ¬I(V1; V3 | Z1 ∪ {T}) • I(V1; V5 | Z2 ∪ {T}) then there is a causal path from T to V5 (the edge T → V5 is causal) [Figure: network where V1, V3 → T → V5]

  25. Finding Non-Confounded Edges • Proposition: Let V = O ∪ H, where O are observable variables and H are not, and let P(V) be faithful to a Causal Bayesian Network. If • ∀S ⊆ O: ¬I(V1; T | S) • ∀S ⊆ O: ¬I(V3; T | S) • ∀S ⊆ O: ¬I(V5; T | S) • ∃Z1 ⊆ O s.t. I(V1; V3 | Z1) • ∃Z2 ⊆ O s.t. I(V1; V5 | Z2) • ¬I(V1; V3 | Z1 ∪ {T}) • I(V1; V5 | Z2 ∪ {T}) then there is a causal path from T to V5 (the edge T → V5 is causal) [Figure: the same conditions rule out a hidden confounder H between T and V5]

  26. Finding Non-Confounded Edges • Use the test to: • Orient some edges • Find truly causal (non-confounded) edges • Extension of the basic idea presented in [1] 1. S. Mani, P. Spirtes, and G.F. Cooper. UAI 2006.

  27. Finding the MBM(T) • Edge existence: BN learning algorithm • Edge orientation: • Learn the network, convert to PDAG, obtain compelled edges • Confounding test • Edge confounding • Confounding test • Weigh evidence and decide on orientation and absence of confounding

  28. Finding the MBM(T) • Edge annotations: non-confounded; oriented but could be confounded; undirected; manipulated nodes • Are V7, V3 part of MBM(T)? Is V4 part of MBM(T)? [Figure: example network over V1–V7 and T with annotated edges]

  29. Results

  30. Limitations • Most time was spent on REGED • Conditional independence tests were sometimes inappropriate • New methods not optimized or fully tested • Model averaging should be used • Formal methods for weighing the evidence are needed

  31. Conclusions • General basis of theory and algorithms for predictions under manipulation • New algorithms for addressing lack of faithfulness and hidden confounding variables • The strategy can be implemented using the new and existing algorithms • Many open directions/problems • Faithfulness • Acyclicity • Hidden variables • Timed data