This work discusses a novel strategy for making predictions in environments where data manipulation is present. By employing Causal Bayesian Networks, it highlights the significance of understanding causal relationships even in the absence of explicit temporal references or feedback cycles. The paper elaborates on the construction of the manipulated Markov Blanket, revealing how to identify the most predictive variables. This framework serves as a foundation for effective predictive modeling in biomedical informatics and other fields affected by external manipulations.
A Strategy for Making Predictions under Manipulation
Ioannis Tsamardinos, Assistant Professor, Computer Science Department, University of Crete; ICS, Foundation for Research and Technology - Hellas
Laura E. Brown, Ph.D. Candidate, Dept. of Biomedical Informatics, Vanderbilt University
Selecting a Formulation of Causality
• Causal Bayesian Networks
• Cross-sectional data
• No explicit notion of time
• No feedback cycles allowed
• Edges express causal relations
• Distribution factorizes as P(V) = ∏i P(Vi | Pa(Vi))
[Figure: example network over V1–V6 with target T]
I. Tsamardinos, CSD, University of Crete
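The factorization on this slide can be sketched in code. The tiny graph and CPT numbers below are made up for illustration, not taken from the talk's figure:

```python
# A Causal Bayesian Network's distribution factorizes as
# P(V) = prod_i P(Vi | Pa(Vi)).  Graph and CPT values are illustrative.
parents = {"V1": [], "V3": [], "T": ["V1", "V3"], "V5": ["T"]}

cpt = {  # (value, tuple of parent values) -> probability, binary variables
    "V1": {(1, ()): 0.6, (0, ()): 0.4},
    "V3": {(1, ()): 0.3, (0, ()): 0.7},
    "T":  {(1, (1, 1)): 0.9, (0, (1, 1)): 0.1,
           (1, (1, 0)): 0.5, (0, (1, 0)): 0.5,
           (1, (0, 1)): 0.5, (0, (0, 1)): 0.5,
           (1, (0, 0)): 0.1, (0, (0, 0)): 0.9},
    "V5": {(1, (1,)): 0.8, (0, (1,)): 0.2,
           (1, (0,)): 0.2, (0, (0,)): 0.8},
}

def joint(assignment):
    """P(full assignment) = product of each variable's CPT entry
    given the values of its parents."""
    p = 1.0
    for v, pa in parents.items():
        p *= cpt[v][(assignment[v], tuple(assignment[q] for q in pa))]
    return p

print(joint({"V1": 1, "V3": 1, "T": 1, "V5": 1}))  # 0.6*0.3*0.9*0.8 = 0.1296
```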
Effect of Manipulation
• Manipulate V1, V5
• An external manipulator E becomes the cause of the manipulated variables
• Their other parents are removed
[Figure: the example network before and after manipulating V1 and V5]
Effect of Manipulation
• M: the set of manipulated variables
[Figure: the manipulated network, with E acting on the variables in M]
J. Pearl. Causality: Models, Reasoning, and Inference, 2000.
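The graph surgery described on these slides can be sketched as a small helper; the parents-dict representation is an assumption of this sketch, not the talk's notation:

```python
# Sketch of the intervention described on the slide, for a graph given as
# a {variable: list-of-parents} dict: each manipulated variable keeps its
# outgoing edges but loses its incoming ones (the external manipulator E
# becomes its only cause).
def manipulate(parents, M):
    return {v: ([] if v in M else list(pa)) for v, pa in parents.items()}

g = {"V1": [], "T": ["V1"], "V5": ["T"]}
print(manipulate(g, {"V5"}))  # {'V1': [], 'T': ['V1'], 'V5': []}
```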
Types of Predictive Tasks
• No manipulations
• Known set of manipulated variables M
  • From data following P(V), predict data following PM(V)
  • The way manipulations are performed is unknown, i.e., the PM(Vi | E) are unknown
• Unknown M
The Markov Blanket of T
• The set of direct causes, direct effects, and direct causes of direct effects (spouses) of T
[Figure: MB(T) highlighted in the example network]
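The definition on this slide translates directly to code. The example graph below is illustrative, not necessarily the one in the talk's figure:

```python
# Markov blanket of t: direct causes (parents), direct effects (children),
# and direct causes of direct effects (spouses), excluding t itself.
def markov_blanket(parents, t):
    children = [v for v, pa in parents.items() if t in pa]
    mb = set(parents[t]) | set(children)
    for c in children:
        mb |= set(parents[c])
    mb.discard(t)
    return mb

g = {"V1": [], "V2": ["V1"], "V3": [], "T": ["V1", "V3"],
     "V4": ["T"], "V5": ["T", "V6"], "V6": []}
print(markov_blanket(g, "T"))  # V2 is excluded: it is neither parent,
                               # child, nor spouse of T
```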
The Manipulated Markov Blanket of T
• The set of direct causes, direct effects, and direct causes of direct effects in the manipulated distribution
• E.g., V1 and V5
[Figure: MBM(T) in the manipulated network]
Properties of MB(T)
• The smallest-size, most-predictive subset of variables
• All and only the variables we need for building optimal predictive models
I. Tsamardinos and C. F. Aliferis. Towards principled feature selection: Relevancy, Filters and Wrappers. AI & Statistics, 2003.
A. No Manipulations
• Find the MB(T)
• Fit a model for P(T | MB(T)) from training data, using only the variables of MB(T)
B. Known M
• Find the MBM(T)
• Fit a model from training data, using only the variables of the MBM(T)
• Proposition: PM(T | MBM(T)) = P(T | MBM(T)), provided no manipulated spouse of T is a descendant of T in the unmanipulated distribution
Can Be Fit From Unmanipulated Data
• M = {V1, V5}
• PM(T | MBM(T)) = P(T | MBM(T))
[Figure: the manipulated network for M = {V1, V5}]
Cannot Be Fit From Unmanipulated Data
• M = {V1, V4}
• PM(T | MBM(T)) ≠ P(T | MBM(T))
[Figure: the manipulated network for M = {V1, V4}]
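A small numerical check shows why the proviso matters. The three-variable network below is illustrative (not the talk's figure): V4 is both a spouse of T (they share the child C) and a descendant of T, so manipulating V4 changes the conditional of T given its manipulated blanket:

```python
from itertools import product

# Illustrative network: T -> V4, T -> C, V4 -> C.  All CPT numbers made up.
pT = {1: 0.3, 0: 0.7}
pV4 = {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.2, (0, 0): 0.8}  # P(V4=v | T=t), key (v, t)
pC = {}                                                      # P(C=c | T=t, V4=v), key (c, t, v)
for t, v in product([0, 1], repeat=2):
    p1 = 0.1 + 0.4 * t + 0.4 * v
    pC[(1, t, v)], pC[(0, t, v)] = p1, 1 - p1

def joint(t, v, c, manipulated):
    # Manipulating V4 cuts its dependence on T (here: set uniformly at random).
    return pT[t] * (0.5 if manipulated else pV4[(v, t)]) * pC[(c, t, v)]

def cond_T1(v, c, manipulated):
    """P(T=1 | V4=v, C=c) in the unmanipulated or manipulated distribution."""
    num = joint(1, v, c, manipulated)
    return num / (num + joint(0, v, c, manipulated))

# The two conditionals differ, so a model fit on unmanipulated data
# mis-predicts after the manipulation:
print(cond_T1(1, 1, False), cond_T1(1, 1, True))
```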
C. Unknown Manipulations M
• Find the direct causes of T
• Fit a model from training data, using only the variables that are direct causes of T
• Only the direct causes remain in MBM(T) under any manipulation
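The claim on this slide can be checked exhaustively on a small example. The graph below is illustrative; the parents-dict helpers are this sketch's own representation:

```python
from itertools import combinations

# Check: under ANY manipulation set M (not containing T itself), the direct
# causes of T stay in the manipulated Markov blanket, while children and
# spouses may drop out.
def manipulate(parents, M):
    """Manipulated variables lose their incoming edges."""
    return {v: ([] if v in M else list(pa)) for v, pa in parents.items()}

def markov_blanket(parents, t):
    """Parents, children, and parents of children of t."""
    children = [v for v, pa in parents.items() if t in pa]
    mb = set(parents[t]) | set(children)
    for c in children:
        mb |= set(parents[c])
    mb.discard(t)
    return mb

g = {"V1": [], "V2": ["V1"], "V3": [], "T": ["V1", "V3"],
     "V4": ["T"], "V5": ["T", "V6"], "V6": []}
others = [v for v in g if v != "T"]
for r in range(len(others) + 1):
    for M in combinations(others, r):
        assert set(g["T"]) <= markov_blanket(manipulate(g, set(M)), "T")
print("direct causes V1, V3 survive all", 2 ** len(others), "manipulation sets")
```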
Learning Bayesian Networks
• Many algorithms exist that can learn the network
  • Discrete data: MMHC [1]
  • Mixed data: Bach [2]
• Find the graph, find the MBM(T), fit a model, and you are done…
• …or are you?
1. I. Tsamardinos, L. E. Brown, and C. F. Aliferis. Machine Learning, 65(1):31, 2006.
2. F. R. Bach and M. I. Jordan. NIPS-02.
Faithfulness and Parity Functions
• All BN methods assume Faithfulness
• Causes and effects have detectable conditional pairwise associations with T
• T = V1 XOR V3
• No pairwise association between T and V1
[Figure: V1, V3 → T]
Parity Functions in Feature Space
• T = V1 XOR V2
• No pairwise association between T and V1
• Construct a new feature: V1·V2
• Pairwise associations become apparent
[Figure: the network with the constructed feature V1·V2]
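The parity example can be verified by direct enumeration (the product feature V1·V2 follows the slide; the covariance check is this sketch's way of showing "association"):

```python
from itertools import product

# T = V1 XOR V2 over the uniform distribution on (V1, V2): T has zero
# covariance with V1 alone, but the constructed feature V1*V2 covaries
# with T, so the dependence becomes visible in feature space.
rows = [(v1, v2, v1 ^ v2) for v1, v2 in product([0, 1], repeat=2)]

def cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

v1s  = [v1 for v1, _, _ in rows]
ts   = [t for _, _, t in rows]
feat = [v1 * v2 for v1, v2, _ in rows]   # constructed feature V1*V2

print(cov(v1s, ts))   # 0.0    -> no pairwise association between T and V1
print(cov(feat, ts))  # -0.125 -> association appears for the new feature
```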
Feature Space Markov Blanket
• Map data to feature space
• Learn the Markov Blanket in feature space
Feature Space Markov Blanket
• Map data to feature space
  • Brute force is inefficient
  • Indirectly map to feature space using an SVM
  • Assume: a low SVM weight of a feature implies low association of the feature with T
  • Produce only the top-weighted features (recently developed heuristic method)
• Learn the Markov Blanket in feature space
  • Run HITON [1]
1. C. F. Aliferis, I. Tsamardinos, and A. Statnikov. AMIA 2003.
Inducing the MB(T)
• Run MMMB [1], RFE [2], FSMB [3], and no feature selection
• Build predictive models
• If there is a large discrepancy in predictive performance, consult FSMB
• If there are "parity"-like variables, add the corresponding constructed features to the data before learning the network
1. I. Tsamardinos, C. F. Aliferis, and A. Statnikov. KDD 2003.
2. I. Guyon, et al. Machine Learning, 46(1-3):389–422, 2002.
3. Submitted for publication.
Hidden Variables and Confounding
• H1, H2: hidden variables
• Dashed edges appear in the marginal network
• The marginal MB(T) is shown in green
[Figure: the example network with hidden variables H1, H2]
Hidden Variables and Confounding
• H1, H2: hidden variables
• Dashed edges appear in the marginal network
• Reddish edges are "removed" by manipulations
• Manipulations of V5, V3 lead to errors in estimating MBM(T) (bluish nodes)
[Figure: the example network with hidden variables H1, H2 under manipulation]
Finding Non-Confounded Edges
Proposition: Let V = O ∪ H, where the variables in O are observable and those in H are not, and let P(V) be faithful to a Causal Bayesian Network. If
• ∄ S ⊆ O such that I(V1; T | S)
• ∄ S ⊆ O such that I(V3; T | S)
• ∄ S ⊆ O such that I(V5; T | S)
• ∃ Z1 ⊆ O such that I(V1; V3 | Z1)
• ∃ Z2 ⊆ O such that ¬I(V1; V5 | Z2)
• ¬I(V1; V3 | Z1 ∪ {T})
• I(V1; V5 | Z2 ∪ {T})
then there is a causal path from T to V5 (the edge T → V5 is causal, not due to a hidden confounder).
[Figures: the Y-structure over V1, V3, T, V5, with and without a hidden variable H between T and V5]
Finding Non-Confounded Edges
• Use the test to:
  • Orient some edges
  • Find truly causal (non-confounded) edges
• Extension of the basic idea presented in [1]
1. S. Mani, P. Spirtes, and G. F. Cooper. UAI 2006.
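The flavor of this test can be sketched with a hand-coded independence oracle. The graph, the oracle, and the exact set of checks below are this sketch's assumptions; a real implementation would run conditional-independence tests on data:

```python
from itertools import chain, combinations

# Hand-coded d-separation oracle for the Y structure
# V1 -> T <- V3, T -> V5 (all variables observed in this toy example).
def ci(x, y, z):
    """True iff I(x; y | z) under d-separation in the example graph."""
    z, pair = set(z), frozenset((x, y))
    if pair == {"V1", "V3"}:
        return not (z & {"T", "V5"})  # collider at T (V5 is T's descendant)
    if pair in ({"V1", "V5"}, {"V3", "V5"}):
        return "T" in z               # chain through T, blocked by observing T
    return False                      # adjacent pairs are never separable

def subsets(vs):
    return chain.from_iterable(combinations(vs, r) for r in range(len(vs) + 1))

O = ("V1", "V3", "V5", "T")
for v in ("V1", "V3", "V5"):          # nothing separates V1, V3, or V5 from T
    assert not any(ci(v, "T", S)
                   for S in subsets([u for u in O if u not in (v, "T")]))

Z1 = Z2 = ()                          # witness sets (found by search in general)
assert ci("V1", "V3", Z1)             # I(V1; V3 | Z1)
assert not ci("V1", "V5", Z2)         # not I(V1; V5 | Z2)
assert not ci("V1", "V3", Z1 + ("T",))  # conditioning on T opens the collider
assert ci("V1", "V5", Z2 + ("T",))      # conditioning on T blocks the chain
print("pattern holds: the edge T -> V5 is causal (non-confounded)")
```

The key contrast is the last two checks: adding T to the conditioning set creates dependence between the two causes (collider) but removes dependence between a cause and the effect (mediation), which a confounded T–V5 edge could not produce.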
Finding the MBM(T)
• Edge existence: BN learning algorithm
• Edge orientation:
  • Learn the network, convert to a PDAG, obtain the compelled edges
  • Confounding test
• Edge confounding:
  • Confounding test
• Weigh the evidence and decide on orientation and absence of confounding
Finding the MBM(T)
• Edge legend: non-confounded; oriented but possibly confounded; undirected; manipulated nodes
• Are V7, V3 part of MBM(T)? Is V4 part of MBM(T)?
[Figure: example network with edge types and manipulated nodes marked]
Results
Limitations
• Most time was spent on REGED
• The conditional independence tests were sometimes inappropriate
• The new methods are not optimized or fully tested
• Model averaging should be used
• Formal methods for weighing the evidence are needed
Conclusions
• A general basis of theory and algorithms for predictions under manipulation
• New algorithms for addressing lack of faithfulness and hidden confounding variables
• The strategy can be implemented using the new and existing algorithms
• Many open directions/problems:
  • Faithfulness
  • Acyclicity
  • Hidden variables
  • Timed data