Conservative Updating with Incomplete Observations in Bayesian Networks
This study investigates the challenges of updating beliefs in Bayesian networks when observations are incomplete. It highlights the limitations of naive updating methods that assume complete data and underlines the importance of properly modeling the Incompleteness Mechanism (IM). With a conservative approach, the paper derives a robust updating rule that accommodates unknown information about missing observations. The findings illustrate how to adjust beliefs effectively, ensuring uncertainty is managed without assuming ideal conditions. This work aims to enhance diagnosis accuracy in uncertain contexts, such as medical decision-making.
Presentation Transcript
Updating with incomplete observations (UAI-2003)
Gert de Cooman, SYSTeMS research group, BELGIUM (http://ippserv.ugent.be/~gert, gert.decooman@ugent.be)
Marco Zaffalon, IDSIA, “Dalle Molle” Institute for Artificial Intelligence, SWITZERLAND (http://www.idsia.ch/~zaffalon, zaffalon@idsia.ch)
What are incomplete observations? A simple example
• C (class) and A (attribute) are Boolean random variables
• C = 1 is the presence of a disease
• A = 1 is the positive result of a medical test
• Let us do diagnosis
• Good point: you know that
  • p(C = 0, A = 0) = 0.99
  • p(C = 1, A = 1) = 0.01
  • Whence p(C = 0 | A = a) allows you to make a sure diagnosis
• Bad point: the test result can be missing
  • This is an incomplete, or set-valued, observation {0,1} for A
What is p(C = 0 | A is missing)?
Example ctd
• Kolmogorov’s definition of conditional probability seems to say
  • p(C = 0 | A ∈ {0,1}) = p(C = 0) = 0.99
  • i.e., with high probability the patient is healthy
• Is this right?
• In general, it is not
• Why?
Why?
• Because A can be selectively reported
  • e.g., the medical test machine is broken; it produces an output only when the test is negative (A = 0)
• In this case p(C = 0 | A is missing) = p(C = 0 | A = 1) = 0
• The patient is definitely ill!
• Compare this with the former naive application of Kolmogorov’s updating (or naive updating, for short)
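A minimal numerical sketch of this example (variable names are illustrative, not from the paper): the joint p(C, A) from the first slide combined with the broken-machine incompleteness mechanism, contrasting naive updating with updating that models the IM.

```python
# Joint p(C, A) from the example; all other pairs have probability 0.
p_joint = {(0, 0): 0.99, (1, 1): 0.01}

def p_missing(c, a):
    """IM: probability that A is reported missing, given (c, a).
    The broken machine outputs only negative results, so A = 1 is always missing."""
    return 0.0 if a == 0 else 1.0

# Naive updating ignores the IM: p(C = 0 | A missing) = p(C = 0) = 0.99.
p_c0_naive = sum(p for (c, a), p in p_joint.items() if c == 0)

# Correct updating uses the overall model p(C, A) * p(missing | C, A):
num = sum(p * p_missing(c, a) for (c, a), p in p_joint.items() if c == 0)
den = sum(p * p_missing(c, a) for (c, a), p in p_joint.items())
p_c0_correct = num / den   # = 0.0: the patient is definitely ill
```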
Modeling it the right way
[Figure: the Incompleteness Mechanism (IM) turns a complete, unobserved pair (c,a), generated by the distribution p(C,A), into the actual observation o about A]
• Observations-generating model
  • o is a generic value for O, another random variable
  • o can be 0, 1, or * (i.e., missing value for A)
• IM = p(O | C,A) should not be neglected!
The correct overall model we need is p(C,A) p(O | C,A)
What about Bayesian nets (BNs)?
[Figure: the Asia net, with nodes (V)isit to Asia, (S)moking = y, (T)uberculosis = n, Lung (C)ancer?, Bronc(H)itis, Abnorma(L) X-rays = y, (D)yspnea]
• Let us predict C on the basis of the observation (L,S,T) = (y,y,n)
• BN updating instructs us to use p(C | L = y, S = y, T = n) to predict C
Asia ctd
• Should we really use p(C | L = y, S = y, T = n) to predict C?
  • (V,H,D) is missing, so (L,S,T,V,H,D) = (y,y,n,*,*,*) is an incomplete observation
• p(C | L = y, S = y, T = n) is just the naive updating
• By using the naive updating, we are neglecting the IM: wrong inference in general
New problem?
• Problems with naive updating have been clear since at least 1985 (Shafer)
• Practical consequences were not so clear
  • How often does naive updating cause problems?
  • Perhaps it is not a problem in practice?
Grünwald & Halpern (UAI-2002) on naive updating
• Three points made strongly:
1. naive updating works ⇔ CAR holds
  • i.e., neglecting the IM is correct ⇔ CAR holds
  • With missing data: CAR (coarsening at random) = MAR (missing at random) = p(A is missing | c,a) is the same for all pairs (c,a)
2. CAR holds rather infrequently
3. The IM, p(O | C,A), can be difficult to model
Points 2 & 3 = serious theoretical & practical problem. How should we do updating given 2 & 3?
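The MAR condition stated in point 1 can be checked mechanically. A hedged sketch (function names are mine, not from the paper or from Grünwald & Halpern):

```python
def is_mar(p_joint, p_missing, tol=1e-12):
    """MAR/CAR for a missing attribute: p(A missing | c, a) must be
    identical for every pair (c, a) with positive probability."""
    probs = [p_missing(c, a) for (c, a), p in p_joint.items() if p > 0]
    return max(probs) - min(probs) <= tol

p_joint = {(0, 0): 0.99, (1, 1): 0.01}
is_mar(p_joint, lambda c, a: 0.3)                     # True: constant IM
is_mar(p_joint, lambda c, a: 1.0 if a == 1 else 0.0)  # False: broken machine
```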
What this paper is about
• Have a conservative (i.e., robust) point of view
  • Deliberately worst case, as opposed to the MAR best case
• Assume little knowledge about the IM
  • You are not allowed to assume MAR
  • You are not able/willing to model the IM explicitly
• Derive an updating rule for this important case
  • Conservative updating rule
1st step: plug ignorance into your model
[Figure: the unknown Incompleteness Mechanism turns a complete, unobserved pair (c,a), drawn from the known prior distribution p(C,A), into the actual observation o about A]
• Fact: the IM is unknown
  • p(O ∈ {0,1,*} | C,A) = 1 is the only constraint on p(O | C,A)
  • i.e., any distribution p(O | C,A) is possible
• This is too conservative; to draw useful conclusions we need a little less ignorance
• Consider the set of all p(O | C,A) s.t. p(O | C,A) = p(O | A)
  • i.e., all the IMs which do not depend on what you want to predict
• Use this set of IMs jointly with the prior information p(C,A)
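A small illustrative sweep (mine, not from the paper) of how vacuous the posterior can become under total ignorance about the IM in the two-variable example: try attribute-dependent IMs p(missing | A = a) = m_a on a grid and record the range of the induced posterior p(C = 0 | A missing).

```python
# Joint from the earlier example: p(C=0, A=0) = 0.99, p(C=1, A=1) = 0.01.
lo, hi = 1.0, 0.0
for m0 in (i / 10 for i in range(11)):       # p(missing | A = 0)
    for m1 in (i / 10 for i in range(11)):   # p(missing | A = 1)
        den = 0.99 * m0 + 0.01 * m1          # p(A missing)
        if den == 0.0:
            continue                         # A is never missing: skip
        post = 0.99 * m0 / den               # p(C = 0 | A missing)
        lo, hi = min(lo, post), max(hi, post)
# Since C is determined by A here, the interval is fully vacuous: [0, 1].
```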
2nd step: derive the conservative updating
• Let E = evidence = observed variables, in state e
• Let R = remaining unobserved variables (except C)
• Formal derivation yields: all the values for R should be considered
• In particular, updating becomes the Conservative Updating Rule (CUR):
min_{r∈R} p(c | E = e, R = r) ≤ p(c | o) ≤ max_{r∈R} p(c | E = e, R = r)
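The CUR interval can be sketched directly from its definition. A minimal sketch, assuming we are given a function that returns p(c | E = e, R = r) for any completion r (the function and the toy posterior below are illustrative placeholders, not the paper's algorithm):

```python
from itertools import product

def conservative_update(posterior, r_domains):
    """CUR: [min, max] of posterior(r) over all joint values r of the
    unobserved variables R (one domain per variable)."""
    values = [posterior(r) for r in product(*r_domains)]
    return min(values), max(values)

# Toy usage: two Boolean unobserved variables and a made-up posterior.
lo, hi = conservative_update(
    lambda r: 0.2 + 0.1 * sum(r),   # placeholder for p(c | E = e, R = r)
    [(0, 1), (0, 1)],
)
# lo attained at r = (0, 0), hi at r = (1, 1)
```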
CUR & Bayesian nets
[Figure: the Asia net again, with nodes (V)isit to Asia, (S)moking = y, (T)uberculosis = n, Lung (C)ancer?, Bronc(H)itis, Abnorma(L) X-rays = y, (D)yspnea]
• Evidence: (L,S,T) = (y,y,n)
• What is your posterior confidence on C = y?
• Consider all the joint values of nodes in R; take min & max of p(C = y | L = y, S = y, T = n, v, h, d)
• Posterior confidence: [0.42, 0.71]
• Computational note: only the Markov blanket matters!
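The computational note can be made concrete: a sketch of extracting the Markov blanket (parents, children, and the children's other parents) from a parent map. The Asia-style structure below is a simplification for illustration, not the exact net used in the paper.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node`: its parents, its children, and the
    children's other parents (spouses). `parents` maps child -> set of parents."""
    children = {n for n, ps in parents.items() if node in ps}
    spouses = {p for ch in children for p in parents.get(ch, set())} - {node}
    return parents.get(node, set()) | children | spouses

# Simplified Asia-like structure (child -> set of parents):
asia = {'T': {'V'}, 'C': {'S'}, 'H': {'S'},
        'L': {'T', 'C'}, 'D': {'T', 'C', 'H'}}
markov_blanket(asia, 'C')   # {'S', 'T', 'H', 'L', 'D'}; V drops out
```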
A few remarks
• The CUR…
  • is based only on p(C,A), like the naive updating
  • produces lower & upper probabilities
  • can produce indecision
CUR & decision-making
• Decisions
  • c’ dominates c’’ (c’, c’’ ∈ C) if for all r ∈ R: p(c’ | E = e, R = r) > p(c’’ | E = e, R = r)
• Indecision?
  • It may happen that there exist r’, r’’ ∈ R such that: p(c’ | E = e, R = r’) > p(c’’ | E = e, R = r’) and p(c’ | E = e, R = r’’) < p(c’’ | E = e, R = r’’)
  • There is no evidence that you should prefer c’ to c’’, or vice versa (= keep both)
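A hedged sketch of this dominance test (function names and the toy posterior are mine): c1 dominates c2 iff p(c1 | e, r) > p(c2 | e, r) for every joint value r of the unobserved variables; when neither class dominates, both are kept.

```python
from itertools import product

def dominates(posterior, c1, c2, r_domains):
    """Credal dominance: c1 beats c2 under every completion r of R."""
    return all(posterior(c1, r) > posterior(c2, r)
               for r in product(*r_domains))

# Toy posterior exhibiting indecision, as on the slide:
post = {('y', (0,)): 0.6, ('n', (0,)): 0.4,
        ('y', (1,)): 0.3, ('n', (1,)): 0.7}
p = lambda c, r: post[(c, r)]
dominates(p, 'y', 'n', [(0, 1)])  # False
dominates(p, 'n', 'y', [(0, 1)])  # False: keep both classes
```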
Decision-making example
[Figure: the Asia net, with nodes (V)isit to Asia, (S)moking = y, (T)uberculosis, Lung (C)ancer?, Bronc(H)itis, Abnorma(L) X-rays = y, (D)yspnea]
• Evidence: E = (L,S,T) = (y,y,n) = e
• What is your diagnosis for C?
  • p(C = y | E = e, H = n, D = y) > p(C = n | E = e, H = n, D = y)
  • p(C = y | E = e, H = n, D = n) < p(C = n | E = e, H = n, D = n)
  • Both C = y and C = n are plausible
• Evidence: E = (L,S,T) = (y,y,y) = e
  • C = n dominates C = y: “cancer” is ruled out
Algorithmic facts
• CUR restricts attention to the Markov blanket
• State enumeration is still prohibitive in some cases
  • e.g., naive Bayes
• Dominance test based on dynamic programming
  • Linear in the number of children of the class node C
However: decision-making is possible in linear time, by the provided algorithm, even on some multiply connected nets!
On the application side
• Important characteristics of the present approach
  • Robust approach, easy to implement
  • Does not require changes in pre-existing BN knowledge bases: based on p(C,A) only!
  • Markov blanket favors low computational complexity
  • If you can write down the IM explicitly, your decisions/inferences will be contained in ours
• By-product for large networks
  • Even when naive updating is OK, CUR can serve as a useful preprocessing phase
  • Restricting attention to the Markov blanket may produce strong enough inferences and decisions
What we did in the paper
• Theory of coherent lower previsions (imprecise probabilities)
  • Coherence
  • Equivalent to a large extent to sets of probability distributions
  • Weaker assumptions
• CUR derived in quite a general framework
Concluding notes
• There are cases where:
  • the IM is unknown/difficult to model
  • MAR does not hold
  • Serious theoretical and practical problem
• CUR applies
  • Robust to the unknown IM
  • Computationally easy decision-making with BNs
• CUR works with credal nets, too
  • Same complexity
• Future: how to make stronger inferences and decisions
  • Hybrid MAR/non-MAR modeling?