Bayesian Belief Network

Bayesian Belief Network AI4190

Contents • Introduction • Bayesian Network • KDD Data

Introduction • Bayesian network: a graphical model of the probabilistic relationships among a set of variables. • The difference between physical probability and personal probability • Physical probability: based on long-run frequency, so it requires repeated trials. • Personal probability: a person's degree of belief, which does not require repeated trials.

Introduction • (Example) Flipping a thumbtack: (Q) What is P(heads) after N flips? • Physical probability of heads: an unknown physical probability is assumed to exist and is estimated from the N observations, using criteria such as low bias and low variance. • Bayesian probability of heads: let $\xi$ denote a state of personal information and find $P(X = \text{heads} \mid \xi)$. - The uncertainty about the parameter $\theta$ is expressed as $p(\theta \mid \xi)$. - Compute $p(X_{N+1} = x \mid D, \xi)$ from the prior $p(\theta \mid \xi)$.
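
The contrast between the two estimates can be sketched in a few lines. This is only an illustration: the slide does not fix a prior, so the Beta prior (the two-category case of the Dirichlet) and the function names `predictive_heads` and `frequency_estimate` are assumptions.

```python
from fractions import Fraction

def predictive_heads(heads, tails, a_heads=1, a_tails=1):
    """P(X_{N+1} = heads | D) under an assumed Beta(a_heads, a_tails) prior."""
    # The posterior is Beta(a_heads + heads, a_tails + tails); its mean is
    # the predictive probability of heads on the next flip.
    return Fraction(a_heads + heads, a_heads + a_tails + heads + tails)

def frequency_estimate(heads, tails):
    """Relative-frequency estimate of the physical probability of heads."""
    return Fraction(heads, heads + tails)

print(predictive_heads(3, 7))    # 1/3
print(frequency_estimate(3, 7))  # 3/10
```

With no data at all, `predictive_heads(0, 0)` still returns the prior mean 1/2, whereas the frequency estimate is undefined; that is the practical meaning of "does not need repetition".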

Bayesian Network • Bayesian network • DAG (Directed Acyclic Graph) • Expresses dependence relations between variables • Can incorporate prior knowledge about the data (parameters) • [Figure: DAG over nodes A, B, C, D, E with edges A→B, A→D, D→B, B→C, C→E, D→E] $P(A,B,C,D,E) = P(A)\,P(B \mid A,D)\,P(C \mid B)\,P(D \mid A)\,P(E \mid C,D)$ • Examples of conjugate priors: Dirichlet for multinomial data, Normal-Wishart for normal data
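
The factorization P(A)P(B|A,D)P(C|B)P(D|A)P(E|C,D) can be checked numerically. The sketch below assumes binary variables and uses a made-up conditional probability rule (`p_true`); only the parent sets come from the slide.

```python
from itertools import product

# Parents of each node, read off the factorization on the slide.
PARENTS = {"A": (), "D": ("A",), "B": ("A", "D"), "C": ("B",), "E": ("C", "D")}

def p_true(node, parent_values):
    """Hypothetical CPT: P(node = 1 | parents). The numbers are made up."""
    return 0.2 + 0.6 * sum(parent_values) / max(len(parent_values), 1)

def joint(assignment):
    """P(A,B,C,D,E) as the product of the local conditional probabilities."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = p_true(node, [assignment[p] for p in parents])
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

# Summing the joint over all 2^5 assignments should give 1.
total = sum(joint(dict(zip(PARENTS, values)))
            for values in product([0, 1], repeat=len(PARENTS)))
print(round(total, 10))  # 1.0
```

Storing only the parent sets and one local table per node is what makes the representation compact: the full joint over five binary variables is never tabulated.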

Bayesian Network • Bayesian network = network structure ($S$) + local probabilities ($P$). • $D$: data of $N$ cases. • Each case $X$ takes one of $r$ categories (univariate case): use $(\theta(1), \theta(2), \ldots, \theta(r)) \sim \mathrm{Dir}(a(1), a(2), \ldots, a(r))$, the Dirichlet prior with respect to the multinomial, with $E[\theta(i)] = a(i) / \sum_{k} a(k)$.

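Bayesian Network • The probability of observing a new case $x_{N+1}$ in the $k$-th category: $p(X_{N+1} = x_k \mid D) = \int p(X_{N+1} = x_k \mid \theta)\, p(\theta \mid D)\, d\theta = [a(k) + N_k] / [a + N]$, where $N_k$ is the frequency of the $k$-th category in $D$ and $a = \sum_k a(k)$. • For multivariate $X = (X_1, X_2, \ldots, X_n)$: $p(X) = \prod_{i} p(X_i \mid \mathrm{Pa}(X_i))$. For the $k$-th category of $X_i$ and the $j$-th category of $\mathrm{Pa}(X_i)$: $(\theta(i,j,1), \ldots, \theta(i,j,r(i))) \sim \mathrm{Dir}[a(i,j,1), a(i,j,2), \ldots, a(i,j,r(i))]$ (Dirichlet prior).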
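
A small worked example of the Dirichlet prior mean and the predictive probability (a(k) + N_k) / (a + N). The hyperparameters and counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def prior_mean(a):
    """E[theta(k)] = a(k) / sum of a(k) under a Dir(a(1), ..., a(r)) prior."""
    total = sum(a)
    return [a_k / total for a_k in a]

def predictive(a, counts):
    """p(X_{N+1} = x_k | D) = (a(k) + N_k) / (a + N) for each category k."""
    a_total, n_total = sum(a), sum(counts)
    return [(a_k + n_k) / (a_total + n_total) for a_k, n_k in zip(a, counts)]

a = [1, 1, 2]        # hypothetical Dirichlet hyperparameters a(k)
counts = [5, 0, 1]   # hypothetical category frequencies N_k in D

print(prior_mean(a))          # [0.25, 0.25, 0.5]
print(predictive(a, counts))  # [0.6, 0.1, 0.3]
```

The second category was never observed, yet its predictive probability is 0.1 rather than 0: the prior counts act like pseudo-observations.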

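Bayesian Network • $N(i,j,k)$: frequency of $X_i = x_k$ under the $j$-th category of $\mathrm{Pa}(X_i)$ in data $D$. • Given the $j$-th category of $\mathrm{Pa}(X_i)$: $X_i \sim \mathrm{MN}[\theta(i,j,1), \theta(i,j,2), \ldots, \theta(i,j,r(i))]$. • Corresponding posterior of $\theta(i,j,k)$: $(\theta(i,j,1), \ldots, \theta(i,j,r(i))) \sim \mathrm{Dir}[a(i,j,1)+N(i,j,1), a(i,j,2)+N(i,j,2), \ldots, a(i,j,r(i))+N(i,j,r(i))]$. • BDe score: used to calculate the marginal likelihood (evidence), $p(D \mid S) = \int p(D \mid \theta, S)\, p(\theta \mid S)\, d\theta$. For a single multinomial variable, $p(D \mid S) = \frac{\Gamma(a)}{\Gamma(a+N)} \prod_k \frac{\Gamma(a(k)+N_k)}{\Gamma(a(k))}$; for a network, the same factor is multiplied over every node $i$ and every parent category $j$.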
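
The marginal likelihood of a Dirichlet-multinomial model, Γ(a)/Γ(a+N) · Π_k Γ(a(k)+N_k)/Γ(a(k)), is usually computed in log space to avoid overflow. The following is a minimal sketch under that assumption; the function names and the `families` layout are illustrative, not taken from the slides.

```python
from math import lgamma, exp

def log_marginal_likelihood(a, counts):
    """log p(D) for one multinomial variable with a Dir(a) prior."""
    a_total, n_total = sum(a), sum(counts)
    score = lgamma(a_total) - lgamma(a_total + n_total)
    for a_k, n_k in zip(a, counts):
        score += lgamma(a_k + n_k) - lgamma(a_k)
    return score

def log_network_score(families):
    """Sum over nodes i and parent categories j.

    families[i] is a list of (a, counts) pairs, one per parent category j.
    """
    return sum(log_marginal_likelihood(a, n)
               for node in families for a, n in node)

# Uniform prior Dir(1, 1) and a sequence with 2 heads and 1 tail:
# the integral of theta^2 (1 - theta) over [0, 1] is 1/12.
print(round(exp(log_marginal_likelihood([1, 1], [2, 1])), 6))  # 0.083333
```

Because the network score is a sum of per-family terms in log space, changing the parents of one node only requires recomputing that node's term.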

Bayesian Network • Methods of searching: greedy, reverse, exhaustive (the prior order or structure of the nodes is given). • For missing values: - Gibbs sampling - Gaussian approximation - EM - Bound and Collapse, etc.
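
One way a greedy search with a given node order can look is sketched below, in the style of the K2 algorithm: each node may only take parents from its predecessors, and a parent is added while it improves the node's log marginal likelihood. The slides do not say which greedy variant was used, so the scoring prior (uniform Dirichlet with a = 1), the `max_parents` limit, and the toy data are all assumptions.

```python
from math import lgamma

def family_score(data, child, parents, arity, a=1.0):
    """Log marginal likelihood of `child` given `parents` (uniform Dir(a) prior)."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0] * arity[child])[row[child]] += 1
    score = 0.0
    for n in counts.values():  # unseen parent categories contribute 0
        score += lgamma(a * arity[child]) - lgamma(a * arity[child] + sum(n))
        score += sum(lgamma(a + n_k) - lgamma(a) for n_k in n)
    return score

def greedy_parents(data, order, arity, max_parents=2):
    """For each node, greedily add the predecessor that most improves the score."""
    structure = {}
    for idx, child in enumerate(order):
        parents = []
        best = family_score(data, child, parents, arity)
        candidates = list(order[:idx])
        while candidates and len(parents) < max_parents:
            top_score, top = max(
                (family_score(data, child, parents + [c], arity), c)
                for c in candidates)
            if top_score <= best:
                break  # no candidate improves the score
            parents.append(top)
            best = top_score
            candidates.remove(top)
        structure[child] = parents
    return structure

# Toy data: Y copies X, Z is balanced against both.
data = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
arity = {"X": 2, "Y": 2, "Z": 2}
learned = greedy_parents(data, ["X", "Y", "Z"], arity)
print(learned)  # {'X': [], 'Y': ['X'], 'Z': []}
```

The result depends on the order passed in: with the order reversed, X would be offered Y as a parent instead of the other way round.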

Bayesian Network • Interpretations: - depend on the prior order of nodes or the prior structure. • Local conditional probability • Choice of nodes • The overall nature of the data

KDD Data • Data: 465 features over 1700 customers • Features include friend promotion rate, date visited, weight of items, price of house, discount rate, … • Data were collected from Jan. 30 to March 30, 2000 • The friend promotion started on 29 Feb., together with TV advertising. • Aim: description of heavy/low spenders

KDD Data • Features selected in various ways

KDD Data • A Bayesian net of the KDD data • V229 (Order-Average) and V240 (Friend) directly influence V312 (Target) • V19 (Date) is influenced by V240 (Friend), reflecting the TV advertisement.
