Bayesian Belief Network AI4190
Contents • Introduction • Bayesian Network • KDD Data
Introduction • Bayesian network: A graphical model for probabilistic relationships among a set of variables. • The difference between physical probability and personal probability • Physical probability: based on frequency; needs repeated trials. • Personal probability: a person’s degree of belief; does not need repeated trials.
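Introduction • (Example) Flipping a thumbtack: (Q) What is P(heads) on the next flip, after N flips have been observed? • Physical probability of heads: an unknown physical probability $\theta$ is assumed to exist and is estimated from the N observations using criteria such as low bias and low variance. • Bayesian probability of heads: denote by $\xi$ a state of personal information and find $P(X = \text{heads} \mid \xi)$. - The uncertainty about the parameter $\theta$ is expressed as $p(\theta \mid \xi)$. - Compute $p(X_{N+1} = \text{heads} \mid D, \xi)$ from the prior $p(\theta \mid \xi)$ and the data $D$.
Introduction • $p(X_{N+1} = \text{heads} \mid D, \xi) = \int \theta \, p(\theta \mid D, \xi) \, d\theta = E[\theta]$ with respect to the posterior $p(\theta \mid D, \xi)$. • Bayes rule: $p(\theta \mid D, \xi) = p(\theta \mid \xi)\, p(D \mid \theta, \xi) / p(D \mid \xi) = p(\theta \mid \xi)\, \theta^{h} (1-\theta)^{t} / p(D \mid \xi)$, where $h$ and $t$ are the numbers of heads and tails in $D$ ($h + t = N$) and $p(D \mid \xi) = \int p(D \mid \theta, \xi)\, p(\theta \mid \xi)\, d\theta$. • For a beta prior $\text{Beta}(\theta \mid a(h), a(t))$, the posterior is also a beta distribution, $\text{Beta}(\theta \mid a(h)+h, a(t)+t)$ (the beta prior is conjugate to the binomial likelihood). Ex) $E[\theta] = a(h)/[a(h)+a(t)]$ under the prior and $E[\theta] = [a(h)+h]/[a(h)+a(t)+N]$ under the posterior.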
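The beta-binomial update can be checked numerically. The sketch below compares the closed-form posterior mean with a direct grid integration of Bayes rule; the function names, the Beta(2, 2) prior, and the counts 7 heads / 3 tails are illustrative assumptions, not values from the slides.

```python
def posterior_mean_closed_form(a_h, a_t, h, t):
    """E[theta] under Beta(a_h + h, a_t + t): (a_h + h) / (a_h + a_t + N)."""
    return (a_h + h) / (a_h + a_t + h + t)


def posterior_mean_numeric(a_h, a_t, h, t, grid=100_000):
    """E[theta] by midpoint integration of prior * likelihood over (0, 1)."""
    num = 0.0
    den = 0.0
    for i in range(grid):
        theta = (i + 0.5) / grid
        # Unnormalised posterior: theta^(a_h - 1 + h) * (1 - theta)^(a_t - 1 + t)
        w = theta ** (a_h - 1 + h) * (1 - theta) ** (a_t - 1 + t)
        num += theta * w
        den += w
    return num / den


# Hypothetical example: Beta(2, 2) prior, then 7 heads and 3 tails observed.
prior_mean = posterior_mean_closed_form(2.0, 2.0, 0, 0)
closed = posterior_mean_closed_form(2.0, 2.0, 7, 3)
numeric = posterior_mean_numeric(2.0, 2.0, 7, 3)
print(prior_mean, closed, numeric)
```

The normalising constant p(D | ξ) cancels in the ratio `num / den`, which is why the numeric version never has to compute it explicitly.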
Bayesian Network • Bayesian network • DAG (Directed Acyclic Graph) • Expresses dependence relations between variables • Can use prior knowledge about the data (parameters) • Example DAG over nodes A, B, C, D, E with edges A→B, D→B, B→C, A→D, C→E, D→E: $P(A,B,C,D,E) = P(A)\,P(B \mid A,D)\,P(C \mid B)\,P(D \mid A)\,P(E \mid C,D)$ • Examples of conjugate priors: Dirichlet for multinomial data, Normal-Wishart for normal data
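As a minimal sketch of how the factorization is used, the code below evaluates the joint probability of the five-node example from local conditional tables. All variables are assumed binary and every probability value is a made-up placeholder; only the factorization itself comes from the slide.

```python
from itertools import product

# Hypothetical local probabilities: each entry is P(var = True | parents).
p_a = 0.3
p_d_given_a = {True: 0.8, False: 0.1}
p_b_given_ad = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.5, (False, False): 0.05}
p_c_given_b = {True: 0.7, False: 0.2}
p_e_given_cd = {(True, True): 0.95, (True, False): 0.4,
                (False, True): 0.3, (False, False): 0.01}


def bern(p_true, value):
    """Probability of a binary value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B|A,D) P(C|B) P(D|A) P(E|C,D)."""
    return (bern(p_a, a)
            * bern(p_b_given_ad[(a, d)], b)
            * bern(p_c_given_b[b], c)
            * bern(p_d_given_a[a], d)
            * bern(p_e_given_cd[(c, d)], e))


# Summing over all 2^5 assignments should give 1 for any valid tables.
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(total)
```

The five tables hold 1 + 2 + 4 + 2 + 4 = 13 free parameters, compared with 31 for an unrestricted joint over five binary variables.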
Bayesian Network • Bayesian network: Network Structure (S) + Local Probability (P). • D: data of N cases. • Each case X takes one of r categories (univariate): use $(\theta(1), \theta(2), \ldots, \theta(r)) \sim \text{Dir}(a(1), a(2), \ldots, a(r))$ (the Dirichlet prior is conjugate to the multinomial). • $E[\theta(i)] = a(i) / \sum_{l=1}^{r} a(l)$
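Bayesian Network • The probability of observing a new case $x_{N+1}$ in the k-th category: $p(X_{N+1} = x_k \mid D) = \int p(X_{N+1} = x_k \mid \theta)\, p(\theta \mid D)\, d\theta = [a(k) + N_k] / [a + N]$, where $N_k$ is the frequency of the k-th category in D and $a = \sum_k a(k)$. • For multivariate $X = (X_1, X_2, \ldots, X_n)$: $p(X) = \prod_{i} p(X_i \mid \text{Pa}(X_i))$. For the k-th category of $X_i$ and the j-th category of $\text{Pa}(X_i)$: $(\theta(i,j,1), \theta(i,j,2), \ldots, \theta(i,j,r(i))) \sim \text{Dir}[a(i,j,1), a(i,j,2), \ldots, a(i,j,r(i))]$ (Dirichlet prior).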
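The univariate predictive rule [a(k) + N_k] / [a + N] can be sketched in a few lines. The category names, the uniform prior, and the sample below are hypothetical, and the sketch assumes every observed value is one of the prior's categories.

```python
from collections import Counter


def dirichlet_predictive(prior, data):
    """Return p(X_{N+1} = k | D) = (a(k) + N_k) / (a + N) for each category k."""
    counts = Counter(data)
    a_total = sum(prior.values())                # a = sum_k a(k)
    n_total = sum(counts[k] for k in prior)      # N, counted over known categories
    return {k: (a_k + counts[k]) / (a_total + n_total)
            for k, a_k in prior.items()}


# Hypothetical example: uniform Dirichlet(1, 1, 1) prior and six observations.
prior = {"red": 1.0, "green": 1.0, "blue": 1.0}
data = ["red", "red", "blue", "red", "green", "red"]
pred = dirichlet_predictive(prior, data)
print(pred)
```

With larger a(k) the prediction stays closer to the prior mean a(k)/a; with more data it approaches the observed frequency N_k/N.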
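Bayesian Network • $N(i,j,k)$: frequency of $X_i = x_k$ under the j-th category of $\text{Pa}(X_i)$ in data D. • Given the j-th category of $\text{Pa}(X_i)$, $X_i \sim \text{MN}[\theta(i,j,1), \theta(i,j,2), \ldots, \theta(i,j,r(i))]$ (multinomial). • Corresponding posterior: $(\theta(i,j,1), \ldots, \theta(i,j,r(i))) \sim \text{Dir}[a(i,j,1)+N(i,j,1), a(i,j,2)+N(i,j,2), \ldots, a(i,j,r(i))+N(i,j,r(i))]$. • BDe score: used to calculate the marginal likelihood (evidence) of a structure S: $p(D \mid S) = \int p(D \mid \theta, S)\, p(\theta \mid S)\, d\theta = \prod_{i} \prod_{j} \frac{\Gamma(a(i,j))}{\Gamma(a(i,j) + N(i,j))} \prod_{k} \frac{\Gamma(a(i,j,k) + N(i,j,k))}{\Gamma(a(i,j,k))}$, where $a(i,j) = \sum_k a(i,j,k)$ and $N(i,j) = \sum_k N(i,j,k)$.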
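A minimal sketch of the marginal-likelihood computation follows, working in log space with `math.lgamma` to avoid overflow in the Gamma functions. The nested-list layout (`alpha[j][k]`, `counts[j][k]` per node) and the function names are assumptions made for illustration.

```python
from math import exp, lgamma


def log_bd_family(alpha, counts):
    """Log score of one node i.

    alpha[j][k] holds a(i,j,k) and counts[j][k] holds N(i,j,k), where j runs
    over parent configurations and k over the states of the node.
    """
    total = 0.0
    for a_j, n_j in zip(alpha, counts):
        # Gamma(a(i,j)) / Gamma(a(i,j) + N(i,j))
        total += lgamma(sum(a_j)) - lgamma(sum(a_j) + sum(n_j))
        for a_jk, n_jk in zip(a_j, n_j):
            # Gamma(a(i,j,k) + N(i,j,k)) / Gamma(a(i,j,k))
            total += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return total


def log_bd_score(families):
    """Sum the family scores over all nodes: log p(D | S)."""
    return sum(log_bd_family(alpha, counts) for alpha, counts in families)


# Hypothetical check: one binary node without parents, uniform prior a = (1, 1),
# data with 2 heads and 1 tail. The exact evidence is B(3, 2) = 1/12.
evidence = exp(log_bd_family([[1.0, 1.0]], [[2, 1]]))
print(evidence)
```

Because the score is a product over families, changing the parents of one node only requires recomputing that node's term, which is what makes local search over structures practical.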
Bayesian Network • Methods of searching: Greedy, Reverse, Exhaustive (the prior order or structure of the nodes is given). • For missing values: - Gibbs sampling - Gaussian approximation - EM - Bound and Collapse, etc.
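One way a greedy search with a given node order might look is sketched below: each node greedily adds the predecessor that most improves its family score and stops when no addition helps. This is a K2-style illustration, not necessarily the procedure used in the lecture; the uniform prior a(i,j,k) = 1, the `max_parents` cap, and the toy data are all assumptions.

```python
from math import lgamma


def family_log_score(data, child, parents, arity, a=1.0):
    """Log marginal likelihood of one family with every a(i,j,k) = a (assumed)."""
    counts = {}
    for row in data:
        j = tuple(row[p] for p in parents)       # observed parent configuration
        counts.setdefault(j, [0] * arity[child])[row[child]] += 1
    r = arity[child]
    score = 0.0
    for n_j in counts.values():                  # unobserved configurations add 0
        score += lgamma(a * r) - lgamma(a * r + sum(n_j))
        score += sum(lgamma(a + n) - lgamma(a) for n in n_j)
    return score


def greedy_parents(data, order, arity, max_parents=2):
    """For each node, greedily pick parents among the nodes that precede it."""
    parents = {v: [] for v in order}
    for idx, child in enumerate(order):
        candidates = list(order[:idx])
        best = family_log_score(data, child, parents[child], arity)
        while candidates and len(parents[child]) < max_parents:
            scored = [(family_log_score(data, child, parents[child] + [c], arity), c)
                      for c in candidates]
            top_score, top = max(scored)
            if top_score <= best:                # no candidate improves the score
                break
            parents[child].append(top)
            candidates.remove(top)
            best = top_score
    return parents


# Toy data: X1 copies X0, while X2 is empirically independent of both.
data = [(i % 2, i % 2, (i // 2) % 2) for i in range(40)]
learned = greedy_parents(data, order=[0, 1, 2], arity=[2, 2, 2])
print(learned)
```

Restricting candidate parents to earlier nodes in the order guarantees that the result is acyclic, which is why the given order matters for this kind of search.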
Bayesian Network • Interpretations: - depend on the prior order of nodes or the prior structure. • local conditional probability • choice of nodes • the overall nature of the data
KDD Data • KDD Data • Data: 465 features over 1700 customers • Features include friend promotion rate, date visited, weight of items, price of house, discount rate, … • Data were collected from Jan. 30 to March 30, 2000 • The friend promotion started on 29 Feb. with a TV advertisement. • Aim: description of heavy and low spenders
KDD Data • Features selected in various ways
KDD Data • A Bayesian net of KDD data • V229 (Order-Average) and V240 (Friend) directly influence V312 (Target) • V19 (Date) is influenced by V240 (Friend), reflecting the TV advertisement.