

  1. Information Theory (10.6 ~ 10.10, 10.13 ~ 10.15) CS679 Lecture Note by Sungho Ryu, Computer Science Department, KAIST

  2. Index • Application of Information Theory to self-organizing systems • Case 1 : Feature Extraction • Case 2 : Spatially Coherent Features • Case 3 : Spatially Incoherent Features • Case 4 : Blind Source Separation • Summary

  3. Case 1 : Feature Extraction • Objective Function : maximize I(Y; X)

  4. Case 1 : Formal Statement • Infomax (i.e. Maximum Mutual Information) Principle : The transformation of ... a random vector X observed in the input layer of a neural system to a random vector Y produced in the output layer of the system should be so chosen that ... the activities of the neurons in the output layer jointly maximize information about the activities in the input layer. The objective function to be maximized is ... the mutual information I(Y;X) between the vectors X and Y
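
For reference (this expansion is not on the slide, but the later examples rely on it), the mutual information objective can be written in terms of differential entropies:

```latex
% Mutual information between the output vector Y and the input vector X.
I(\mathbf{Y};\mathbf{X}) = h(\mathbf{Y}) - h(\mathbf{Y} \mid \mathbf{X})
% Maximizing I(Y;X) trades off making the output distribution rich
% (large h(Y)) against keeping the output deterministic given the
% input (small h(Y|X)).
```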

  5. Case 1 : Examples • Single Neuron corrupted by Processing Noise • Y : Gaussian random variable with variance σ_Y² • N : Gaussian random variable with zero mean & variance σ_N²
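
The equations for this example did not survive the transcript. A minimal sketch of the standard Gaussian result, assuming the usual single-neuron model Y = Σᵢ wᵢ Xᵢ + N with the processing noise N independent of X:

```latex
% With Y Gaussian (variance \sigma_Y^2) and N Gaussian (zero mean,
% variance \sigma_N^2), the conditional entropy of Y given X is the
% entropy of the noise alone:
h(Y \mid \mathbf{X}) = h(N) = \tfrac{1}{2}\log(2\pi e \, \sigma_N^2)
% so the mutual information reduces to an output signal-to-noise ratio:
I(Y;\mathbf{X}) = h(Y) - h(N) = \tfrac{1}{2}\log\frac{\sigma_Y^2}{\sigma_N^2}
```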

  6. Case 1 : Examples • Single Neuron corrupted by Input Noise • Y : Gaussian random variable with variance σ_Y² • N_i : Gaussian random variable with zero mean & variance σ_N²
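
Again the slide's equations were dropped. A sketch under the assumed input-noise model Y = Σᵢ wᵢ (Xᵢ + Nᵢ), with the Nᵢ i.i.d. and independent of X:

```latex
% The noise that reaches the output is \sum_i w_i N_i, whose variance
% is \sigma_N^2 \sum_i w_i^2, so
I(Y;\mathbf{X}) = \tfrac{1}{2}\log\frac{\sigma_Y^2}{\sigma_N^2 \sum_i w_i^2}
% Unlike the processing-noise case, the noise term now depends on the
% weights, so the two examples favor different weight vectors.
```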

  7. Case 2 : Spatially Coherent Features • Find similarity between 2 input regions • Objective Function : maximize I(Ya ; Yb)

  8. Case 2 : Formal Statement • First Variant of the Infomax Principle : The transformation of ... a pair of vectors Xa and Xb (representing adjacent, nonoverlapping regions of an image) by a neural system should be so chosen that ... the scalar output Ya of the system due to the input Xa maximizes information about the second scalar output Yb due to Xb. The objective function to be maximized is ... the mutual information I(Ya;Yb) between the outputs Ya and Yb

  9. Case 2 : Example • ρ_ab : correlation coefficient of Ya & Yb • Xa, Xb : inputs from adjacent, non-overlapping regions of an image • Ya, Yb : corresponding outputs • S : a signal component common to Ya & Yb (also Gaussian) • Na, Nb : statistically independent, zero-mean additive Gaussian noise
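
The closed-form expression for this example was lost in the transcript. For a jointly Gaussian pair (Ya, Yb) as described above (Ya = S + Na, Yb = S + Nb is the implied model), the standard result is:

```latex
% Mutual information of two jointly Gaussian scalars depends only on
% their correlation coefficient \rho_{ab}:
I(Y_a; Y_b) = -\tfrac{1}{2}\log\bigl(1 - \rho_{ab}^2\bigr)
% Maximizing I(Y_a;Y_b) drives |\rho_{ab}| toward 1, i.e. both outputs
% lock onto the common (spatially coherent) signal component S.
```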

  10. Case 3 : Spatially Incoherent Features • Find difference between 2 input regions • Objective Function : minimize I(Ya; Yb)

  11. Case 3 : Formal Statement • Second Variant of the Infomax Principle : The transformation of ... a pair of input vectors Xa and Xb (representing data derived from corresponding regions in a pair of separate images) by a neural system should be so chosen that ... the scalar output Ya of the system due to the input Xa minimizes information about the second scalar output Yb due to Xb. The objective function to be minimized is ... the mutual information I(Ya;Yb) between the outputs Ya and Yb

  12. Case 3 : Example • Removing clutter in a polarized radar image • W : overall weight matrix of the network • Ya, Yb : outputs of the network (Gaussian)
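
Since the slide states that the outputs are Gaussian, the same closed form as in Case 2 applies; only the direction of optimization flips:

```latex
% Same Gaussian expression as in Case 2:
I(Y_a; Y_b) = -\tfrac{1}{2}\log\bigl(1 - \rho_{ab}^2\bigr)
% but W is now chosen to MINIMIZE it, driving \rho_{ab} toward 0 so
% that the component common to both polarized images (the clutter) is
% suppressed in the outputs.
```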

  13. Case 4 : Blind Source Separation • Estimate the unknown underlying source vector U by building the inverse transformation W • X = AU, Y = WX ⇒ Y = U iff W = A⁻¹
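
A tiny numerical illustration of the relation above; the sources, mixing matrix, and dimensions are made up for the demo (in practice A is unknown, which is why W must be learned):

```python
import numpy as np

# Toy check of the BSS setup: X = A U, Y = W X, and Y = U iff W = A^-1.
rng = np.random.default_rng(0)

U = rng.laplace(size=(3, 1000))   # three unknown, statistically independent sources
A = rng.normal(size=(3, 3))       # unknown (here: made-up) mixing matrix
X = A @ U                         # observed mixtures

W = np.linalg.inv(A)              # ideal demixing matrix
Y = W @ X                         # recovered sources

print(np.allclose(Y, U))          # True: W = A^-1 exactly undoes the mixing
```

With a learned W the recovery holds only up to a permutation and scaling of the sources; the next two slides describe two criteria (maximum likelihood and maximum entropy) for learning such a W from X alone.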

  14. Case 4 : Maximum Likelihood Estimation (MLE) • Let f_X(x; W) be the PDF of the observation vector X • Let T be a set of N independent realizations of X; then form the likelihood of T • Convert it to the normalized log-likelihood function L(W) • Let N approach infinity and apply some substitutions; then we get: maximizing the log-likelihood function L(W) is equivalent to minimizing the Kullback-Leibler divergence D(f_Y || f_U)
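
A sketch of the steps compressed into the bullets above, using the slide's f_Y, f_U notation; the intermediate equalities are the standard ICA likelihood argument, not copied from the slide:

```latex
% Parameterized density of the observations for a candidate demixing
% matrix W (so that U is estimated by Y = W X):
f_X(\mathbf{x}; \mathbf{W}) = |\det \mathbf{W}| \, f_U(\mathbf{W}\mathbf{x})
% Normalized log-likelihood of N independent realizations x_1, ..., x_N:
L(\mathbf{W}) = \frac{1}{N}\sum_{k=1}^{N} \log f_U(\mathbf{W}\mathbf{x}_k)
              + \log|\det \mathbf{W}|
% Letting N \to \infty and writing Y = W X,
L(\mathbf{W}) \;\longrightarrow\; -\,D_{f_Y \| f_U} \;-\; h(\mathbf{X})
% Since h(X) does not depend on W, maximizing L(W) is equivalent to
% minimizing the Kullback-Leibler divergence D_{f_Y \| f_U}.
```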

  15. Case 4 : Maximum Entropy Method • Z = G(Y) = G(WAU), so U = A⁻¹ W⁻¹ G⁻¹(Z) • Maximizing the entropy of the random vector Z at the output of the nonlinearity G is then equivalent to W = A⁻¹, which yields perfect blind source separation. • If Zi is uniformly distributed inside the interval [0,1] for all i, then the Maximum Entropy method and MLE are equivalent * J(u) : Jacobian of Z
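
A minimal sketch of the maximum-entropy route in code, assuming a logistic nonlinearity G and the natural-gradient form of the entropy-ascent update (the Bell-Sejnowski "Infomax" rule); the data, learning rate, and iteration count are made-up demo choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 5000
U = rng.laplace(size=(n, N))              # super-Gaussian sources (demo data)
A = rng.normal(size=(n, n))               # unknown mixing matrix
X = A @ U                                 # observed mixtures

# Whiten the mixtures (standard preprocessing, not part of the slide).
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Xw = (E @ np.diag(1.0 / np.sqrt(d)) @ E.T) @ Xc

# Gradient ascent on h(Z) with Z = G(Y), G = elementwise logistic,
# using the natural-gradient update  dW = eta * (I + (1 - 2 Z) Y^T / N) W.
W = np.eye(n)
eta = 0.05
for _ in range(500):
    Y = W @ Xw
    Z = 1.0 / (1.0 + np.exp(-Y))          # pushes each Z_i toward uniform on [0, 1]
    W += eta * (np.eye(n) + (1.0 - 2.0 * Z) @ Y.T / N) @ W

# Each recovered component should correlate strongly with exactly one source.
Y = W @ Xw
print(np.round(np.abs(np.corrcoef(np.vstack([Y, U]))[:n, n:]), 2))
```

When each Zi ends up (approximately) uniform on [0, 1], this entropy-maximization view coincides with the MLE view of the previous slide, as stated above.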

  16. Summary • Application of Information Theory to self-organizing systems • Case 1 : Feature Extraction • Infomax principle • Case 2 : Extraction of Spatially Coherent Features • 1st variant of Infomax • Case 3 : Extraction of Spatially Incoherent Features • 2nd variant of Infomax • Case 4 : Blind Source Separation • ICA vs Maximum Entropy Method
