Information Bottleneck


Presentation Transcript


  1. Information Bottleneck presented by Boris Epshtein & Lena Gorelick Advanced Topics in Computer and Human Vision Spring 2004

  2. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  3. Motivation: Clustering Problem

  4. Motivation • “Hard” Clustering – partitioning of the input data into several exhaustive and mutually exclusive clusters • Each cluster is represented by a centroid

  5. Motivation • “Good” clustering – should group similar data points together and dissimilar points apart • Quality of partition – average distortion between the data points and corresponding representatives (cluster centroids)

  6. Motivation • “Soft” Clustering – each data point is assigned to all clusters with some normalized probability • Goal – minimize expected distortion between the data points and cluster centroids

  7. Motivation… Complexity-Precision Trade-off • Too simple a model → poor precision • Higher precision requires a more complex model

  8. Motivation… Complexity-Precision Trade-off • Too simple a model → poor precision • Higher precision requires a more complex model • Too complex a model → overfitting

  9. Motivation… Complexity-Precision Trade-off • Too complex a model • can lead to overfitting and poor generalization • is hard to learn • Too simple a model • cannot capture the real structure of the data • Examples of approaches: • SRM – Structural Risk Minimization • MDL – Minimum Description Length • Rate Distortion Theory

  10. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  11. Definitions… Entropy • The measure of uncertainty about the random variable X: H(X) = -Σ_x p(x) log p(x)

  12. Definitions… Entropy - Example • Fair coin (p = 1/2): H(X) = 1 bit • Unfair coin (p ≠ 1/2): H(X) < 1 bit, approaching 0 as p approaches 0 or 1

  13. Definitions… Entropy - Illustration [Plot: entropy of a binary variable as a function of p; highest (1 bit) at p = 1/2, lowest (0) at p = 0 or p = 1]
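
A quick way to check these numbers is to compute the entropy directly. The following is a minimal Python sketch (not from the slides; the helper name `entropy` is illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                    # 0 * log(0) is taken as 0
    return -np.sum(nz * np.log2(nz))

print(entropy([0.5, 0.5]))   # fair coin       -> 1.0 bit (highest)
print(entropy([0.9, 0.1]))   # unfair coin     -> ~0.47 bits
print(entropy([1.0, 0.0]))   # certain outcome -> 0.0 bits (lowest)
```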

  14. Definitions… Conditional Entropy • The measure of uncertainty about the random variable X given the value of the variable Y: H(X|Y) = -Σ_{x,y} p(x,y) log p(x|y)

  15. Definitions… Conditional Entropy - Example

  16. Definitions… Mutual Information • The reduction in uncertainty of X due to the knowledge of Y: I(X;Y) = H(X) - H(X|Y) = Σ_{x,y} p(x,y) log [p(x,y) / (p(x)p(y))] • Nonnegative • Symmetric • Convex w.r.t. p(y|x) for a fixed p(x)

  17. Definitions… Mutual Information - Example

  18. Definitions… Kullback-Leibler Distance • A distance between distributions p and q over the same alphabet: D_KL[p||q] = Σ_x p(x) log [p(x) / q(x)] • Nonnegative (zero iff p = q) • Asymmetric (hence not a true metric)
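
All three quantities defined above can be computed from a joint probability table. A minimal Python sketch under that setup (the function names and the toy joint distribution are invented for illustration):

```python
import numpy as np

def H(p):
    """Entropy in bits; treats 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table pxy[x, y]."""
    return H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)

def kl(p, q):
    """D_KL[p || q] over the same alphabet (q must be > 0 where p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])          # hypothetical joint distribution
print(mutual_information(pxy))        # ~0.278 bits
print(kl([0.9, 0.1], [0.5, 0.5]))     # ~0.531 bits
print(kl([0.5, 0.5], [0.9, 0.1]))     # ~0.737 bits: asymmetric, as stated above
```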

  19. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  20. Rate Distortion Theory - Introduction • Goal: obtain a compact clustering of the data with minimal expected distortion • The distortion measure is a part of the problem setup • The clustering and its quality depend on the choice of the distortion measure

  21. Rate Distortion Theory • Obtain a compact clustering of the data with minimal expected distortion, given a fixed set of representatives (Cover & Thomas)

  22. Rate Distortion Theory - Intuition • One extreme: zero distortion, but not compact (every point is its own cluster) • Other extreme: very compact (a single cluster), but high distortion

  23. Rate Distortion Theory – Cont. • The quality of clustering is determined by its complexity and its distortion • Complexity is measured by I(X;T) (a.k.a. the rate) • Distortion is measured by E[d(X,T)] = Σ_{x,t} p(x) p(t|x) d(x,t)

  24. Rate Distortion Plane [Plot: rate I(X;T) vs. expected distortion E[d(X,T)]; D marks the distortion constraint; minimal distortion and maximal compression lie at opposite corners of the plane]

  25. Rate Distortion Function • Let D be an upper bound constraint on the expected distortion E[d(X,T)] • Higher values of D mean a more relaxed distortion constraint, so stronger compression levels are attainable • Given the distortion constraint, find the most compact model (with smallest complexity I(X;T))

  26. Rate Distortion Function • Given • A set of points X with prior p(x) • A set of representatives T • A distortion measure d(x,t) • Find • The most compact soft clustering p(t|x) of the points of X that satisfies the distortion constraint • Rate Distortion Function: R(D) = min { I(X;T) over p(t|x) such that E[d(X,T)] ≤ D }

  27. Rate Distortion Function • Minimize the Lagrangian F[p(t|x)] = I(X;T) + β E[d(X,T)] • I(X;T) is the complexity term, E[d(X,T)] is the distortion term, and β is the Lagrange multiplier

  28. Rate Distortion Curve [Plot: the rate distortion curve R(D) in the rate vs. E[d(X,T)] plane, running between the minimal-distortion and maximal-compression extremes]

  29. Rate Distortion Function • Minimize F[p(t|x)] = I(X;T) + β E[d(X,T)] subject to the normalization Σ_t p(t|x) = 1 • The minimum is attained when p(t|x) = p(t) exp(-β d(x,t)) / Z(x,β), where Z(x,β) = Σ_t p(t) exp(-β d(x,t)) is the normalization factor

  30. Solution - Analysis • Solution: p(t|x) = p(t) exp(-β d(x,t)) / Z(x,β) • p(x) and d(x,t) are known • p(t) = Σ_x p(x) p(t|x) depends on p(t|x) itself, so the solution is implicit

  31. Solution - Analysis • For a fixed t: when x is similar to t, d(x,t) is small, so closer points are attached to t with higher probability

  32. Solution - Analysis • β → 0 reduces the influence of distortion: p(t|x) no longer depends on d(x,t), giving maximal compression (a single effective cluster) • β → ∞: for each x, most of the conditional probability goes to the t with the smallest distortion, giving hard clustering

  33. Solution - Analysis • Intermediate values of β give soft clustering of intermediate complexity • Varying β traces out the rate distortion curve (see the numerical sketch below)
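
A tiny numerical illustration of these limits, using the solution form p(t|x) ∝ p(t) exp(-β d(x,t)) from slide 29. The distortion values and the uniform prior are made up for illustration:

```python
import numpy as np

# Soft assignments p(t|x) for one fixed x and three representatives t.
d_xt = np.array([0.2, 1.0, 3.0])      # distortions d(x, t) for this x
p_t = np.ones(3) / 3                  # p(t), taken uniform here

for beta in (0.0, 1.0, 100.0):
    w = p_t * np.exp(-beta * d_xt)
    print(beta, np.round(w / w.sum(), 3))
# beta = 0.0   -> [0.333 0.333 0.333]  distortion ignored: maximal compression
# beta = 1.0   -> [0.662 0.298 0.04 ]  soft clustering
# beta = 100.0 -> [1.    0.    0.   ]  (numerically) hard assignment to nearest t
```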

  34. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  35. Blahut – Arimoto Algorithm • Input: p(x), d(x,t), β • Randomly initialize p(t|x) • Alternate until convergence: p(t) = Σ_x p(x) p(t|x) and p(t|x) = p(t) exp(-β d(x,t)) / Z(x,β) • Each step optimizes a convex function over a convex set, so the attained minimum is global
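
A minimal sketch of the two alternating updates in Python (an illustrative implementation, not the presenters' code; the function name and toy data are invented):

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, n_iter=200, tol=1e-10):
    """Alternate the two self-consistent updates until convergence.

    p_x  : prior over the data points, shape (n_x,)
    d    : distortion matrix d[x, t], shape (n_x, n_t)
    beta : Lagrange multiplier trading compression vs. distortion
    """
    n_x, n_t = d.shape
    p_t_given_x = np.random.default_rng(0).dirichlet(np.ones(n_t), size=n_x)
    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x                   # p(t) = sum_x p(x) p(t|x)
        w = p_t * np.exp(-beta * d)               # unnormalized p(t|x)
        new = w / w.sum(axis=1, keepdims=True)    # normalize by Z(x, beta)
        if np.abs(new - p_t_given_x).max() < tol:
            break
        p_t_given_x = new
    return p_t_given_x

# Hypothetical toy problem: 4 points, 2 representatives, squared distortion.
x = np.array([0.0, 0.1, 0.9, 1.0])
t = np.array([0.05, 0.95])
d = (x[:, None] - t[None, :]) ** 2
print(np.round(blahut_arimoto(np.full(4, 0.25), d, beta=50.0), 3))
```

With β = 50 the assignments come out effectively hard: the two left points attach to the left representative and the two right points to the right one.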

  36. Blahut-Arimoto Algorithm Advantages: • Obtains a compact clustering of the data with minimal expected distortion • The clustering is optimal given a fixed set of representatives

  37. Blahut-Arimoto Algorithm Drawbacks: • The distortion measure is a part of the problem setup • It is hard to obtain for some problems • Choosing it is equivalent to determining the relevant features • Fixed set of representatives • Slow convergence

  38. Rate Distortion Theory – Additional Insights • Another problem would be to find the optimal representatives given the clustering • Joint optimization of clustering and representatives doesn't have a unique solution (as with EM or K-means)

  39. Agenda • Motivation • Information Theory - Basic Definitions • Rate Distortion Theory • Blahut-Arimoto algorithm • Information Bottleneck Principle • IB algorithms • iIB • dIB • aIB • Application

  40. Information Bottleneck • Copes with the drawbacks of the Rate Distortion approach • Compresses the data while preserving "important" (relevant) information • It is often easier to define what information is important than to define a distortion measure • Replaces the distortion upper bound constraint with a lower bound constraint over the relevant information (Tishby, Pereira & Bialek, 1999)

  41. Information Bottleneck - Example • Given: documents, topics, and the joint prior p(Word, Topic)

  42. Information Bottleneck - Example • Obtain: a partitioning of the words into clusters • Compactness is measured by I(Word;Cluster) • Preserved relevant information is measured by I(Cluster;Topic), out of the original I(Word;Topic)

  43. Information Bottleneck - Example • Extreme case 1: I(Word;Cluster) = 0 (very compact) but I(Cluster;Topic) = 0 (not informative)

  44. Information Bottleneck - Example • Extreme case 2: I(Cluster;Topic) = max (very informative) but I(Word;Cluster) = max (not compact) • Goal: minimize I(Word;Cluster) while maximizing I(Cluster;Topic)

  45. Information Bottleneck [Diagram: words are compressed into clusters (compactness) while preserving the relevant information about topics]

  46. Relevance Compression Curve [Plot: compression I(X;T) vs. relevant information I(T;Y); D marks the relevance constraint; maximal relevant information and maximal compression lie at opposite ends]

  47. Relevance Compression Function • Let D be the minimal allowed value of the relevant information I(T;Y) • Smaller D means a more relaxed relevant information constraint, so stronger compression levels are attainable • Given the relevant information constraint, find the most compact model (with smallest I(X;T))

  48. Relevance Compression Function • Minimize the Lagrangian L[p(t|x)] = I(X;T) - β I(T;Y) • I(X;T) is the compression term, I(T;Y) is the relevance term, and β is the Lagrange multiplier

  49. Relevance Compression Curve [Plot: the relevance compression curve, running between the maximal relevant information and maximal compression extremes]

  50. Relevance Compression Function • Minimize L[p(t|x)] = I(X;T) - β I(T;Y) subject to the normalization Σ_t p(t|x) = 1 • The minimum is attained when p(t|x) = p(t) exp(-β D_KL[p(y|x) || p(y|t)]) / Z(x,β), where Z(x,β) is the normalization factor
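
Iterating this fixed-point condition together with the updates for p(t) and p(y|t) corresponds to the iterative IB (iIB) algorithm listed in the agenda. A minimal sketch (an illustrative implementation, not the presenters' code; it assumes a strictly positive joint table p(x,y)):

```python
import numpy as np

def iterative_ib(p_xy, n_clusters, beta, n_iter=300):
    """Sketch of the iIB self-consistent updates (Tishby, Pereira & Bialek, 1999).

    p_xy : joint distribution p(x, y), shape (n_x, n_y), all entries > 0
    """
    n_x, n_y = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    p_t_given_x = np.random.default_rng(0).dirichlet(np.ones(n_clusters), size=n_x)
    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x + 1e-12                  # p(t); epsilon guards empty clusters
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None]                      # p(y|t)
        # D_KL[p(y|x) || p(y|t)] for every (x, t) pair
        log_ratio = np.log(p_y_given_x[:, None, :] / p_y_given_t[None, :, :])
        dkl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        w = p_t[None, :] * np.exp(-beta * dkl)
        p_t_given_x = w / w.sum(axis=1, keepdims=True)   # normalize by Z(x, beta)
    return p_t_given_x
```

Note the structural match with Blahut-Arimoto on slide 35: the distortion d(x,t) is replaced by the KL divergence between p(y|x) and p(y|t), which is determined by the problem itself rather than supplied as part of the setup.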
