Information Geometry: Duality, Convexity, and Divergences

Jun Zhang*, University of Michigan, Ann Arbor, Michigan 48104, junz@umich.edu. *Currently on leave to AFOSR under IPA.

Presentation Transcript


  1. Information Geometry: Duality, Convexity, and Divergences. Jun Zhang*, University of Michigan, Ann Arbor, Michigan 48104, junz@umich.edu. *Currently on leave to AFOSR under IPA.

  2. Clarify two senses of duality in information geometry:
     Reference duality: choice of the reference vs. comparison point on the manifold;
     Representational duality: choice of a monotonic scaling of the density function.
     Lecture Plan
     1) A revisit to Bregman divergence
     2) Generalization (α-divergence on R^n) and α-Hessian geometry
     3) Embedding into infinite-dimensional function space
     4) Generalized Fisher metric and α-connection on Banach space

  3. Bregman Divergence
     i) Quadrilateral relation:
     Triangular relation (generalized cosine) as a special case:
     ii) Reference-representation biduality:
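The formulas for this slide are not in the transcript. As a minimal sketch, assuming the standard definition B_F(x, y) = F(x) - F(y) - <x - y, ∇F(y)>, the four-point and three-point identities that follow algebraically from that definition can be checked numerically. The potential F(x) = Σ exp(x_i) and all function names below are illustrative assumptions, not taken from the slides.

```python
import math

def F(x):
    # Illustrative strictly convex potential (an assumption): F(x) = sum_i exp(x_i)
    return sum(math.exp(xi) for xi in x)

def grad_F(x):
    # Gradient of the illustrative potential
    return [math.exp(xi) for xi in x]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>
    return F(x) - F(y) - dot(sub(x, y), grad_F(y))

def four_point_gap(x, y, x2, y2):
    # Four-point identity:
    # B(x,y) + B(x2,y2) - B(x2,y) - B(x,y2) = <x - x2, grad F(y2) - grad F(y)>
    lhs = bregman(x, y) + bregman(x2, y2) - bregman(x2, y) - bregman(x, y2)
    rhs = dot(sub(x, x2), sub(grad_F(y2), grad_F(y)))
    return lhs - rhs

def three_point_gap(x, y, z):
    # Special case x2 = y, y2 = z:
    # B(x,y) + B(y,z) - B(x,z) = <x - y, grad F(z) - grad F(y)>
    lhs = bregman(x, y) + bregman(y, z) - bregman(x, z)
    rhs = dot(sub(x, y), sub(grad_F(z), grad_F(y)))
    return lhs - rhs
```

Both gap functions should return values at the level of floating-point rounding for any choice of points, since the identities hold exactly for every differentiable F.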

  4. Canonical Divergence and Fenchel Inequality
     An alternative expression of Bregman divergence is the canonical divergence
     or explicitly:
     That A is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function:
     where equality holds if and only if
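A hedged numerical illustration, assuming the usual form of the canonical divergence, A(x, u) = F(x) + F*(u) - <x, u>, and the Fenchel inequality F(x) + F*(u) - <x, u> >= 0 with equality iff u = ∇F(x). The example potential F(x) = Σ exp(x_i), whose conjugate is F*(u) = Σ (u_i log u_i - u_i) for u_i > 0, is an assumption chosen for illustration.

```python
import math

def F(x):
    # Illustrative strictly convex potential (an assumption): F(x) = sum_i exp(x_i)
    return sum(math.exp(xi) for xi in x)

def grad_F(x):
    return [math.exp(xi) for xi in x]

def F_star(u):
    # Fenchel conjugate of the potential above, valid for u_i > 0
    return sum(ui * math.log(ui) - ui for ui in u)

def canonical(x, u):
    # A(x, u) = F(x) + F*(u) - <x, u>, non-negative by the Fenchel inequality
    return F(x) + F_star(u) - sum(xi * ui for xi, ui in zip(x, u))

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>
    g = grad_F(y)
    return F(x) - F(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))
```

With u = ∇F(y), canonical(x, u) agrees with bregman(x, y), which is the sense in which the two expressions are alternatives; canonical(x, grad_F(x)) vanishes up to rounding.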

  5. Convex Inequality and the α-Divergence Induced by It
     By the definition of a strictly convex function F,
     It is easy to show that the following is non-negative for all α,
     Conjugate-symmetry:
     Easily verifiable:
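A minimal sketch of the convex-inequality construction, assuming the form given in Zhang (2004): D^(α)(x, y) = 4/(1-α²) [ (1-α)/2 F(x) + (1+α)/2 F(y) - F((1-α)/2 x + (1+α)/2 y) ] for α ≠ ±1. The potential F and the function names are illustrative assumptions. Under this form, conjugate symmetry reads D^(α)(x, y) = D^(-α)(y, x), and the limit α → 1 recovers the Bregman divergence B_F(x, y).

```python
import math

def F(x):
    # Illustrative strictly convex potential (an assumption): F(x) = sum_i exp(x_i)
    return sum(math.exp(xi) for xi in x)

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>, with grad F(y)_i = exp(y_i)
    return F(x) - F(y) - sum((xi - yi) * math.exp(yi) for xi, yi in zip(x, y))

def alpha_divergence(x, y, alpha):
    # Assumed form (Zhang, 2004); requires alpha != +1 and alpha != -1
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    mix = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4.0 / (1.0 - alpha ** 2) * (a * F(x) + b * F(y) - F(mix))
```

For |α| > 1 the bracket and the prefactor both change sign, so the product stays non-negative; the sketch can be used to confirm this for a few values of α.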

  6. Significance of Bregman Divergence Among the α-Divergence Family
     Proposition: For a smooth function F: R^n → R, the following are equivalent:

  7. Statistical Manifold Structure Induced From a Divergence Function (Eguchi, 1983)
     Given a divergence D(x,y) with D(x,x) = 0, one can derive the Riemannian metric and a pair of conjugate connections.
     Expanding D(x,y) around x = y:
     i) 2nd order: one (and the same) metric
     ii) 3rd order: a pair of conjugate connections
     In essence, the conjugacy condition on the connections is satisfied by such identification of derivatives of D.
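As a rough numerical sketch of the second-order part of Eguchi's construction, the metric can be read off as g_ij = -∂²D/∂x^i ∂y^j evaluated at x = y. The code below approximates this mixed derivative by central differences and applies it to a Bregman divergence, for which the result should be the Hessian of F. The potential F(x) = Σ exp(x_i), the step size h, and the function names are assumptions made for illustration.

```python
import math

def F(x):
    # Illustrative strictly convex potential (an assumption): F(x) = sum_i exp(x_i)
    return sum(math.exp(xi) for xi in x)

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>, with grad F(y)_i = exp(y_i)
    return F(x) - F(y) - sum((xi - yi) * math.exp(yi) for xi, yi in zip(x, y))

def metric_from_divergence(D, x, h=1e-3):
    # g_ij = -d^2 D / (dx_i dy_j) at y = x, by a central mixed difference
    n = len(x)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def D_shifted(si, sj):
                xs, ys = list(x), list(x)
                xs[i] += si * h   # perturb the first argument along axis i
                ys[j] += sj * h   # perturb the second argument along axis j
                return D(xs, ys)
            g[i][j] = -(D_shifted(1, 1) - D_shifted(1, -1)
                        - D_shifted(-1, 1) + D_shifted(-1, -1)) / (4 * h * h)
    return g
```

For the example potential the Hessian is diagonal with entries exp(x_i), so metric_from_divergence(bregman, x) should be close to that diagonal matrix.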

  8. α-Hessian Geometry (of Finite-Dimension Vector Space)
     Theorem. D^(α) induces the α-Hessian manifold, i.e.
     i) The metric and conjugate affine connections are given by:
     ii) Riemann curvature is given by:

  9. iii) The manifold is equi-affine, with the Tchebychev potential given by:
     and the α-parallel volume form given by
     iv) There exist biorthogonal coordinates:
     with

  10. A General Divergence Function(al): From Vector Space to Function Space
      Question: How to extend the above analysis to infinite-dimensional function space?
      for any two functions in some function space, and an arbitrary, strictly increasing function ρ.
      Remark: Induced by convex inequality

  11. A Special Case of D^(α): Classic α-Divergence
      For parameterized pdfs, such a divergence induces an α-independent metric, but α-dependent dual connections:
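A small sketch of the classic α-divergence on a finite sample space, assuming the form A^(α)(p, q) = 4/(1-α²) Σ [ (1-α)/2 p + (1+α)/2 q - p^((1-α)/2) q^((1+α)/2) ]. Under this convention (an assumption, since the slide's formula is not in the transcript), α → -1 gives KL(p‖q) and α → 1 gives KL(q‖p). The function names are illustrative.

```python
import math

def alpha_div(p, q, alpha):
    # Classic alpha-divergence for positive vectors p, q; requires alpha != +-1
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    total = sum(a * pi + b * qi - (pi ** a) * (qi ** b) for pi, qi in zip(p, q))
    return 4.0 / (1.0 - alpha ** 2) * total

def kl(p, q):
    # Extended Kullback-Leibler divergence; the (q - p) term vanishes
    # when both p and q are normalized
    return sum(pi * math.log(pi / qi) + qi - pi for pi, qi in zip(p, q))

def squared_hellinger_x2(p, q):
    # 2 * sum (sqrt p - sqrt q)^2, which equals alpha_div at alpha = 0
    return 2.0 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
```

Evaluating alpha_div near α = ±1 and at α = 0 shows how the family interpolates between the two KL divergences through a Hellinger-type distance.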

  12. Other Examples of D^(α)
      Jensen Difference
      U-Divergence (α = 1)

  13. A Short Detour: Monotone Scaling
      Define a monotone embedding (“scaling”) of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function.
      Observe:
      i) ρ is strictly monotone iff ρ^(-1) is strictly monotone;
      ii) ρ(t) = t is the identity element;
      iii) if ρ1, ρ2 are strictly monotone, so is their composition.
      Therefore, monotone embeddings of a given probability density function form a group, with functional composition as the group operation.
      We recall that for a strictly convex function f:

  14. DEFINITION: A ρ-embedding is said to be conjugate to a τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if:
      Example: α-embedding
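The defining relation is missing from the transcript. The sketch below assumes the form used in Zhang (2004), namely τ(t) = f'(ρ(t)) and, equivalently, ρ(t) = (f*)'(τ(t)), and checks it for the α-embedding l^(α)(t) = 2/(1-α) t^((1-α)/2) with ρ = l^(α), τ = l^(-α), f(s) = 2/(1+α) ((1-α)/2 s)^(2/(1-α)) and f*(s) = 2/(1-α) ((1+α)/2 s)^(2/(1+α)). These formulas and all names are assumptions for illustration, restricted to |α| < 1.

```python
import math

def l_alpha(t, alpha):
    # alpha-embedding of a positive number t; log t in the limit alpha = 1
    if alpha == 1:
        return math.log(t)
    return 2.0 / (1.0 - alpha) * t ** ((1.0 - alpha) / 2.0)

def f_prime(s, alpha):
    # Derivative of f(s) = 2/(1+alpha) * ((1-alpha)/2 * s) ** (2/(1-alpha))
    return 2.0 / (1.0 + alpha) * ((1.0 - alpha) / 2.0 * s) ** ((1.0 + alpha) / (1.0 - alpha))

def f_star_prime(s, alpha):
    # Derivative of f*(s) = 2/(1-alpha) * ((1+alpha)/2 * s) ** (2/(1+alpha))
    return 2.0 / (1.0 - alpha) * ((1.0 + alpha) / 2.0 * s) ** ((1.0 - alpha) / (1.0 + alpha))

def conjugacy_gaps(t, alpha):
    # Returns (f'(rho(t)) - tau(t), (f*)'(tau(t)) - rho(t)); both should be ~0
    rho, tau = l_alpha(t, alpha), l_alpha(t, -alpha)
    return f_prime(rho, alpha) - tau, f_star_prime(tau, alpha) - rho
```

Both gaps vanish up to rounding, which is consistent with f' and (f*)' being mutually inverse, as expected of a Fenchel conjugate pair.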

  15. Parameterized Functions as Forming a Submanifold under Monotone Scaling
      A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions λ_i(ζ) over a measurable space such that:
      Here, θ is called the “natural parameter”.
      The “expectation parameter” is defined by projecting the conjugated τ-embedding onto the λ_i(ζ):
      Example: For the log-linear model (exponential family)
      The expectation parameter is:
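For the log-linear example, a minimal sketch on a finite sample space may help. It assumes the standard normalized exponential family p(ζ_k | θ) = exp(Σ_i θ^i λ_i(ζ_k) - F(θ)), where F is the log-partition function, and checks that the expectation parameter η_i = E[λ_i] equals the gradient of F. The feature table, step size, and function names are illustrative assumptions.

```python
import math

def log_partition(theta, features):
    # F(theta) = log sum_k exp(<theta, lambda(z_k)>); features[k][i] = lambda_i(z_k)
    return math.log(sum(math.exp(sum(t * f for t, f in zip(theta, fk)))
                        for fk in features))

def density(theta, features):
    # Normalized probabilities of the log-linear model on the finite sample space
    log_z = log_partition(theta, features)
    return [math.exp(sum(t * f for t, f in zip(theta, fk)) - log_z)
            for fk in features]

def expectation_parameter(theta, features):
    # eta_i = E_theta[lambda_i]
    p = density(theta, features)
    return [sum(pk * fk[i] for pk, fk in zip(p, features))
            for i in range(len(theta))]

def numeric_gradient(fn, theta, h=1e-5):
    # Central-difference gradient, used only to cross-check eta = grad F
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad
```

With three sample points and indicator features for the second and third, η is simply the pair of probabilities of those two points.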

  16. Proposition. For the ρ-affine submanifold:
      i) The following potential function is strictly convex:
      F(θ) is called the generating (partition) functional.
      ii) Define, under the conjugate representations
      then F*(η) is the Fenchel conjugate of F(θ). F*(η) is called the generalized entropy functional.
      Theorem. The ρ-affine submanifold is an α-Hessian manifold.

  17. An Application: the (α,β)-Divergence
      Take f = ρ-(β), where:
      called “alpha-embedding”, now denoted by β.
      α: parameter reflecting reference duality
      β: parameter reflecting representation duality
      They reduce to the α-divergence proper A^(α) and to the Jensen difference E^(α):

  18. Information Geometry on Banach Space
      Proposition 1. Denote tangent vector fields which are, at given p on the manifold, themselves functions in Banach space. The metric and dual connections induced by the divergence functional take the forms:
      Written in dually symmetric form:

  19. Corollary 1a. For a finite-dimensional submanifold (parametric model), with
      The metric and dual connections associated with are given by:
      with
      Remark: Choosing reduces to the forms of the Fisher metric and the α-connections in classical parametric information geometry, where
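To illustrate the reduction to classical parametric information geometry in the simplest setting, the sketch below takes the Bernoulli family and the classic α-divergence (both assumptions chosen for illustration, not the general functional of the corollary) and estimates the induced metric g(θ) = -∂²A^(α)/∂θ∂θ' at θ' = θ by finite differences. The result should match the Fisher information 1/(θ(1-θ)) for every α.

```python
def bernoulli(theta):
    # Probability vector of a Bernoulli(theta) model on {0, 1}
    return [1.0 - theta, theta]

def alpha_div(p, q, alpha):
    # Classic alpha-divergence (assumed form); requires alpha != +-1
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    total = sum(a * pi + b * qi - (pi ** a) * (qi ** b) for pi, qi in zip(p, q))
    return 4.0 / (1.0 - alpha ** 2) * total

def metric_from_alpha_div(theta, alpha, h=1e-3):
    # g(theta) = -d^2 A / (d theta d theta') on the diagonal, central mixed difference
    def D(s, t):
        return alpha_div(bernoulli(s), bernoulli(t), alpha)
    return -(D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4 * h * h)

def fisher_information(theta):
    # Closed-form Fisher information of the Bernoulli family
    return 1.0 / (theta * (1.0 - theta))
```

The α-dependence only shows up at third order (the connections), which this second-order check does not probe.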

  20. Remark: The ambient space B is flat, so it embeds, as proper submanifolds,
      • the manifold M_μ of probability density functions (constrained to be positive-valued and normalized to unit measure);
      • the finite-dimensional manifold M_θ of parameterized probability models.
      [Diagram: M_θ inside M_μ inside B (ambient manifold)]
      Proposition 2. The curvature R^(α) and torsion T^(α) tensors associated with any α-connection on the infinite-dimensional function space B are identically zero.
      CAVEAT: Topology? (G. Pistone and his colleagues)

  21. Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:
      Remark: The (α,β)-divergence is the homogeneous f-divergence
      As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form. Again, it is the product αβ that takes the role of the conventional “alpha” parameter.

  22. Summary of Current Approach
      Divergence
      • α-divergence: equivalent to δ-divergence (Zhu & Rohwer, 1995); includes KL divergence as a special case
      • f-divergence (Csiszár)
      • Bregman divergence: equivalent to the canonical divergence
      • U-divergence (Eguchi)
      Geometry
      • Riemannian metric: Fisher information
      • Conjugate connections: α-connection family
      • Equi-affine structure: cubic form, Tchebychev 1-form
      • Curvature
      Convex-based α-divergence for
      • vector space of finite dimension
      • function space of infinite dimension
      Generalized expressions of
      • Fisher metric
      • α-connections

  23. References
      Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16: 159-195.
      Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.), Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.
      Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50: 60-65.
      Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp. 58-67).
      Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59: 161-170.
      Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series Advances in Mechanics and Mathematics.
      Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.

  24. Questions?
