
Generalization in Learning from examples

Generalization in Learning from examples. Function Space. Hilbert Space.


Presentation Transcript


  1. Generalization in Learning from examples

  2. Function Space

  3. Hilbert Space • A Hilbert space H is a real or complex inner product space that is also a complete metric space with respect to the distance function induced by the inner product. To say that H is a complex inner product space means that H is a complex vector space on which there is an inner product ⟨x, y⟩ associating a complex number to each pair of elements x, y of H that satisfies the following properties:

  4. Hilbert Space properties • ⟨y, x⟩ is the complex conjugate of ⟨x, y⟩. • ⟨x, y⟩ is linear in its first argument: for all complex numbers a and b, ⟨ax1 + bx2, y⟩ = a⟨x1, y⟩ + b⟨x2, y⟩. • The inner product is positive definite: ⟨x, x⟩ ≥ 0, where the case of equality holds precisely when x = 0.

  5. Hilbert Space properties • The norm defined by the inner product ⟨•,•⟩ is the real-valued function ‖x‖ = √⟨x, x⟩. • The distance between two points x, y in H is defined in terms of the norm by d(x, y) = ‖x − y‖. • That this function is a distance function means: • it is symmetric in x and y, • the distance between x and itself is zero, and otherwise the distance between x and y must be positive, • the triangle inequality holds, meaning that the length of one leg of a triangle xyz cannot exceed the sum of the lengths of the other two legs: d(x, z) ≤ d(x, y) + d(y, z). • This last property is ultimately a consequence of the more fundamental Cauchy–Schwarz inequality |⟨x, y⟩| ≤ ‖x‖ ‖y‖, with equality if and only if x and y are parallel.
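
A quick numeric sanity check of these relations (not from the slides): on Rⁿ with the ordinary dot product, a finite-dimensional real Hilbert space, the norm, distance, Cauchy–Schwarz and triangle inequalities can be verified directly.

```python
# A minimal check of the Hilbert-space relations on R^n with the ordinary
# dot product; the vectors are arbitrary random examples.
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 5))

inner = np.dot                               # <x, y> on R^n
norm = lambda v: np.sqrt(inner(v, v))        # ||v|| = sqrt(<v, v>)
dist = lambda a, b: norm(a - b)              # d(a, b) = ||a - b||

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||
assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
# Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
assert dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12
# Positive definiteness: <x, x> > 0 for x != 0, and <0, 0> = 0
assert inner(x, x) > 0 and inner(np.zeros(5), np.zeros(5)) == 0
print("all Hilbert-space relations hold")
```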

  6. Examples of Hilbert space

  7. Overfitting vs. Generalization

  8. 2. Generalization • A problem is well-posed if its solution: • exists, • is unique, • depends continuously on the data (i.e. it is stable). • A problem is ill-posed if it is not well-posed. In the context of this class, well-posedness is mainly used to mean stability of the solution.

  9. Generalization • Eidetic generalization: • the process of imagining possible cases rather than observing actual ones. • Eidos: properties, kinds or types of ideal species that entities may exemplify. • Eidetic variation: • the possible changes an individual can undergo while remaining an instance of a given type of an essence.

  10. Stabilizer • Popper's claim: • empirical data are not sufficient for obtaining any pattern; • in addition to empirical data, one needs some conceptual data expressing prior knowledge about the properties of the desired function. • In the 1990s, Poggio and Girosi proposed modifying the empirical error functional to Ez(f) + γ Ψ(f), • where Ψ is a functional expressing some global property (such as smoothness) of the function to be minimized.

  11. Unstable example

  12. Stable example

  13. 3. Inverse Problems • For an operator A : X → Y between two Hilbert spaces X and Y, an inverse problem determined by A is the task of finding, for g ∈ Y (called data), some f ∈ X (called solution) such that • A(f) = g. • When X and Y are • finite dimensional: linear operators can be represented by matrices; • infinite dimensional: typical operators are integral ones. • Fredholm integral equations of the first and second kind: g(x) = ∫ K(x, t) f(t) dt and f(x) = g(x) + λ ∫ K(x, t) f(t) dt.
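
A minimal sketch (the smooth Gaussian-shaped kernel and uniform quadrature grid are illustrative assumptions) of what happens when a first-kind Fredholm equation is discretized: the resulting matrix is severely ill-conditioned, which is the hallmark of an ill-posed inverse problem.

```python
# Discretizing g(x) = ∫ K(x, t) f(t) dt on a uniform grid with
# rectangle-rule weights gives a linear system A f ≈ g; for a smooth
# kernel the matrix A is severely ill-conditioned.
import numpy as np

n = 100
t = np.linspace(0.0, 1.0, n)
h = t[1] - t[0]                                      # quadrature weight

K = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.1)    # smooth kernel K(x, t)
A = K * h                                            # discretized operator

print(np.linalg.cond(A))   # enormous: the discrete problem is ill-conditioned
```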

  14. Well-posed and ill-posed problems • Hadamard introduced the definition of ill-posedness. Ill-posed problems are typically inverse problems. • As an example, assume g is a function in Y and u is a function in X, with Y and X Hilbert spaces. Then, given the linear, continuous operator L, consider the equation g = Lu. • The direct problem is to compute g given u; the inverse problem is to compute u given the data g. In the learning case L is somewhat similar to a “sampling” operation, and the inverse problem becomes the problem of finding a function that takes the values f(xi) = yi, i = 1, ..., n. • The inverse problem of finding u is well-posed when the solution • exists, • is unique, and • is stable, that is, depends continuously on the initial data g.

  15. 4. Pseudosolutions of Inverse Problems • When there is no solution: • For an operator A : X → Y, let • R(A) = {g ∈ Y | (∃f ∈ X)(A(f) = g)} • denote its range, and let πclR(A) : Y → clR(A) be the projection of Y onto the closure of R(A) in Y. • Every continuous operator A between two Hilbert spaces has an adjoint A* satisfying, for all f ∈ X and all g ∈ Y, ⟨A(f), g⟩ = ⟨f, A*(g)⟩.

  16. Pseudosolutions of Inverse Problems • If the range of A is closed, then there exists a unique continuous linear pseudoinverse operator A+ : Y → X such that for every g ∈ Y: • AA+(g) = πclR(A)(g) • A+ = (A*A)+A* = A*(AA*)+

  17. Pseudosolutions of Inverse Problems • To solve more general least-squares problems, Moore–Penrose pseudoinverses are defined for all continuous linear operators A : X → Y between two Hilbert spaces X and Y. • Not every continuous linear operator has a continuous linear pseudoinverse; • just the ones whose range is closed in Y. • If the range is not closed, then A+ is only defined for those g ∈ Y for which πclR(A)(g) ∈ R(A).
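
A finite-dimensional sketch of a pseudosolution (the rank-deficient matrix below is a made-up example): f+ = A+g is the minimum-norm least-squares solution, and AA+g is the projection of g onto the range of A.

```python
# Pseudosolution of A f = g when no exact solution exists.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # row 2 = 2 * row 1, so rank(A) = 2
              [0.0, 1.0, 1.0]])
g = np.array([1.0, 0.0, 2.0])

A_plus = np.linalg.pinv(A)         # Moore-Penrose pseudoinverse
f_plus = A_plus @ g                # pseudosolution

residual = A @ f_plus - g
print(np.allclose(A.T @ residual, 0.0))   # residual is orthogonal to range(A)

# The identity A+ = (A* A)+ A* from the previous slide also holds numerically:
print(np.allclose(A_plus, np.linalg.pinv(A.T @ A) @ A.T))
```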

  18. Condition number • Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix: cond(A) = ‖A‖ ‖A+‖. • A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of A can lead to huge errors in the entries of the solution.
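
A short illustration (the nearly singular 2×2 matrix and the perturbation size are made-up examples): for an ill-conditioned matrix, a tiny perturbation of the data moves the least-squares solution by a large amount.

```python
# cond(A) = ||A|| * ||A+|| in the spectral norm, and its effect on stability.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
A_plus = np.linalg.pinv(A)
cond = np.linalg.norm(A, 2) * np.linalg.norm(A_plus, 2)
print(cond, np.linalg.cond(A))          # the two definitions agree (~4e4)

g = np.array([2.0, 2.0001])
g_noisy = g + np.array([0.0, 1e-4])     # tiny perturbation of the data
print(A_plus @ g)                       # ~ [1, 1]
print(A_plus @ g_noisy)                 # ~ [0, 2]: a huge change
```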

  19. 5. Regularization • Regularization is a method of improving the stability of solutions of ill-conditioned inverse problems. • The basic idea in the treatment of ill-conditioned problems: • use some a priori knowledge about solutions to disqualify meaningless ones. • Such knowledge can be: • some regularity condition on the solution, expressed as the existence of derivatives up to a certain order with bounds on the magnitudes of these derivatives; • some localization condition, such as a bound on the support of the solution or on its behavior at infinity. • Tikhonov's regularization: penalizes undesired solutions by adding a term called a stabilizer.

  20. Stabilizer • Ψ is a functional called a stabilizer. • The regularization parameter γ plays the role of a trade-off between the least-squares solution and the penalization expressed by Ψ. • A typical choice of stabilizer is the square of the norm on X, for which the original problem is replaced with the minimization of the functional ‖A(f) − g‖² + γ ‖f‖².
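
A minimal sketch of Tikhonov regularization with the squared-norm stabilizer (the discretized smoothing operator, noise level, and γ below are illustrative choices): the minimizer of ‖Af − g‖² + γ‖f‖² solves (AᵀA + γI)f = Aᵀg, and it is far more stable against noise than the plain pseudosolution.

```python
# Tikhonov regularization vs. the unregularized pseudosolution.
import numpy as np

def tikhonov(A, g, gamma):
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ g)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / 0.1) * (t[1] - t[0])
f_true = np.sin(2 * np.pi * t)
g = A @ f_true + 1e-4 * rng.standard_normal(50)   # noisy data

f_naive = np.linalg.pinv(A) @ g          # pseudosolution: noise is amplified
f_reg = tikhonov(A, g, gamma=1e-6)       # regularized solution: stable
print(np.linalg.norm(f_naive - f_true), np.linalg.norm(f_reg - f_true))
```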

  21. Regularization • For this stabilizer, regularized solutions always exist, • unlike pseudosolutions, which in the infinite-dimensional case do not exist for those data g for which πclR(A)(g) ∉ R(A). • For every continuous operator A : X → Y between two Hilbert spaces and for every γ > 0, there exists a unique operator Aγ = (A*A + γI)⁻¹A* assigning to each g ∈ Y the regularized solution Aγ(g).

  22. Regularization • Even when the original inverse problem does not have a unique solution, for every γ > 0 the regularized problem has a unique solution. • due to the uniform convexity of the functional. • With γ decreasing to zero, the solutions Aγ(g) of the regularized problems converge to the normal pseudosolution A+(g).
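
A short numeric illustration of this convergence (the rank-deficient matrix is the same made-up example used earlier): the Tikhonov solutions approach the normal pseudosolution A+(g) as γ decreases to zero.

```python
# Convergence of A_gamma(g) to A+(g) as gamma -> 0.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])
g = np.array([1.0, 0.0, 2.0])
f_plus = np.linalg.pinv(A) @ g                      # normal pseudosolution

for gamma in [1e-1, 1e-3, 1e-5, 1e-7]:
    f_gamma = np.linalg.solve(A.T @ A + gamma * np.eye(3), A.T @ g)
    print(gamma, np.linalg.norm(f_gamma - f_plus))  # distance shrinks with gamma
```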

  23. Examples of stabilizers • Localization: e.g. a bound on the support of the solution or on its behavior at infinity. • Smoothness: e.g. bounds on the magnitudes of derivatives up to a certain order.

  24. 6. Learning from data as an inverse problem • Learning of neural networks from examples is also an inverse problem: • for a given training set, find an unknown input-output function; • the operator performs the evaluations of an input-output function at the input data from the training set.

  25. Learning from data as an inverse problem • Empirical error functional: Ez(f) = (1/m) Σi (f(ui) − vi)², which with the sampling operator Lu(f) = (f(u1), . . . , f(um)) can be written as Ez(f) = (1/m) ‖Lu(f) − v‖². • So minimization of the empirical error functional is an inverse problem Lu(f) = v, where v is the output data vector. • Finding a pseudosolution of this inverse problem is equivalent to the minimization of the empirical error functional Ez over X.
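
A sketch of this equivalence over a finite-dimensional hypothesis space (polynomials of a fixed degree, used here only as a stand-in for the Hilbert space X; data and degree are illustrative): the sampling operator Lu becomes a matrix, and minimizing the empirical error is exactly the least-squares inverse problem Lu f = v.

```python
# Empirical-error minimization as a least-squares inverse problem.
import numpy as np

u = np.array([0.0, 0.3, 0.5, 0.8, 1.0])       # input data
v = np.sin(2 * np.pi * u)                     # output data vector
d = 3                                         # polynomial degree

L_u = np.vander(u, d + 1, increasing=True)    # L_u[i, j] = u_i ** j
coef = np.linalg.pinv(L_u) @ v                # pseudosolution = minimizer of E_z
print(L_u @ coef - v)                         # residual of the least-squares fit
```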

  26. Conditions of the inverse problem • To take advantage of the characterizations of pseudosolutions and regularized solutions from the theory of inverse problems, solutions of the inverse problem defined by the operator Lu should be searched for in suitable Hilbert spaces, on which • all evaluation operators of the form (5) are continuous, • norms can express some undesired properties of input-output functions.

  27. 7. Reproducing Kernel Hilbert Spaces (RKHS) • A reproducing kernel Hilbert space (RKHS) is a Hilbert space of pointwise defined real-valued functions on a nonempty set Ω such that all evaluation functionals are continuous, i.e., for every x ∈ Ω, the evaluation functional Fx, defined for any f ∈ X as Fx(f) = f(x), is continuous (bounded).

  28. Properties of RKHS • Every RKHS is uniquely determined by a symmetric positive semidefinite kernel K : Ω × Ω → R, i.e., a symmetric function of two variables satisfying, for all m, all (w1, . . . , wm) ∈ Rm, and all (x1, . . . , xm) ∈ Ωm: • K is symmetric: K(x, y) = K(y, x), • K is PD: Σi,j wi wj K(xi, xj) ≥ 0.
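
A small numeric check of this defining property for the Gaussian kernel K(x, y) = exp(−‖x − y‖²) on arbitrary points (points and coefficients are random examples): the Gram matrix is symmetric with non-negative eigenvalues, so every quadratic form Σ wi wj K(xi, xj) is non-negative.

```python
# Positive semidefiniteness of a Gaussian Gram matrix.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 2))                          # arbitrary points in R^2
sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists)                                     # Gram matrix

print(np.allclose(K, K.T))                                # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)              # eigenvalues >= 0
w = rng.standard_normal(20)
print(w @ K @ w >= -1e-10)                                # quadratic form >= 0
```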

  29. Reproducing Kernel

  30. RKHS and kernels

  31. Examples of pd kernels

  32. Using the inverse problem • As on every RKHS HK(Ω) all evaluation functionals are continuous, for every sample of input data u = (u1, . . . , um) the operator Lu : HK(Ω) → Rm, Lu(f) = (f(u1), . . . , f(um)), is continuous. Moreover, its range is closed because it is finite dimensional. So one can apply results from the theory of inverse problems.

  33. Pseudosolution and RKHS • The pseudosolution has the form f+ = Σi ci Kui with coefficients c = K[u]+ v, • where K[u] is the Gram matrix of the kernel K with respect to the vector u, defined as K[u]i,j = K(ui, uj). • f+ minimizes the empirical error. • f+ can be interpreted as an input-output function of a neural network with one hidden layer of kernel units and a single linear output unit.
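
A sketch of this pseudosolution (the Gaussian kernel, its width, and the training data are illustrative assumptions): f+(x) = Σ ci K(x, ui) with c = K[u]+ v reproduces the output data, i.e. it drives the empirical error to zero, and reads exactly like a one-hidden-layer kernel network.

```python
# RKHS pseudosolution f+ = sum_i c_i K(., u_i) with c = K[u]+ v.
import numpy as np

def K(x, y, width=0.1):
    return np.exp(-(x - y) ** 2 / width)

u = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # input data
v = np.array([0.0, 1.0, 0.0, -1.0, 0.0])     # output data vector

K_u = K(u[:, None], u[None, :])              # Gram matrix K[u]
c = np.linalg.pinv(K_u) @ v                  # coefficients of f+

f_plus = lambda x: K(x, u) @ c               # f+(x) = sum_i c_i K(x, u_i)
print([round(f_plus(x), 6) for x in u])      # reproduces v, so E_z(f+) = 0
```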

  34. Regularization and RKHS

  35. RKHS and inverse problem • f+ and fγ, minimizing Ez and Ez + γ‖·‖²K, respectively, are linear combinations of the representers Ku1, . . . , Kum of the input data u1, . . . , um, but the coefficients of the two linear combinations are different.
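
A sketch contrasting the two coefficient vectors (kernel, data, and γ as in the previous illustrative example; the regularized coefficients use the usual representer-theorem form c = (K[u] + γ m I)⁻¹ v, an assumption consistent with the stabilizer γ‖f‖²K):

```python
# Same representers K(., u_i), different coefficients for f+ and f_gamma.
import numpy as np

def K(x, y, width=0.1):
    return np.exp(-(x - y) ** 2 / width)

u = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
v = np.array([0.0, 1.0, 0.0, -1.0, 0.0])
m, gamma = len(u), 1e-2

K_u = K(u[:, None], u[None, :])
c_plus = np.linalg.pinv(K_u) @ v                            # coefficients of f+
c_gamma = np.linalg.solve(K_u + gamma * m * np.eye(m), v)   # coefficients of f_gamma

print(c_plus)
print(c_gamma)   # same representers, different (smaller) coefficients
```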

  36. Ill-posedness of K[u] • When K is positive definite, • the row vectors of the matrix K[u] are linearly independent. • But when the distances between the data u1, . . . , um are small, the row vectors might be nearly parallel and the small eigenvalues of K[u] might cluster near zero. • Then small changes of v can cause large changes of f+.

  37. Types of ill-posedness of K[u] • The matrix can be rank-deficient: • it has a cluster of small eigenvalues and a gap between large and small eigenvalues. • The matrix can represent a discrete ill-posed problem: • its eigenvalues gradually decay to zero without any gap in the spectrum.
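
A quick illustration of this ill-conditioning (kernel width and point spreads are illustrative): packing the inputs ui closer together makes the rows of the Gaussian Gram matrix nearly parallel, so its smallest eigenvalues cluster near zero and the condition number explodes.

```python
# Eigenvalues and condition number of K[u] as the inputs cluster.
import numpy as np

def gram(u, width=0.1):
    return np.exp(-(u[:, None] - u[None, :]) ** 2 / width)

for spread in [1.0, 0.1, 0.01]:
    u = np.linspace(0.0, spread, 10)     # 10 input points packed into [0, spread]
    K_u = gram(u)
    eig = np.linalg.eigvalsh(K_u)
    print(spread, eig.min(), np.linalg.cond(K_u))
```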

  38. 8. Three reasons for using kernels in ML • Linear separation simplifies classification. In some cases, even data that are not linearly separable can be transformed into linearly separable ones.
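
A hedged illustration of this first reason (the circular data and the explicit lifting are made-up examples): points inside vs. outside the unit circle are not linearly separable in R², but the lifted features φ(x) = (x1, x2, ‖x‖²), the kind of map a polynomial kernel induces implicitly, are separated by the plane ‖x‖² = 1.

```python
# Non-separable data in R^2 become linearly separable after a feature map.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = (X ** 2).sum(axis=1) > 1.0             # class: outside the unit circle

Phi = np.c_[X, (X ** 2).sum(axis=1)]       # lifted data in R^3
w, b = np.array([0.0, 0.0, 1.0]), -1.0     # separating plane: z - 1 = 0
pred = Phi @ w + b > 0
print((pred == y).mean())                  # 1.0: perfectly separated
```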

  39. Reasons for using kernels in ML • Stabilizers of the form Ψ(f) = ∫ |f̂(ω)|² / k̂(ω) dω are special cases of squares of norms on RKHSs generated by convolution kernels K(x, y) = k(x − y) with positive Fourier transform k̂. • For such kernels, the value of the stabilizer at any f in the RKHS is expressed as the squared RKHS norm ‖f‖²K. • The Gaussian kernel is an example of a convolution kernel with a positive Fourier transform.

  40. Reasons for using kernels in ML • Reformulation of the minimization of the empirical error functional as an inverse problem: in an RKHS, all evaluation functionals are continuous, which is necessary for applying the tools from the theory of inverse problems.
