Biometrics and High Dimensional Data
Václav Snášel
VŠB-Technical University of Ostrava, Czech Republic
Motto
Outline
• Curse of dimensionality, dimensionality reduction
• Singular Value Decomposition (SVD): definition, interpreting an SVD, factor interpretation, geometric interpretation, component interpretation, algorithm issues, algorithms and complexity
• Semidiscrete Decomposition (SDD): interpreting an SDD, factor interpretation, geometric interpretation, component interpretation, graph interpretation, algorithm issues, algorithms and complexity
• Applications of SVD and SDD together
• Tensor decomposition: basic tensor concepts, tensor SVD, approximating a tensor by HOSVD
• Data mining applications: classification of handwritten digits, a simple algorithm for handwritten digits, text mining, ontology, face recognition using tensor SVD
Modern data
• Facts: computers make it easy to collect and store data, and storage costs are very low and dropping fast (most laptops now have a storage capacity of more than 500 GB). When it comes to storing data, the current policy is typically "store everything in case it is needed later" instead of deciding what could be deleted.
• Data mining: extract useful information from the massive amount of available data.
Why linear (or multilinear) algebra?
• Data are very often represented by matrices; numerous modern datasets are in matrix form.
• Data are also represented by tensors: data in the form of tensors (multi-mode arrays) have become very common in the data mining and information retrieval literature in the last few years.
Why matrix decompositions?
• Matrix decompositions = spectral analysis (e.g., SVD, SDD, CX and CUR, NMF, MMMF, etc.)
• They use the relationships between the available data to identify components of the underlying physical system generating the data.
• Some assumptions on the relationships between the underlying components are necessary.
• Very active area of research; some matrix decompositions are more than a century old, whereas others are very recent.
Spectral analysis: a simple analog illustration
• Hidden components in light are separated by a prism.
• Our purpose: finding hidden components by data analysis.
Image matrices
A collection of images is represented by an m-by-n matrix: m pixels (points/features) by n pictures, with Aij = color value of the i-th pixel in the j-th image.
• Data mining tasks
• Cluster or classify images
• Find "nearest neighbors"
• Feature selection: find a subset of features that (accurately) clusters or classifies images
Data representation
For example, take the following typical case: a face recognition/classification system based on m x n greyscale images which, by row concatenation, can be transformed into mn-dimensional real vectors. In practice, one could have images with m = n = 256, i.e., 65536-dimensional vectors: p = (p1, p2, p3, …, p65536).
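A minimal sketch of this row concatenation (assuming NumPy; random arrays stand in for real images):

```python
# Row-concatenate 256x256 greyscale images into 65536-dimensional vectors
# and stack them as columns of a pixel-by-image matrix A.
import numpy as np

rng = np.random.default_rng(0)
images = [rng.random((256, 256)) for _ in range(10)]  # stand-ins for real images

vectors = [img.reshape(-1) for img in images]  # each a 65536-dimensional vector
A = np.column_stack(vectors)                   # A[i, j] = i-th pixel of j-th image
print(A.shape)                                 # (65536, 10)
```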
Retrieval model
• Similarity between two points (two documents, or a document and a query) is usually calculated as the normalized scalar product of their vectors (the cosine measure).
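The formula itself did not survive extraction from the slide; the cosine measure referred to is presumably the standard one. For vectors d and q,

sim(d, q) = (d · q) / (‖d‖ ‖q‖) = cos θ,

where θ is the angle between d and q; the similarity is 1 for parallel vectors and 0 for orthogonal ones.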
Document-term matrices
A collection of documents is represented by an m-by-n matrix: m terms (words) by n documents, with Aij = frequency of the i-th term in the j-th document.
• Data mining tasks
• Cluster or classify documents
• Find "nearest neighbors"
• Feature selection: find a subset of terms that (accurately) clusters or classifies documents
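A toy sketch (my example, not from the slides) of building such a term-by-document frequency matrix in plain Python:

```python
# Term-by-document matrix: A[i][j] = frequency of i-th term in j-th document.
from collections import Counter

docs = ["milk bread milk", "bread wine", "milk wine wine"]  # toy corpus
terms = sorted({t for d in docs for t in d.split()})

counts = [Counter(d.split()) for d in docs]
A = [[counts[j][t] for j in range(len(docs))] for t in terms]
for t, row in zip(terms, A):
    print(t, row)
# bread [1, 1, 0]
# milk  [2, 0, 1]
# wine  [0, 1, 2]
```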
Market basket matrices
An m-by-n matrix: m customers by n products (e.g., milk, bread, wine, etc.), with Aij = quantity of the j-th product purchased by the i-th customer. This is a common representation for association rule mining.
• Data mining tasks
• Find association rules, e.g., customers who buy product x buy product y with probability 89%
• Such rules are used to make item display decisions, advertising decisions, etc.
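To make the rule probability concrete, a toy sketch (my example, assuming NumPy) that estimates the confidence of a rule "buys x ⇒ buys y" from a binarized customer-by-product matrix:

```python
# Confidence of "x => y" = P(customer buys y | customer buys x).
import numpy as np

A = np.array([[1, 1, 0],   # rows: customers; columns: products x, y, z
              [1, 1, 1],
              [1, 0, 1],
              [0, 1, 0]])
x, y = 0, 1
buys_x = A[:, x] == 1
confidence = (A[buys_x, y] == 1).mean()
print(confidence)          # 2/3: of the 3 customers who buy x, 2 also buy y
```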
Social networks (e-mail graph, Facebook, MySpace, etc.)
An m-by-n matrix over users, representing the e-mail communications (relationships) between groups of users, with Aij = number of e-mails exchanged between users i and j during a certain time period.
• Data mining tasks
• Cluster the users
• Identify "dense" networks of users (dense subgraphs)
Recommendation systems
The m-by-n matrix A represents m customers and n products, with Aij = utility of the j-th product to the i-th customer.
Data mining task: given a few samples from A, recommend high-utility products to customers.
Intrusion detection
The m-by-n matrix A represents m records and n attributes, with Aij = value of the j-th attribute in the i-th record. The data for our experiments was prepared by MIT Lincoln Labs for the 1998 DARPA intrusion detection evaluation program.
Data mining task: reduce noise in the data.
Tensors: recommendation systems
• Economics: utility is an ordinal, not a cardinal, concept; compare products rather than assigning utility values.
• Recommendation model revisited: every customer has an n-by-n matrix (whose entries are +1/-1) representing pair-wise product comparisons. There are m such matrices (m customers, n products), forming an n-by-n-by-m 3-mode tensor A; a sketch of its construction follows below.
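A minimal sketch (assuming NumPy; the ranking setup is my illustration) of assembling such a 3-mode tensor of pairwise comparisons:

```python
# A[i, j, c] = +1 if customer c prefers product i to product j, -1 otherwise.
import numpy as np

m, n = 4, 3                                        # customers, products
rng = np.random.default_rng(0)
rankings = [rng.permutation(n) for _ in range(m)]  # each a best-to-worst order

A = np.zeros((n, n, m))
for c, order in enumerate(rankings):
    pos = np.argsort(order)                        # pos[p] = rank of product p
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i, j, c] = 1 if pos[i] < pos[j] else -1
print(A[:, :, 0])                                  # comparisons of customer 0
```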
Curse of Dimensionality
N - Dimensions
• Sommerville, D. M. Y. 1929. An Introduction to the Geometry of N Dimensions. New York: Dover Publications.
The graph of n-ball volume as a function of dimension was plotted more than 100 years ago by Paul Renno Heyl, who was then a graduate student at the University of Pennsylvania. The volume graph is the lower curve, labeled "content"; the upper curve gives the ball's surface area, for which Heyl used the term "boundary." The illustration is from Heyl's 1897 thesis, "Properties of the locus r = constant in space of n dimensions."
• Brian Hayes: An Adventure in the Nth Dimension. American Scientist, Vol. 99, No. 6, November-December 2011, pages 442-446.
N - Dimensions
Beyond the fifth dimension, the volume of the unit n-ball decreases as n increases. Computing a few larger values of n, we find that V(20,1) ≈ 0.0258 and V(100,1) ≈ 10^-40.
• Brian Hayes: An Adventure in the Nth Dimension. American Scientist, Vol. 99, No. 6, November-December 2011, pages 442-446.
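These values follow from the closed form V(n, r) = π^(n/2) r^n / Γ(n/2 + 1); a short check in Python:

```python
# Volume of the n-ball of radius r via the Gamma function.
import math

def ball_volume(n, r=1.0):
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

print(ball_volume(5))    # ~5.2638, the maximum over integer dimensions
print(ball_volume(20))   # ~0.0258
print(ball_volume(100))  # ~2.37e-40
```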
Curse of Dimensionality
• The curse of dimensionality is a term coined by Richard Bellman to describe the problem caused by the exponential increase in volume associated with adding extra dimensions to a space.
• Bellman, R.E. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ.
Curse of Dimensionality
• When dimensionality increases, data become increasingly sparse in the space they occupy.
• Definitions of density and of distance between points, which are critical for data mining, become less meaningful.
Experiment (figure): randomly generate 500 points and compute the difference between the maximum and minimum distance over any pair of points; a sketch follows below.
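A minimal sketch of this experiment (assuming NumPy and SciPy; the exact dimensions plotted on the original slide are not recoverable):

```python
# As the dimension grows, log10((max - min) / min) over all pairwise
# distances shrinks toward 0: distances become nearly uniform.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    pts = rng.random((500, d))
    dist = pdist(pts)               # all pairwise Euclidean distances
    print(d, np.log10((dist.max() - dist.min()) / dist.min()))
```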
Curse of Dimensionality
The volume of an n-dimensional sphere with radius r is
V(n, r) = π^(n/2) r^n / Γ(n/2 + 1).
Figure: ratio of the volumes of the unit sphere and its embedding hypercube of side length 2, plotted up to dimension 14 (horizontal axis: dimension).
Curse of Dimensionality
Figure: the ratio of the volume of the n-dimensional sphere with radius 20 to the volume of the "circular ring" (the outer shell of width 1 at its surface).
Curse of Dimensionality
Figure: the 2-dimensional case, a circular ring of width 1.
Curse of Dimensionality
Figure: the 20-dimensional case, a circular ring of width 1; most of the volume now lies in the ring.
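A quick check of what these two figures illustrate: for a ball of radius 20, the fraction of its volume lying in the outer shell of width 1 is 1 - (19/20)^n, which grows quickly with the dimension n:

```python
# Fraction of a radius-20 ball's volume inside the outer shell of width 1.
for n in (2, 20, 100):
    print(n, 1 - (19 / 20) ** n)
# 2   0.0975   -> under 10% of the area in 2 dimensions
# 20  0.6415   -> already most of the volume in 20 dimensions
# 100 0.9941
```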
Curse of Dimensionality
• The model space is EMPTY! (in high dimensions, almost all of the volume lies near the surface)
• The distribution of data looks uniform! (in high dimensions, all pairwise distances become nearly uniform)
N-dimensional cube
For convenience, let α(n, i) denote the maximum area of an i-dimensional cross-section of the unit cube I^n. As for the shapes of the cross-sections of I^n, our knowledge is very limited.
Motivation
• An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate.
• The nature of this data differs significantly from that of classical datasets.
Introduction
• Since the volume (and dimensionality) of data is typically large, the emphasis of new algorithms must be on efficiency and scalability to large data sets.
• Analysis of continuous attribute data generally takes the form of eigenvalue/singular value problems (PCA/rank reduction), clustering, least squares problems, etc.
Introduction
The design of approximation algorithms for hard optimization problems can be viewed as a two-step process (a toy sketch follows below):
(i) find a relaxation of the given problem, i.e., a problem whose feasible set encloses that of the original problem and which can be solved in polynomial time, and
(ii) find a rounding, i.e., a mapping from a solution of the relaxation to a solution of the original problem.
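An illustrative sketch of this two-step scheme (my example, not from the slides, assuming NumPy and SciPy): the LP relaxation of minimum vertex cover, followed by the classical threshold rounding at 1/2, which yields a 2-approximation:

```python
# Step (i): LP relaxation of vertex cover; step (ii): round x_v >= 1/2 to 1.
import numpy as np
from scipy.optimize import linprog

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# Relaxation: minimize sum x_v  s.t.  x_u + x_v >= 1 per edge, 0 <= x <= 1.
A_ub = np.zeros((len(edges), n))
for k, (u, v) in enumerate(edges):
    A_ub[k, u] = A_ub[k, v] = -1      # encodes -x_u - x_v <= -1
res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=-np.ones(len(edges)),
              bounds=[(0, 1)] * n)

# Rounding: take every vertex whose fractional value is at least 1/2;
# every edge constraint forces at least one endpoint above the threshold.
cover = [v for v in range(n) if res.x[v] >= 0.5]
print(res.x, cover)
```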
Dimensionality Reduction
Matrix Decomposition
• Lars Eldén, Matrix Methods in Data Mining and Pattern Recognition, SIAM, 2007
• David Skillicorn, Understanding Complex Datasets: Data Mining with Matrix Decompositions, Chapman & Hall/CRC, 2007
Dimensionality Reduction
• Purpose:
• Avoid the curse of dimensionality
• Reduce the amount of time and memory required by data mining algorithms
• Allow data to be more easily visualized
• May help to eliminate irrelevant features or reduce noise
• Techniques:
• Principal Component Analysis
• Singular Value Decomposition
• Others: supervised and non-linear techniques
Dimensionality Reduction = optimization problem
• For any set of n points X in R^d, dimensionality reduction is a map f: R^d -> R^k (where d >> k).
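One concrete such map f (my example: a Gaussian random projection in the spirit of the Johnson-Lindenstrauss lemma, assuming NumPy), which roughly preserves pairwise distances with high probability:

```python
# f: R^d -> R^k via multiplication by a scaled Gaussian random matrix.
import numpy as np

d, k, n = 10_000, 100, 50
rng = np.random.default_rng(0)
X = rng.random((n, d))                     # n points in R^d

R = rng.normal(size=(d, k)) / np.sqrt(k)   # random projection matrix
Y = X @ R                                  # f(x) = x R, each row now in R^k
print(X.shape, "->", Y.shape)              # (50, 10000) -> (50, 100)
```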
Singular value decomposition
For an m × n matrix A (document-term) of rank r there exists a factorization (Singular Value Decomposition = SVD)
A = U Σ V^T,
where U is m × m, Σ is m × n, and V is n × n.
• The columns of U are orthogonal eigenvectors of AA^T.
• The columns of V are orthogonal eigenvectors of A^T A.
• The eigenvalues λ1 … λr of AA^T are also the eigenvalues of A^T A, and the singular values are σi = √λi.
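A minimal sketch (assuming NumPy) verifying the definition and the eigenvector relationship above on a random matrix:

```python
# Check A = U S V^T and that the eigenvalues of A^T A equal sigma_i^2.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))

U, s, Vt = np.linalg.svd(A)                  # U: 6x6, s: 4 values, Vt: 4x4
S = np.zeros_like(A)
np.fill_diagonal(S, s)
print(np.allclose(A, U @ S @ Vt))            # True: A = U S V^T

eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigenvalues of A^T A, descending
print(np.allclose(eigvals, s ** 2))          # True: lambda_i = sigma_i^2
```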
Singular value decomposition
Figure: the rank-k (truncated) SVD of a documents-by-terms matrix,
A_k (n × m) = U_k (n × k) · S_k (k × k) · V_k^T (k × m), with S_k = diag(s1, s2, …, sk).
Latent semantic indexing
• LSI: the k-reduced singular value decomposition of the term-by-document matrix
• Latent semantics: hidden connections between both terms and documents, determined by the documents' content
• Document matrix D_k = S_k V_k^T (or D_k' = V_k^T)
• Term matrix T_k = U_k S_k (or T_k' = U_k)
• Query in reduced dimension: q_k = U_k^T q (or q_k' = S_k^-1 U_k^T q)
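A minimal sketch (assuming NumPy; random data stands in for a real term-by-document matrix) of computing D_k and folding a query into the reduced space as defined above:

```python
# LSI: rank-k truncated SVD, document matrix D_k = S_k V_k^T, query q_k = U_k^T q.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 30))                 # 100 terms x 30 documents
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vkt = U[:, :k], np.diag(s[:k]), Vt[:k, :]

Dk = Sk @ Vkt                             # documents in the k-dimensional space
q = rng.random(100)                       # a query over terms
qk = Uk.T @ q                             # query folded into the same space

# Rank documents by cosine similarity in the reduced space.
sims = (Dk.T @ qk) / (np.linalg.norm(Dk, axis=0) * np.linalg.norm(qk))
print(np.argsort(-sims)[:3])              # indices of the 3 best-matching documents
```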
Latent semantic indexing
In other words, documents are represented as linear combinations of metaterms:
d_1 = Σ_i w_1i m_i
d_2 = Σ_i w_2i m_i
…
d_n = Σ_i w_ni m_i
Retrieval in LSI
• Similarity between two documents, or a document and a query, is usually calculated as the normalized scalar product of their metaterm vectors.
Picture matrix
Figure: a collection of images arranged as a matrix.
Retrieval in LSI
Figure: a query image and its two best matches, Fig08 (similarity 0.9769) and Fig09 (0.9165).
Retrieval in LSI
Figure: a query image and its matches, Fig02 (similarity 0.9740) and Fig06 (0.3011).
Retrieval in LSI
Figure: a query image and its matches, Fig14 (similarity 0.3640) and Fig11 (0.3482).
Latent semantic indexing
What is a metaterm? A metaterm is a linear combination of terms. But what does a metaterm mean? Is there an interpretation of metaterms?
Building the collection (figure)
Meta point (figure)
DCT (discrete cosine transform) (figure)