
SPARSE TENSORS DECOMPOSITION SOFTWARE



Presentation Transcript


  1. SPARSE TENSORS DECOMPOSITION SOFTWARE Papa S. Diaw, Master’s Candidate Dr. Michael W. Berry, Major Professor

  2. Introduction • Large data sets • Nonnegative Matrix Factorization (NMF) • Insights into hidden relationships • Arranges multi-way data into a matrix • Higher memory and CPU demands • Only linear relationships captured in the matrix representation • Fails to capture important structural information • Slower or less accurate calculations

  3. Introduction (cont'd) • Nonnegative Tensor Factorization (NTF) • Natural way to handle high dimensionality • Preserves the original multi-way structure of the data • Applications in image processing and text mining

  4. Tensor Toolbox For MATLAB • Sandia National Laboratories • Licenses • Proprietary Software

  5. Motivation of the PILOT • Python Software for NTF • Alternative to Tensor Toolbox for MATLAB • Incorporation into FutureLens • Exposure to NTF • Interest in the open source community

  6. Tensors • Multi-way array • Order/Mode/Ways • High-order • Fiber • Slice • Unfolding • Matricization or flattening • Reordering the elements of an N-th order tensor into a matrix • Not unique
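To make matricization concrete, here is a minimal sketch (my own illustration, not the PILOT's code) of one common unfolding convention for a dense NumPy array. Other orderings of the remaining modes are equally valid, which is exactly why the unfolding is not unique:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: move axis `mode` to the front, then flatten
    the remaining axes in their original order (one valid convention)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

# a small 3rd-order tensor of shape (2, 3, 4)
X = np.arange(24).reshape(2, 3, 4)
X0 = unfold(X, 0)   # shape (2, 12)
X1 = unfold(X, 1)   # shape (3, 8)
```

Every unfolding contains the same 24 entries; only their arrangement into rows and columns changes.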

  7. Tensors (cont’d) • Kronecker Product • Khatri-Rao product • A⊙B=[a1⊗b1 a2⊗b2… aJ⊗bJ]
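The column-wise definition A⊙B = [a1⊗b1 a2⊗b2 … aJ⊗bJ] can be sketched in a few lines of NumPy (an illustration, not the PILOT's implementation):

```python
import numpy as np

def khatri_rao(A, B):
    """Khatri-Rao product: column-wise Kronecker product of two
    matrices with the same number of columns J."""
    assert A.shape[1] == B.shape[1], "column counts must match"
    # build all products a_i * b_t per column, then flatten row pairs
    return np.einsum('ij,tj->itj', A, B).reshape(-1, A.shape[1])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
C = khatri_rao(A, B)   # shape (4, 2); column j equals kron(A[:, j], B[:, j])
```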

  8. Tensor Factorization • Introduced by Hitchcock in 1927, later developed by Cattell (1944) and Tucker (1966) • Rewrite a given tensor as a finite sum of lower-rank tensors • Tucker and PARAFAC • Choosing the approximation rank is itself a hard problem

  9. PARAFAC • Parallel Factor Analysis • Canonical Decomposition (CANDECOMP) • Harshman (1970); Carroll and Chang (1970)

  10. PARAFAC (cont’d) • Given a three-way tensor X and an approximation rank R, we define the factor matrices as the combination of the vectors from the rank-one components.

  11. PARAFAC (cont’d)

  12. PARAFAC (cont’d) • Alternating Least Squares (ALS) • ALS cycles “over all the factor matrices and performs a least-square update for one factor matrix while holding all the others constant.”[7] • NTF can be considered an extension of the PARAFAC model with the constraint of nonnegativity
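A minimal dense sketch of nonnegative PARAFAC-ALS for a 3-way array might look as follows. This is my own illustration under simplifying assumptions, not the PILOT's code: it clips negatives to zero after each least-squares solve (a crude projection, not a true nonnegative least-squares step), and it works on a dense array rather than a sparse tensor:

```python
import numpy as np

def _kr(A, B):
    # Khatri-Rao (column-wise Kronecker) product
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def ntf_als(X, R, n_iter=100):
    """Cycle over the factor matrices; update one via a least-squares
    solve while holding the others fixed, then clip negatives to zero."""
    rng = np.random.default_rng(0)
    factors = [rng.random((d, R)) for d in X.shape]
    for _ in range(n_iter):
        for n in range(3):
            A, B = [factors[m] for m in range(3) if m != n]
            G = (A.T @ A) * (B.T @ B)                 # Gram of fixed factors
            Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
            # MTTKRP, then a pseudo-inverse solve and a nonnegativity clip
            factors[n] = np.clip(Xn @ _kr(A, B) @ np.linalg.pinv(G), 0, None)
    return factors

# fit an exactly rank-2 nonnegative tensor and reconstruct it
F = [np.random.default_rng(1).random((d, 2)) for d in (4, 5, 6)]
X = np.einsum('ir,jr,kr->ijk', *F)
A, B, C = ntf_als(X, 2)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
```

The iteration stops here after a fixed count; a real implementation, like the PARAFAC module described below, also tests for convergence of the fit.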

  13. Python • Object-oriented, interpreted • Runs on all major platforms • Gentle learning curve • Everything is an object in Python, with methods

  14. Python (cont’d) • Recent interest in the scientific community • Several scientific computing packages • Numpy • Scipy • Python is extensible

  15. Data Structures • Dictionary • Stores the tensor data • Mutable container holding pairs of keys and their corresponding values, for any number of Python objects • Well suited to the sparseness of our tensors • VAST 2007 contest data: 1,385,205,184 elements, of which only 1,184,139 are nonzero • Stores only the nonzero elements and keeps track of the zeros via the dictionary's default value
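The storage scheme on this slide can be sketched as follows (a minimal illustration of the idea, not the PILOT's actual code): keys are subscript tuples, only nonzeros are stored, and a lookup default supplies the zeros.

```python
# Sparse tensor as a plain dict keyed by subscript tuples: only the
# nonzero entries occupy memory; .get() supplies 0.0 for all the rest.
X = {(0, 2, 1): 3.5, (4, 0, 7): 1.25}

def entry(X, idx):
    """Read one tensor entry; absent subscripts read as zero."""
    return X.get(idx, 0.0)
```

For the VAST data above, this means storing roughly a million key-value pairs instead of over a billion array cells.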

  16. Data Structures (cont’d) • NumPy arrays • Fundamental package for scientific computing in Python • Used for Khatri-Rao products and tensor multiplications • Speed

  17. Modules

  18. Modules (cont’d) • SPTENSOR • Most important module • Class (subscripts of nonzeros, values) • Flexibility (NumPy arrays, NumPy matrices, Python lists) • Dictionary • Keeps a few instance variables • Size • Number of dimensions • Frobenius norm (Euclidean norm)
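A class along the lines this slide describes might be sketched as below. The name, constructor signature, and method names are my assumptions for illustration, not the PILOT's actual SPTENSOR interface:

```python
import numpy as np

class SpTensor:
    """Minimal sparse-tensor sketch: subscripts of the nonzeros, their
    values, plus instance variables for size, number of dimensions,
    and a Frobenius-norm method."""
    def __init__(self, subs, vals, shape=None):
        self.subs = [tuple(s) for s in subs]
        self.vals = np.asarray(vals, dtype=float)
        # infer the size from the largest subscript if none is given
        self.shape = shape or tuple(max(s[d] for s in self.subs) + 1
                                    for d in range(len(self.subs[0])))
        self.ndims = len(self.shape)

    def norm(self):
        # Frobenius (Euclidean) norm: zeros contribute nothing,
        # so summing over the stored nonzeros suffices
        return float(np.sqrt((self.vals ** 2).sum()))

T = SpTensor([(0, 0, 0), (1, 2, 3)], [3.0, 4.0])
```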

  19. Modules (cont’d) • PARAFAC • Coordinates the NTF • Implementation of ALS • Runs until convergence or the maximum number of iterations • Factor matrices are turned into a Kruskal tensor

  20. Modules (cont’d)

  21. Modules (cont’d)

  22. Modules (cont’d) • INNERPROD • Inner product between an SPTENSOR and a KTENSOR • Used by PARAFAC to compute the residual norm • Kronecker product for matrices • TTV • Multiplies a sparse tensor by a (column) vector • Returns a tensor • Workhorse of our software package • Performs most of the computation • Called by the MTTKRP and INNERPROD modules
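A sparse tensor-times-vector operation of the kind TTV performs can be sketched as below (my own coordinate-based illustration, not the PILOT's code): each nonzero is scaled by the matching vector entry and accumulated into the result, whose order is one less than the input's.

```python
import numpy as np

def ttv(subs, vals, v, mode):
    """Contract a sparse tensor (coordinate lists) with vector v along
    `mode`; returns the result as a dict of lower-order coordinates."""
    out = {}
    for s, x in zip(subs, vals):
        key = s[:mode] + s[mode + 1:]          # drop the contracted index
        out[key] = out.get(key, 0.0) + x * v[s[mode]]
    return out

# 2x2x2 tensor with two nonzeros, contracted along mode 2
subs = [(0, 0, 0), (0, 0, 1)]
vals = [2.0, 3.0]
res = ttv(subs, vals, np.array([1.0, 10.0]), mode=2)
```

Because the loop touches only the stored nonzeros, the cost scales with the number of nonzeros rather than with the full tensor size.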

  23. Modules (cont’d) • MTTKRP • Khatri-Rao product of all factor matrices except the one being updated • Matrix multiplication of the matricized tensor with the Khatri-Rao product obtained above • KTENSOR • Kruskal tensor • Object returned after the factorization is done and the factor matrices are normalized • Class • Instance variables such as the norm • The norm of the ktensor plays a big part in determining the residual norm in the PARAFAC module
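The two MTTKRP steps above can be sketched for a dense array as follows (an illustration with my own unfolding convention, not the PILOT's sparse implementation):

```python
import numpy as np

def mttkrp_dense(X, factors, n):
    """Matricized tensor times Khatri-Rao product: unfold X along mode
    n, then multiply by the Khatri-Rao product of all factor matrices
    except factors[n]."""
    others = [factors[m] for m in range(X.ndim) if m != n]
    kr = others[0]
    for M in others[1:]:
        # fold in one more factor via a column-wise Kronecker product
        kr = np.einsum('ir,jr->ijr', kr, M).reshape(-1, kr.shape[1])
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    return Xn @ kr

# rank-1 check: for X = a ∘ b ∘ c, the mode-0 MTTKRP with its own
# factors equals a * (b·b) * (c·c)
a = np.array([1., 2.]); b = np.array([1., 1., 1.]); c = np.array([1., 0.])
X = np.einsum('i,j,k->ijk', a, b, c)
M = mttkrp_dense(X, [a[:, None], b[:, None], c[:, None]], 0)
```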

  24. Performance • Python Profiler • Run-time performance • Tool for detecting bottlenecks • Code optimization • Negligible improvement • Efficiency loss in some modules

  25. Performance (cont’d) • Lists and Recursions

  26. Performance (cont’d) • Numpy Arrays

  27. Performance (cont’d) • After removing recursion

  28. Floating-Point Arithmetic • Binary floating-point • “Binary floating-point cannot exactly represent decimal fractions, so if binary floating-point is used it is not possible to guarantee that results will be the same as those using decimal arithmetic.”[12] • Makes the iterations volatile
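The quoted point is easy to demonstrate: the decimal fraction 0.1 has no exact binary representation, so repeated accumulation drifts away from the exact decimal result, which is one source of the iteration-to-iteration volatility noted above.

```python
# Ten additions of 0.1 in IEEE-754 binary floating point do not
# reproduce the exact decimal sum 1.0.
total = sum(0.1 for _ in range(10))
exact = total == 1.0          # False: each 0.1 carries a tiny error
gap = abs(total - 1.0)        # small but strictly positive
```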

  29. Convergence Issues

  30. Convergence Issues (cont’d)

  31. Convergence Issues (cont’d)

  32. Conclusion • There is still work to do after NTF • Preprocessing of data • Post-processing of results, such as with FutureLens • Expertise to extract and identify hidden components • Tucker implementation • C extensions to increase speed • GUI

  33. Acknowledgments • Mr. Andrey Puretskiy • Discussions at all stages of the PILOT • Consultancy in text mining • Testing • Tensor Toolbox for MATLAB (Bader and Kolda) • Understanding of tensor decomposition • PARAFAC

  34. References
  • [1] http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox/
  • [2] Tamara G. Kolda, Brett W. Bader, “Tensor Decompositions and Applications”, SIAM Review, June 10, 2008.
  • [3] Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, Shun-ichi Amari, “Nonnegative Matrix and Tensor Factorizations”, John Wiley & Sons, Ltd, 2009.
  • [4] http://docs.python.org/library/profile.html
  • [5] http://www.mathworks.com/access/helpdesk/help/techdoc
  • [6] http://www.scipy.org/NumPy_for_Matlab_Users
  • [7] Brett W. Bader, Andrey A. Puretskiy, Michael W. Berry, “Scenario Discovery Using Nonnegative Tensor Factorization”, in J. Ruiz-Schulcloper and W.G. Kropatsch (Eds.): CIARP 2008, LNCS 5197, pp. 791–805, 2008.
  • [8] http://docs.scipy.org/doc/numpy/user/
  • [9] http://docs.scipy.org/doc/
  • [10] http://docs.scipy.org/doc/numpy/user/whatisnumpy.html
  • [11] Tamara G. Kolda, “Multilinear operators for higher-order decompositions”, Sandia Report, April 2006.
  • [12] http://speleotrove.com/decimal/decifaq1.html#inexact

  35. QUESTIONS?
