SVD & LSI

SVD & LSI ML Reading Group Jan-24-2006 Presenter: Zheng Zhao

SVD (Singular value decomposition) • Vector Norm • Matrix Norm • Singular value decomposition • The application of SVD

vector norm • A vector norm has the following properties. • 1. || x || ≥ 0 (non-negative) • 2. || x || = 0 implies that all elements xi = 0 • 3. || αx || = |α| || x || for any scalar α • 4. || x1 + x2 || ≤ || x1 || + || x2 || (triangle inequality) • Equivalence of norms
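The properties above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the vectors x, y and the scalar alpha are arbitrary illustrative values, not taken from the slides.

```python
import numpy as np

# Illustrative values (assumptions, not from the slides).
x = np.array([3.0, -4.0, 12.0])
y = np.array([1.0, 2.0, -2.0])
alpha = -2.5

for p in (1, 2, np.inf):
    nx = np.linalg.norm(x, ord=p)
    ny = np.linalg.norm(y, ord=p)
    # Property 1: non-negativity.
    assert nx >= 0
    # Property 3: scaling by alpha scales the norm by |alpha|.
    assert np.isclose(np.linalg.norm(alpha * x, ord=p), abs(alpha) * nx)
    # Property 4: triangle inequality (small slack for rounding).
    assert np.linalg.norm(x + y, ord=p) <= nx + ny + 1e-12

# Equivalence of norms: each norm is bounded by a constant multiple of the others.
n1, n2, ninf = (np.linalg.norm(x, ord=p) for p in (1, 2, np.inf))
n = x.size
assert ninf <= n2 <= n1 <= np.sqrt(n) * n2 <= n * ninf
```

The last assertion is one concrete form of norm equivalence in n dimensions: the 1-, 2- and infinity-norms differ by at most a factor that depends only on n.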

vector norm (cont.)

matrix (operator) norm A matrix (operator) norm has the following properties. 1. || A || ≥ 0 (non-negative) 2. || A || = 0 implies that all elements aij = 0 3. || αA || = |α| || A || for any scalar α 4. || A1 + A2 || ≤ || A1 || + || A2 || (triangle inequality) 5. || AB || ≤ || A || || B || (multiplicative property) An induced norm is defined as || A || = max over x ≠ 0 of || Ax || / || x ||; for z = Ax, it measures how much A can stretch x
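As a hedged illustration of the induced norm, the sketch below uses the induced 2-norm, which equals the largest singular value of A. The random matrices A and B and the sample vectors are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # arbitrary example matrices
B = rng.standard_normal((3, 5))

# Induced 2-norm: the largest stretch ||Ax|| / ||x|| over all x != 0.
induced_2 = np.linalg.norm(A, ord=2)

# No sampled vector is stretched by more than the induced norm.
xs = rng.standard_normal((3, 1000))
stretch = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)
assert stretch.max() <= induced_2 + 1e-12

# The maximum is attained at the first right singular vector.
U, s, Vt = np.linalg.svd(A)
v1 = Vt[0]
assert np.isclose(np.linalg.norm(A @ v1), induced_2)

# Property 5: ||AB|| <= ||A|| ||B||.
assert np.linalg.norm(A @ B, ord=2) <= induced_2 * np.linalg.norm(B, ord=2) + 1e-12
```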

matrix (operator) norm (cont.)

SVD • SVD: singular value decomposition • http://en.wikipedia.org/wiki/Singular_value_decomposition

Some Properties of SVD

Some Properties of SVD • That is, Ak is the optimal approximation of A among all matrices of rank at most k, with the approximation error measured by the Frobenius norm • This forms the basis of LSI (Latent Semantic Indexing) in information retrieval
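The optimality statement above can be illustrated with a small sketch. It assumes NumPy; the matrix A, the helper name truncate and the rank-k competitor X are hypothetical and serve only as an example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))          # arbitrary example matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(U, s, Vt, k):
    """Rank-k approximation built from the k largest singular triplets."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

k = 2
A_k = truncate(U, s, Vt, k)
err = np.linalg.norm(A - A_k, "fro")

# The Frobenius error equals the root of the sum of the discarded squared singular values.
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.linalg.matrix_rank(A_k) == k

# Any other rank-k matrix (here a random one) does no better.
X = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
assert np.linalg.norm(A - X, "fro") >= err
```

A single random competitor does not prove optimality, of course; it only shows the inequality in action.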

Application of SVD • Pseudoinverse • Range, null space and rank • Matrix approximation • Other examples http://en.wikipedia.org/wiki/Singular_value_decomposition
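A minimal sketch of the first two applications, assuming NumPy. The rank-deficient matrix A is a made-up example, and the tolerance rule is one common convention rather than the only choice.

```python
import numpy as np

# Made-up example: the second row is twice the first, so the rank is 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)

# Rank: number of singular values above a small tolerance.
tol = s.max() * max(A.shape) * np.finfo(float).eps
rank = int(np.sum(s > tol))

# Range: spanned by the first `rank` left singular vectors.
range_basis = U[:, :rank]
# Null space: spanned by the remaining right singular vectors.
null_basis = Vt[rank:].T

# Pseudoinverse: invert only the non-negligible singular values.
s_inv = np.zeros_like(s)
s_inv[:rank] = 1.0 / s[:rank]
A_pinv = Vt.T @ np.diag(s_inv) @ U.T
```

Here np.allclose(A @ null_basis, 0) holds, and A_pinv agrees with np.linalg.pinv(A) up to rounding.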

LSI (Latent Semantic Indexing) • Problem Introduction • Latent Semantic Indexing • LSI • Query • Updating • An example • Some comments

Problem Introduction • Traditional term-matching methods don't work well in information retrieval • We want to capture the concepts instead of the words. Concepts are reflected in the words. However, • One term may have multiple meanings (polysemy) • Different terms may have the same meaning (synonymy).

LSI (Latent Semantic Indexing) • The LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem. • The goal is to find effective models to represent the relationship between terms and documents. Hence a set of terms, which is by itself incomplete and unreliable, will be replaced by some set of entities which are more reliable indicants.

LSI, the Method • Build the document-term matrix M • Decompose M by SVD • Approximate M using the truncated SVD

LSI, the Method (cont.) Each row and column of A gets mapped into the k-dimensional LSI space by the SVD.
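The method can be sketched as follows. This is an illustration under stated assumptions: rows are terms and columns are documents, the vocabulary and counts are invented, and scaling the coordinates by the singular values is one common convention among several.

```python
import numpy as np

# Hypothetical vocabulary and counts (not the example from the slides).
terms = ["matrix", "svd", "norm", "query", "index"]
M = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)                 # shape: (terms, documents)

# Step 1: decompose M by SVD.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Step 2: keep only the k largest singular triplets.
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
M_k = U_k @ S_k @ V_k.T         # rank-k approximation of M

# Step 3: coordinates in the k-dimensional LSI space.
term_coords = U_k @ S_k         # one row per term
doc_coords = V_k @ S_k          # one row per document
```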

Fundamental Comparison Quantities from the SVD Model • Comparing Two Terms: the dot product between two row vectors of Ak reflects the extent to which two terms have a similar pattern of occurrence across the set of documents. • Comparing Two Documents: the dot product between two column vectors of Ak • Comparing a Term and a Document
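A hedged sketch of the three comparisons, assuming they are taken from the rank-k approximation of a term-by-document matrix (rows are terms, columns are documents). The matrix M and all variable names are illustrative assumptions; the term-document comparison is read here as the corresponding entry of the approximation, which is the usual convention.

```python
import numpy as np

# Invented term-by-document counts (rows: terms, columns: documents).
M = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
M_k = U_k @ S_k @ V_k.T

term_term = M_k @ M_k.T     # entry (i, j): dot product of term rows i and j
doc_doc = M_k.T @ M_k       # entry (i, j): dot product of document columns i and j
term_doc = M_k              # entry (i, j): term i compared with document j

# The same quantities can be computed in the k-dimensional space alone.
term_coords = U_k @ S_k
doc_coords = V_k @ S_k
root = np.sqrt(S_k)
assert np.allclose(term_term, term_coords @ term_coords.T)
assert np.allclose(doc_doc, doc_coords @ doc_coords.T)
assert np.allclose(term_doc, (U_k @ root) @ (V_k @ root).T)
```

Working with the k-dimensional coordinates avoids forming the full approximation when the collection is large.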

Query • A query q is also mapped into this space, by q̂ = q^T Uk Σk^-1, where Uk and Σk are the rank-k SVD factors • Compare the similarity in the new space • Intuition: Dimension reduction through LSI brings together “related” axes in the vector space.
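A minimal sketch of query handling, under assumptions: the query is a term-count vector q, it is mapped with the commonly used formula q^T U_k S_k^-1, and similarity is the cosine against the rows of V_k. The matrix M, the query terms and the helper names to_lsi and cosine are hypothetical.

```python
import numpy as np

# Invented term-by-document counts (rows: terms, columns: documents).
M = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

def to_lsi(vec):
    """Map a term-count vector into the k-dimensional LSI space."""
    return vec @ U_k @ np.linalg.inv(S_k)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Query containing the 2nd and 5th terms of the vocabulary.
q = np.array([0.0, 1.0, 0.0, 0.0, 1.0])
q_hat = to_lsi(q)

# Score every document and rank from most to least similar.
sims = np.array([cosine(q_hat, V_k[j]) for j in range(V_k.shape[0])])
ranking = np.argsort(-sims)
```

With this mapping, a query identical to a document column lands exactly on that document's row of V_k, so the mapping is consistent with how documents are represented.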

Updating • Recomposing: recompute the SVD of the enlarged matrix (expensive) • Fold-in method: new terms and documents are projected into the existing space, so they have no effect on the representation of the pre-existing terms and documents
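The two updating strategies can be contrasted in a short sketch. It assumes a term-by-document matrix M (rows are terms), the usual fold-in projection d^T U_k S_k^-1, and an invented new document d_new.

```python
import numpy as np

# Invented term-by-document counts (rows: terms, columns: documents).
M = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# Hypothetical new document, expressed over the same five terms.
d_new = np.array([1.0, 1.0, 0.0, 0.0, 1.0])

# Fold-in: project the new document; existing rows of V_k are untouched.
d_hat = d_new @ U_k @ np.linalg.inv(S_k)
V_folded = np.vstack([V_k, d_hat])

# Recomposing: rebuild the SVD from the enlarged matrix (the costly option).
M_new = np.column_stack([M, d_new])
U2, s2, Vt2 = np.linalg.svd(M_new, full_matrices=False)
```

Fold-in costs a single matrix-vector product per new document, while recomposing repeats the whole decomposition and changes the singular values and vectors.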

Example

Example (cont.)

Example (cont. Mapping)

Example (cont. Query) Query: Application and Theory

Example (cont. Query)

Example (cont. fold in)

Example (cont. recomposing)

Choosing a value for k • LSI is useful only if k << n. • If k is too large, the reduced space doesn't capture the underlying latent semantic structure; if k is too small, too much information is lost. • There is no principled way of determining the best k; one needs to experiment.
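One simple way to start such experiments is to look at how much of the matrix each k discards. The sketch below is only a heuristic under assumptions: it measures reconstruction error, not retrieval quality, and the matrix M is invented.

```python
import numpy as np

# Invented term-by-document counts (rows: terms, columns: documents).
M = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
s = np.linalg.svd(M, compute_uv=False)
total = np.sum(s ** 2)

# Relative Frobenius error of the rank-k approximation, for each candidate k.
rel_err = [float(np.sqrt(np.sum(s[k:] ** 2) / total)) for k in range(1, len(s) + 1)]

for k, e in enumerate(rel_err, start=1):
    print(f"k={k}: relative error {e:.3f}")
```

In practice the candidate values of k would then be compared on actual retrieval results, as the slide suggests.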

How well does LSI work? • The effectiveness of LSI compared to regular term-matching depends on the nature of the documents. • Typical improvement: 0 to 30% better precision. • The advantage is greater for texts in which synonymy and ambiguity are more prevalent. • Best when recall is high. • The costs of LSI might outweigh the improvement. • SVD is computationally expensive, which limits its use for really large document collections • An inverted index is not possible
