Stochastic Approach for Link Structure Analysis (SALSA)

# Stochastic Approach for Link Structure Analysis (SALSA)

##### Presentation Transcript

1. Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

2. SALSA • Created by Lempel and Moran in 2000 • Combination of HITS and PageRank

3. SALSA’s similarities to HITS and PageRank • SALSA uses authority and hub scores • SALSA creates a neighborhood graph using authority and hub pages and links

4. SALSA’s differences from HITS and PageRank • The SALSA method creates a bipartite graph from the authority and hub pages in the neighborhood graph. • One set contains hub pages • One set contains authority pages • A page may appear in both sets
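A minimal sketch of how the two sides of the bipartite graph could be derived from a link list. The edge list `EDGES` is a hypothetical example (the slide figures are not part of this transcript), chosen only so that the hub and authority sets match the pages named on later slides; `bipartite_sides` is an illustrative helper name.

```python
# Hypothetical links (source page -> target page). The slide figure is not
# reproduced in the transcript, so this edge list is an assumption.
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]


def bipartite_sides(edges):
    """Return (hubs, authorities) for a list of directed links.

    A page is a hub if it has at least one out-link and an authority if it
    has at least one in-link, so the same page can land on both sides.
    """
    hubs = sorted({src for src, _ in edges})
    authorities = sorted({dst for _, dst in edges})
    return hubs, authorities


hubs, authorities = bipartite_sides(EDGES)
in_both = sorted(set(hubs) & set(authorities))
print("hub side:      ", hubs)
print("authority side:", authorities)
print("on both sides: ", in_both)
```

Each link then becomes an undirected edge between the hub-side copy of its source and the authority-side copy of its target.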

5. Neighborhood Graph N

6. Bipartite Graph G of Neighborhood Graph N

7. Markov Chains • Two matrices formed from bipartite graph G • A hub Markov chain with matrix H • An authority Markov chain with matrix A

8. Where does SALSA fit in? • Matrices H and A can be derived from the adjacency matrix L used in the HITS and PageRank methods • HITS used unweighted matrix L • PageRank uses a row weighted version of matrix L • SALSA uses both row and column weighting

9. How are H and A computed? • Let $L_r$ be L with each nonzero row divided by its row sum • Let $L_c$ be L with each nonzero column divided by its column sum

10. H, SALSA’s hub matrix, consists of the nonzero rows and columns of $L_r L_c^T$ • A, SALSA’s authority matrix, consists of the nonzero rows and columns of $L_c^T L_r$
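The construction of H and A can be sketched in plain Python with exact fractions. The page list `PAGES` and edge list `EDGES` are hypothetical (the slide example's links are not in this transcript), and the helper names are illustrative rather than taken from the presentation.

```python
from fractions import Fraction

# Hypothetical neighborhood graph; these links are an assumption.
PAGES = [1, 2, 3, 5, 6, 10]
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]


def adjacency(pages, edges):
    """Adjacency matrix L: L[i][j] = 1 if page i links to page j."""
    index = {page: i for i, page in enumerate(pages)}
    L = [[Fraction(0)] * len(pages) for _ in pages]
    for src, dst in edges:
        L[index[src]][index[dst]] = Fraction(1)
    return L


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def row_normalize(M):
    """Divide each nonzero row by its row sum; zero rows are left alone."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total for x in row] if total else list(row))
    return out


def col_normalize(M):
    """Divide each nonzero column by its column sum."""
    return transpose(row_normalize(transpose(M)))


def drop_zero_rows_cols(M, pages):
    """Keep only the rows/columns of pages whose row in M is nonzero."""
    keep = [i for i, row in enumerate(M) if any(row)]
    return [[M[i][j] for j in keep] for i in keep], [pages[i] for i in keep]


L = adjacency(PAGES, EDGES)
L_r, L_c = row_normalize(L), col_normalize(L)
H, hub_pages = drop_zero_rows_cols(matmul(L_r, transpose(L_c)), PAGES)
A, auth_pages = drop_zero_rows_cols(matmul(transpose(L_c), L_r), PAGES)
print("hub pages:", hub_pages)
print("authority pages:", auth_pages)
```

For these products the zero rows and zero columns belong to the same pages (pages with no out-links for H, pages with no in-links for A), which is why filtering on rows alone is enough in this sketch. Both resulting matrices are row-stochastic.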

11. Eigenvectors • $Av = \lambda v$ • $v^T A = \lambda v^T$ • Numerically: Power Method

12. The Power Method • $X_{k+1} = A X_k$ • $X_{k+1}^T = X_k^T A$ • Converges to the dominant eigenvector ($\lambda = 1$).
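A minimal sketch of the left-hand iteration $X_{k+1}^T = X_k^T A$ for a row-stochastic matrix. The 2×2 matrix `P` is a made-up example, not one of the SALSA matrices from the slides, and `power_method`, `tol`, and `max_iter` are illustrative names.

```python
def power_method(P, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T P from a uniform start until the change is below tol.

    P is assumed to be row-stochastic, so the entries of x keep summing to 1
    and no renormalization step is needed.
    """
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical irreducible, aperiodic row-stochastic matrix.
P = [[0.5, 0.5],
     [0.25, 0.75]]
pi = power_method(P)
print(pi)
```

For this matrix the stationary vector satisfying $\pi^T P = \pi^T$ is $(1/3, 2/3)$, which can be checked by hand.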

13. The Power Method • Matrices H and A must be irreducible for the power method to converge to a unique eigenvector from any starting vector • If the bipartite graph G is connected, then both H and A are irreducible • If G is not connected, the power method applied to H and A will not converge to a unique dominant eigenvector

14. Our Graph is not connected! • In our example the graph is clearly not connected: page 2 in the hub set is connected only to page 1 in the authority set, and vice versa. • H and A are therefore reducible, and each contains multiple irreducible connected components

15. Connected Components • H contains two connected components, C = {2} and D = {1, 3, 6, 10} • A contains two connected components, E = {1} and F = {3, 5, 6}
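One way to find these components is a graph search over the bipartite graph, treating the hub-side and authority-side copies of a page as distinct nodes. This is a sketch under assumptions: the edge list `EDGES` is hypothetical (picked so that the output matches the sets C, D, E, and F above), and `bipartite_components` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical links (source -> target); the slide figure is not in the
# transcript, so this edge list is an assumption.
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]


def bipartite_components(edges):
    """Return a list of (hub_pages, authority_pages), one per component."""
    # Undirected adjacency between hub-side and authority-side nodes.
    nbrs = defaultdict(set)
    for src, dst in edges:
        nbrs[("hub", src)].add(("auth", dst))
        nbrs[("auth", dst)].add(("hub", src))

    seen, components = set(), []
    for start in sorted(nbrs):
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for other in nbrs[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        hub_pages = sorted(p for side, p in comp if side == "hub")
        auth_pages = sorted(p for side, p in comp if side == "auth")
        components.append((hub_pages, auth_pages))
    return components


components = bipartite_components(EDGES)
for hub_pages, auth_pages in components:
    print("hubs", hub_pages, "<-> authorities", auth_pages)
```

Each component pairs one block of H with one block of A: here hub set {2} goes with authority set {1}, and hub set {1, 3, 6, 10} goes with authority set {3, 5, 6}.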

16. Cutting and Pasting. Part I • We can now perform the power method on each component for H and A

17. Cutting and Pasting. Part II • We can now paste the two components together for each matrix • We must multiply each entry in a component's vector by that component's weight (its share of the pages in the set)
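The pasting step might look like the sketch below. It assumes the weighting usually given for SALSA, where each component's weight is its share of the pages on that side. The per-component vectors are assumed inputs: they are what the power method yields for a hypothetical link structure consistent with components E and F, not values read from the slides. `paste_components` is an illustrative name.

```python
from fractions import Fraction


def paste_components(components):
    """Combine per-component stationary vectors into one score per page.

    components: list of (pages, vector) pairs, where vector sums to 1 and
    is aligned with pages. Each entry is scaled by len(pages) / total pages
    (an assumed weighting rule).
    """
    total = sum(len(pages) for pages, _ in components)
    scores = {}
    for pages, vector in components:
        weight = Fraction(len(pages), total)
        for page, value in zip(pages, vector):
            scores[page] = weight * value
    return scores


# Assumed per-component authority vectors for E = {1} and F = {3, 5, 6}.
authority_components = [
    ([1], [Fraction(1)]),
    ([3, 5, 6], [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]),
]
authority_scores = paste_components(authority_components)
for page, score in sorted(authority_scores.items()):
    print(page, score)
```

With these inputs the weights are 1/4 for E and 3/4 for F, and the pasted scores again sum to 1.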

18. H: A:

19. Strengths and Weaknesses • Less affected by topic drift than HITS • Gives both authority and hub scores • Handles spamming better than HITS, but not nearly as well as PageRank • Query dependent

20. Thank You For Your Time!