
Chapter 10 Link Analysis


jael

Presentation Transcript


  1. Chapter 10 Link Analysis

  2. Data Mining Techniques So Far… • Chapter 5 – Statistics • Chapter 6 – Decision Trees • Chapter 7 – Neural Networks • Chapter 8 – Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering • Chapter 9 – Market Basket Analysis and Association Rules

  3. Introduction • Airline Route Maps are useful • Hyperlinks were revolutionary • Apple’s HyperCard (Bill Atkinson) • Claim that there are no more than 6 degrees of separation between any two people on the planet • Link Analysis is the data mining technique that addresses relationships and connections • Link Analysis is based on Graph Theory
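The "degrees of separation" idea can be read as a shortest-path question on a graph of acquaintances. The following is a minimal sketch, assuming the graph is stored as a dictionary of adjacency sets; the names and the `degrees_of_separation` helper are hypothetical and not from the slides.

```python
from collections import deque

def degrees_of_separation(graph, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical acquaintance graph (made-up names), undirected.
acquaintances = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cid"},
    "Cid": {"Bob", "Dee"},
    "Dee": {"Cid"},
    "Eve": set(),  # knows nobody in this toy graph
}

print(degrees_of_separation(acquaintances, "Ann", "Dee"))  # 3
```

Breadth-first search is used here because it reaches nodes in order of distance, so the first time the goal is seen the count is the smallest possible.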

  4. Introduction • As you would expect, Link Analysis also has limitations as a data mining (DM) technique • However, it is quite effective in situations such as: • Identifying authoritative sources of information on the WWW by analyzing page links • Understanding physician referral patterns • Analyzing telephone call patterns

  5. Basic Graph Theory • Graphs are an abstraction used to represent relationships • Graphs consist of: • Nodes (vertices), the things in the graph that have relationships • Edges, pairs of nodes connected by a relationship • Visualization is a key characteristic of a graph
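One possible way to hold nodes and edges in code is a set of nodes plus a set of unordered pairs. This is only a sketch: the city names and the `neighbors` and `degree` helpers are illustrative assumptions, not part of the chapter.

```python
# Nodes are the things; each edge is an unordered pair of nodes.
nodes = {"LA", "Denver", "Boston", "Chicago"}
edges = {
    frozenset(pair)
    for pair in [("LA", "Denver"), ("Denver", "Boston"), ("Chicago", "Boston")]
}

def neighbors(node, edges):
    """Return the set of nodes that share an edge with `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}

def degree(node, edges):
    """Number of distinct nodes directly connected to `node`."""
    return len(neighbors(node, edges))

print(sorted(neighbors("Denver", edges)))  # ['Boston', 'LA']
```

Using `frozenset` for each edge makes the pair unordered, so ("LA", "Denver") and ("Denver", "LA") are the same edge.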

  6. Basic Graph Theory • A path is an ordered sequence of nodes connected by edges • Example: flight segments (legs) such as LA – Denver – Boston • A weighted graph is one in which the edges have weights associated with them • Example: a weight can measure the strength of the association between two products purchased together
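As a small illustration of paths and weights, the sketch below sums edge weights along the LA – Denver – Boston path. The mileage figures are made-up placeholders, and `edge_weight` and `path_weight` are hypothetical helper names.

```python
# Edge weights: illustrative distances only, not real mileage.
weights = {
    ("LA", "Denver"): 860,
    ("Denver", "Boston"): 1750,
}

def edge_weight(a, b, weights):
    """Look up the weight of the undirected edge a-b (KeyError if absent)."""
    if (a, b) in weights:
        return weights[(a, b)]
    return weights[(b, a)]

def path_weight(path, weights):
    """Sum the edge weights along an ordered sequence of nodes."""
    return sum(edge_weight(a, b, weights) for a, b in zip(path, path[1:]))

print(path_weight(["LA", "Denver", "Boston"], weights))  # 2610
```

The same structure would work for product associations: the nodes would be products and each weight a count of how often the pair was bought together.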

  7. Graph Theory Classic Problems • Finding a path in the graph that traverses every edge exactly once (Seven Bridges of Königsberg – edges are bridges and nodes are land masses) • Finding the shortest path that visits every node in the graph exactly once (Traveling Salesman) • A completely connected graph with n nodes has n! (n factorial) orderings of the nodes, each a path that contains all nodes (5! = 5 * 4 * 3 * 2 * 1 = 120)
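Both classic problems can be sketched in a few lines. The first part applies Euler's odd-degree condition to the seven bridges (it assumes the graph is connected). The second part tries all n! orderings by brute force, which is only feasible for tiny graphs. The node labels, the toy positions, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations
from math import factorial

# Seven Bridges of Königsberg: four land masses (A-D) and seven bridges.
bridges = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def odd_degree_nodes(edge_list):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    return sorted(n for n, d in degree.items() if d % 2 == 1)

def has_euler_path(edge_list):
    """Euler's condition for a connected graph: 0 or 2 nodes of odd degree."""
    return len(odd_degree_nodes(edge_list)) in (0, 2)

def path_length(order, dist):
    """Total distance along consecutive nodes in `order`."""
    return sum(dist[frozenset(pair)] for pair in zip(order, order[1:]))

def shortest_visit_all(nodes, dist):
    """Brute force: examine every ordering of the nodes and keep the shortest."""
    return min(permutations(nodes), key=lambda order: path_length(order, dist))

# Toy complete graph: four points on a line, distance = gap between positions.
positions = {"P": 0, "Q": 1, "R": 3, "S": 6}
dist = {frozenset((a, b)): abs(positions[a] - positions[b])
        for a in positions for b in positions if a != b}

print(has_euler_path(bridges))                       # False
best = shortest_visit_all(list(positions), dist)
print(path_length(best, dist))                       # 6
print(factorial(5))                                  # 120
```

Note that the sketch follows the slide's wording and looks for a shortest path through all nodes; the usual Traveling Salesman formulation also returns to the starting node.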

  8. Directed vs Undirected Graphs • Undirected graphs – edges between nodes go in both directions (A to B; B to A) • Directed graphs – edges between nodes only go in one direction (A to B is different from B to A) • Example: the WWW
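The difference between the two kinds of graph shows up in how an edge is stored. Below is a minimal sketch using adjacency sets; `add_edge` is a hypothetical helper, not something defined in the chapter.

```python
def add_edge(adj, a, b, directed=False):
    """Record an edge a -> b; for undirected graphs also record b -> a."""
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set())
    if not directed:
        adj[b].add(a)

undirected = {}
add_edge(undirected, "A", "B")

directed = {}
add_edge(directed, "A", "B", directed=True)

print(undirected)  # {'A': {'B'}, 'B': {'A'}}
print(directed)    # {'A': {'B'}, 'B': set()}
```

In the directed case, a link from A to B says nothing about a link from B to A, which matches how hyperlinks behave.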

  9. Google – Directed Graph Example • Web pages = nodes • Hyperlinks = edges • Spiders & Web crawlers updating • Kleinberg’s Algorithm • Hub – a page that links to many authorities • Authority – a page that is linked to by many hubs
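The mutual definition of hubs and authorities can be computed iteratively. The following is a simplified sketch in the spirit of Kleinberg's algorithm, not a faithful reproduction: it skips the step of first selecting a query-focused set of pages, uses a fixed number of iterations, and runs on made-up page names.

```python
from math import sqrt

def hits(links, iterations=50):
    """Return (hub, authority) scores; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical mini-web: two hub pages pointing at two candidate authorities.
links = {"h1": {"a1", "a2"}, "h2": {"a1"}, "a1": set(), "a2": set()}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # a1
print(max(hub, key=hub.get))    # h1
```

Page `a1` ends up as the top authority because both hubs link to it, and `h1` is the top hub because it links to both authorities.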

  10. Google – example continued • Authority versus mere popularity • Ranking a site by the number of unrelated sites linking to it yields popularity • Ranking a site by the number of subject-related hubs that point to it yields authority • This helps overcome a situation that often arises with popularity ranking, where the real authority (e.g., a home page) is ranked lower because few popular links point to it
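The contrast between popularity and authority can be illustrated by counting inbound links in two ways. This is a deliberately crude sketch: it assumes the set of subject-related hubs is already known, and the page names are invented for the example.

```python
def popularity(links):
    """Count every inbound link, regardless of where it comes from."""
    counts = {}
    for src, targets in links.items():
        for t in targets:
            counts[t] = counts.get(t, 0) + 1
    return counts

def hub_endorsements(links, topic_hubs):
    """Count inbound links only from pages listed in `topic_hubs`."""
    return popularity({src: t for src, t in links.items() if src in topic_hubs})

# Hypothetical link data: blogs are unrelated sites, directories are topic hubs.
links = {
    "blog1": {"fan_site"},
    "blog2": {"fan_site"},
    "blog3": {"fan_site"},
    "directory1": {"home_page"},
    "directory2": {"home_page", "fan_site"},
}
topic_hubs = {"directory1", "directory2"}

print(popularity(links))                    # {'fan_site': 4, 'home_page': 2}
print(hub_endorsements(links, topic_hubs))  # {'home_page': 2, 'fan_site': 1}
```

By raw link count the fan site wins, but when only links from subject-related hubs are counted the home page ranks first.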

  11. Examples of Link Analysis • Recent Int’l Data Mining Conference • http://www.siam.org/meetings/sdm04/ • Chapter10-Example1.pdf • Chapter10-Example2.pdf • Chapter10-Example3.pdf • Megaputer (PolyAnalyst vendor) page: • http://www.megaputer.com/products/pa/algorithms/la.php3

  12. End of Chapter 10
