Growing Parallel Paths for Entity-Page Retrieval

Tim Weninger, Cindy Xide Lin, and Jiawei Han. Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL. Work submitted to VLDB'10.

Presentation Transcript


  1. Growing Parallel Paths for Entity-Page Retrieval. Tim Weninger, Cindy Xide Lin, and Jiawei Han. Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL. Work submitted to VLDB'10.

  2. Problem: Entity Page Retrieval Given: Reference page

  3. Problem: Entity Page Retrieval … Can we find entity pages of the same type?

  4. Problem: Entity Page Retrieval … Can we find entity pages of the same type?

  5. Definitions:
     Defn 1: Root-to-link path: ◊ - hrefX contains HTML-TABLE-TR1-TD-hrefX
     Defn 2: Parallel links: share a root-to-link path, i.e., lists of links
     Defn 3: Intra-page parallel paths: ◊ - hrefC ǁ ◊ - hrefB; ◊ - hrefC ǁ ◊ - hrefX

  6. Definitions:
     Defn 4: Inter-page parallel paths: ◊ - hrefC in Page A ǁ ◊ - hrefW in Page B
     Defn 5: Parallel Web site paths: share intra- or inter-page parallel paths across multiple pages
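
As an illustration of Defn 1 and Defn 2, the sketch below extracts a root-to-link path for every link on a page and groups links that share one. It is a minimal sketch, not the authors' implementation: the names `LinkPathParser` and `parallel_links` are hypothetical, paths are written as lowercase tag names joined by hyphens, and sibling indices (such as the 1 in TR1) are ignored by assumption.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LinkPathParser(HTMLParser):
    """Records (root-to-link path, href) for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, root first
        self.links = []   # list of (path, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(("-".join(self.stack), href))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; this tolerates
        # unclosed inner tags. Stray closing tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parallel_links(html):
    """Group hrefs by root-to-link path; keep groups with at least two links."""
    parser = LinkPathParser()
    parser.feed(html)
    groups = defaultdict(list)
    for path, href in parser.links:
        groups[path].append(href)
    return {path: hrefs for path, hrefs in groups.items() if len(hrefs) > 1}
```

On a page with two table rows that each hold one link plus a stray link in a paragraph, only the two table links come back as a group, because they alone share a root-to-link path.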

  7. Properties of Parallel Paths
     Prop. 1: Equal Path Length Property: Parallel paths must contain the same number of pages.
     Prop. 2: Parallel Page Property: The test of two paths being in parallel is equivalent to the result of the tests of their respective pages.
     Prop. 3: Equal Page Length Property: Parallel paths must have the same number of nodes across pages.

  8. Properties of Parallel Paths
     Prop. 4: Divergent Path Property: Parallel paths can extend through separate pages.
     Prop. 5: Early Termination Property: The test of two paths can be terminated at the first occurrence of a dissimilar node.
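
The properties above suggest a simple way to test whether two paths are parallel. The sketch below is one possible reading, not the authors' code: a path is modeled as a list of pages, each page as a list of node labels (tag names), and two nodes are taken to be similar only when their labels are equal. The function names and this data model are assumptions.

```python
def pages_parallel(page_a, page_b):
    """Compare the node sequences that two paths traverse within one page."""
    # Prop. 3: corresponding pages must contribute the same number of nodes.
    if len(page_a) != len(page_b):
        return False
    for node_a, node_b in zip(page_a, page_b):
        # Prop. 5: stop at the first dissimilar node.
        if node_a != node_b:
            return False
    return True


def paths_parallel(path_a, path_b):
    """Test two multi-page paths for parallelism."""
    # Prop. 1: parallel paths contain the same number of pages.
    if len(path_a) != len(path_b):
        return False
    # Prop. 2: the path test is the conjunction of the page tests.
    # all() short-circuits, so later pages are skipped after a failure.
    return all(pages_parallel(a, b) for a, b in zip(path_a, path_b))
```

Prop. 4 needs no extra code under this model: the pages themselves may differ, since only the node labels along each path are compared.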

  9. Finding Paths
     Naive Method: Can be very costly
     Growing Parallel Paths: First find an example path. Then grow paths which are in parallel to the example. Repeat with alternate paths. This makes magic happen.
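
The first two steps of growing parallel paths can be sketched as follows. This is a toy interpretation under stated assumptions, not the method from the paper: the site is a dictionary mapping each page to a list of `(signature, target)` links, where `signature` stands in for the link's root-to-link path, and the example path is found with a plain breadth-first search. The names `find_example_path` and `grow_parallel` and all page identifiers are hypothetical.

```python
from collections import deque


def find_example_path(site, start, goal):
    """Breadth-first search; returns the (signature, target) steps to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        page, steps = queue.popleft()
        if page == goal:
            return steps
        for signature, target in site.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, steps + [(signature, target)]))
    return None  # goal is not reachable from start


def grow_parallel(site, start, example_steps):
    """Follow, at each step, every link whose signature matches the example."""
    frontier = {start}
    for signature, _ in example_steps:
        frontier = {target
                    for page in frontier
                    for sig, target in site.get(page, [])
                    if sig == signature}
    return frontier


# Toy site: a reference page, two hub pages, and a few leaf pages.
site = {
    "home": [("div-a", "people"), ("div-a", "research")],
    "people": [("ul-li-a", "prof_a"), ("ul-li-a", "prof_b"), ("p-a", "contact")],
    "research": [("dl-dd-a", "group_x")],
}
example = find_example_path(site, "home", "prof_a")
entities = grow_parallel(site, "home", example)  # {"prof_a", "prof_b"}
```

Only pages reached through the same sequence of link signatures as the example are returned, so `contact` and `group_x` are left out.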

  10. Repeating with alternate paths
      k-shortest paths: Do a k-shortest path search. Explore all of these paths.
      Removing links: After exploring a path, remove its edges from the graph.
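
The link-removal variant can be sketched as repeated shortest-path search over a shrinking graph. This is a minimal sketch, assuming an unweighted directed graph stored as a dictionary of adjacency lists; the function names, the `max_paths` cap, and the toy page identifiers are hypothetical, and the slides do not say which search the authors used.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search returning one shortest path as a list of nodes."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def alternate_paths(graph, start, goal, max_paths=3):
    """Find a path, remove its edges from a working copy, and repeat."""
    remaining = {node: list(targets) for node, targets in graph.items()}
    found = []
    while len(found) < max_paths:
        path = shortest_path(remaining, start, goal)
        if path is None:
            break
        found.append(path)
        if len(path) < 2:  # start == goal: there are no edges to remove
            break
        for u, v in zip(path, path[1:]):
            remaining[u].remove(v)
    return found


graph = {
    "home": ["people", "research"],
    "people": ["faculty"],
    "faculty": ["entity"],
    "research": ["data_mining"],
    "data_mining": ["entity"],
}
paths = alternate_paths(graph, "home", "entity")
```

Here `paths` holds two routes to the same page, one through `people` and `faculty` and one through `research` and `data_mining`; the third search finds nothing because every edge out of `home` has been removed.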

  11. Interpreting the Output: Side Effect of Repeating with Alternate Paths
      Given: Jiawei Han
      Result:
      Jiawei Han 40
      Cheng Zhai 38
      Kevin Chang 38
      Dan Roth 32
      Vikram Adve 4
      Roy Campbell 3
      …

  12. Interpreting the Output: Side Effect of Path Finding
      What do the link labels on the path tell us about the entity?
      First path: People, Faculty, Jiawei Han, Personal Site
      Second path: Research, Data Mining

  13. Experiments
      Top 25 CS departments in the US (according to US News): Find all professors
      United States Congress: Find all senators, representatives, and committees
      UIUC only: Find all courses. Find all research groups.
      Baseline: Google's find-similar search (essentially TFIDF-type ranking)

  14. Results

  15. Results

  16. Results

  17. Conclusions and Future Work
      Given a reference page and an example entity type, we can retrieve all entity pages of the same type.
      Implications: We can use this for information integration. Search and retrieval can be enhanced.
      Shortcomings: Most errors are due to incorrect list finding.

  18. Questions?
