
Partitioning Search-Engine Returned Citations for Proper-Noun Queries

This research addresses the problem of search engines returning too many citations for proper-noun queries. A query such as "Bonnie Lake" returns around 800 citations on Google, and many of them refer to the same object. Our approach first classifies the citations into two groups: those of the chosen kind and those not of the chosen kind. It discards the second group and partitions the first so that citations referring to the same object fall together. The partitioning draws on three facets (attributes, links, and page similarity) and retains the search engine's best-first ranking.


Presentation Transcript


  1. Partitioning Search-Engine Returned Citations for Proper-Noun Queries • Reema Al-Kamha • Supported by NSF

  2. The Problem • Search engines return too many citations • Example: “Bonnie Lake” • Google returns around 800 citations • Citations ranked best first • Many refer to the same object • Can we partition by same object? • Proper Noun Queries • Discard citations not of the right kind • Partition the rest by same object • Retain the best-first ranking

  3. “Bonnie Lake” Query to Google

  4. The Interface

  5. “Bonnie Lake” Query Result

  6. Solution • Classification • Group 1: those of the chosen kind • Group 2: those not of the chosen kind • Partition • Three facets • Attributes • Links • Page Similarity • Sub-facets for each facet • Confidence Matrix for each sub-facet • (Weighted) Mean for each facet • Final Confidence Matrix
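
The combination step on slide 6 can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `weighted_mean_matrix`, the three example matrices, and the weights are all hypothetical. The slides state that a (weighted) mean is taken for each facet but do not say how the facet matrices become the final confidence matrix, so the sketch assumes the same weighted mean is applied at that level too.

```python
def weighted_mean_matrix(matrices, weights):
    """Combine equally sized square confidence matrices by a weighted mean."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [
            sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical confidence matrices for three citations, one per facet.
# Entry [i][j] is the confidence that citations i and j refer to the same object.
attributes = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.2], [0.0, 0.2, 1.0]]
links = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]
similarity = [[1.0, 0.6, 0.2], [0.6, 1.0, 0.4], [0.2, 0.4, 1.0]]

# Assumed weights: attributes count twice as much as the other two facets.
final = weighted_mean_matrix([attributes, links, similarity], [2.0, 1.0, 1.0])
```

With equal weights the function reduces to a plain mean, which matches the parenthesized "(Weighted)" on the slide.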

  7. Attributes • Attribute(s) (One-to-One): latitude and longitude • Single Attribute (Functional Determination): province with a lake’s name • Multiple Attributes (Functional Determination): campground name and highway with a lake’s name • Attributes (Nonfunctional Determination): country with a lake’s name • Distinguishing Attribute: state for a lake

  8. Links • Returned citations that link together • Returned citations that have a common URL prefix: same host, same file name, and same URL • Example of same host: http://www.cs.byu.edu/info/dwembley.html and http://www.cs.byu.edu/info/directory.php • Example of same file: http://sunsite.unc.edu/javafaq/oldnews.html and http://helios.oit.unc.edu/javafaq/oldnews.html
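
The URL comparisons on slide 8 might be sketched like this. The function name `url_relations` is hypothetical, and the sketch assumes that "same file name" means an identical path after the host, which is what the slide's own example pair suggests. The slides do not give the exact rule.

```python
from urllib.parse import urlsplit


def url_relations(url_a, url_b):
    """Report which of the three URL sub-facets two citation URLs share."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_host = a.hostname == b.hostname  # hostname is lowercased by urlsplit
    same_file = a.path == b.path and a.path not in ("", "/")  # assumed meaning
    same_url = same_host and a.path == b.path and a.query == b.query
    return {"same_host": same_host, "same_file": same_file, "same_url": same_url}


# The two example pairs from the slide.
host_pair = url_relations(
    "http://www.cs.byu.edu/info/dwembley.html",
    "http://www.cs.byu.edu/info/directory.php",
)
file_pair = url_relations(
    "http://sunsite.unc.edu/javafaq/oldnews.html",
    "http://helios.oit.unc.edu/javafaq/oldnews.html",
)
```

Each of the three flags could feed its own sub-facet confidence matrix. How a flag is converted to a confidence value is not stated in the slides.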

  9. Confidence Matrix for Returned Citations that Link Together (figure labels: 1, 4)

  10. Page Similarity • Similarity between each pair of returned citations • Similarity between each pair of citation-referenced documents
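
The slides do not name the similarity measure. As one possible stand-in, the sketch below uses cosine similarity over word counts, a common choice for comparing short texts. The function name, the tokenization, and the sample citation texts are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical citation snippets.
identical = cosine_similarity("Bonnie Lake campground", "bonnie lake campground")
disjoint = cosine_similarity("Bonnie Lake campground", "weather forecast today")
```

The same function can be applied either to the citation text returned by the search engine or to the full text of the referenced documents, which are the two sub-facets the slide lists.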

  11. Confidence Matrix for Similarity between two Citation-Referenced Documents

  12. Modified Confidence Matrix for Similarity between two Citation-Referenced Documents

  13. Final Matrix • Pairs: 1,4; 3,5; 5,8; 7,8 • Resulting partition: {1,4} {3,5,7,8} {2} {6}
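
The groups on slide 13 are consistent with merging every pair transitively: 3,5 and 5,8 and 7,8 chain together into {3,5,7,8}. The slides do not state that this is how the partition is derived, so the following union-find sketch is an assumption, and `partition_from_pairs` is a hypothetical name.

```python
def partition_from_pairs(n, pairs):
    """Group citations 1..n so that every listed pair ends up in the same group."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for citation in range(1, n + 1):
        groups.setdefault(find(citation), []).append(citation)
    return sorted(groups.values())  # ordered by each group's smallest member


# The eight citations and four pairs shown on the slide.
groups = partition_from_pairs(8, [(1, 4), (3, 5), (5, 8), (7, 8)])
```

Because citation numbers follow the search engine's ranking, ordering the groups by their smallest member is one simple way to retain a best-first order, although the slide lists the groups in a different order.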

  14. “Bonnie Lake”—Results

  15. Measurements • Classification (percent correctly classified) • Number of Partitions (precision and recall) • Each Partition (precision and recall)
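
For the per-partition measurement, precision and recall can be computed against a hand-labeled reference group. The sketch below uses the standard set-based definitions. The function name and the reference group are hypothetical, and the slides do not describe how predicted groups are matched to reference groups.

```python
def precision_recall(predicted, reference):
    """Precision and recall of one predicted group against one reference group."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Predicted group from the final matrix versus a hypothetical reference group.
precision, recall = precision_recall({3, 5, 7, 8}, {3, 5, 8})
```

Here the predicted group contains one citation that the reference does not, which lowers precision while recall stays perfect.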

  16. Current Implementation Status • Interface • Google connection • Citations retrieval • Page retrieval

  17. Contribution • Solve one type of object-identity problem • Provide an additional tool for search engine queries
