This research addresses the problem of search engines returning too many citations for proper-noun queries. A query such as "Bonnie Lake" returns around 800 citations on Google, and many of them refer to the same object. The approach first classifies the citations into two groups, those of the chosen kind and those not of the chosen kind, and discards the second group. It then partitions the remaining citations by the object they refer to, using three facets: attributes, links, and page similarity. The best-first ranking of the search engine is retained.
Partitioning Search-Engine Returned Citations for Proper-Noun Queries
Reema Al-Kamha
Supported by NSF
The Problem
• Search engines return too many citations
• Example: “Bonnie Lake”
• Google returns around 800 citations
• Citations ranked best first
• Many refer to the same object
• Can we partition by same object?
• Proper Noun Queries
• Discard citations not of the right kind
• Partition the rest by same object
• Retain the best-first ranking
Solution
• Classification
  • Group 1: those of the chosen kind
  • Group 2: those not of the chosen kind
• Partition
  • Three facets
    • Attributes
    • Links
    • Page Similarity
  • Sub-facets for each facet
  • Confidence Matrix for each sub-facet
  • (Weighted) Mean for each facet
  • Final Confidence Matrix
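The "(Weighted) Mean for each facet" step can be sketched as an element-wise weighted mean over the sub-facet confidence matrices. This is a minimal sketch under assumptions: the slides do not give the weights, the confidence values, or the rule that combines facets into the final matrix, so the function name `weighted_mean_matrix`, the weights, and all numbers below are hypothetical.

```python
def weighted_mean_matrix(matrices, weights=None):
    """Element-wise (weighted) mean of equally sized square matrices.

    Each matrix holds pairwise confidences: entry [i][j] is the confidence
    that citations i and j refer to the same object.
    """
    if weights is None:
        weights = [1.0] * len(matrices)  # plain mean when no weights are given
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sub-facet matrices for three citations (values are invented).
link_together = [[1.0, 0.4, 0.0],
                 [0.4, 1.0, 0.2],
                 [0.0, 0.2, 1.0]]
same_host = [[1.0, 0.8, 0.0],
             [0.8, 1.0, 0.0],
             [0.0, 0.0, 1.0]]

# Facet matrix for "Links": weight the first sub-facet three times as much
# as the second (an arbitrary choice for illustration).
links_facet = weighted_mean_matrix([link_together, same_host], weights=[3, 1])

# Assumption: the final matrix is again a mean, this time over facet matrices.
# Only one facet is shown here, so the final matrix equals the facet matrix.
final_matrix = weighted_mean_matrix([links_facet])
```

With equal weights the same function gives the unweighted mean, which matches the parenthesized "(Weighted)" on the slide.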
Attributes
• Attribute(s) (One-to-One): latitude and longitude
• Single Attribute (Functional Determination): province with a lake’s name
• Multiple Attributes (Functional Determination): campground name and highway with a lake’s name
• Attributes (Nonfunctional Determination): country with a lake’s name
• Distinguishing Attribute: state for a lake
Links
• Returned citations that link together
• Returned citations that have a common URL prefix: same host, same file name, and same URL
  Example of same host:
    http://www.cs.byu.edu/info/dwembley.html
    http://www.cs.byu.edu/info/directory.php
  Example of same file:
    http://sunsite.unc.edu/javafaq/oldnews.html
    http://helios.oit.unc.edu/javafaq/oldnews.html
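The host and file comparisons can be illustrated with the standard library URL parser. This is a sketch, not the presented implementation: the function name `url_relations` is hypothetical, and treating "same file" as "identical path component" is an assumption based on the example URLs on the slide.

```python
from urllib.parse import urlsplit


def url_relations(url_a, url_b):
    """Report which URL-based sub-facets hold for two citation URLs."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return {
        # Host names are case-insensitive, so compare them lowercased.
        "same_host": a.netloc.lower() == b.netloc.lower(),
        # Assumption: "same file" means the same non-empty path on any host.
        "same_file": a.path == b.path and a.path not in ("", "/"),
        "same_url": url_a == url_b,
    }


host_pair = url_relations(
    "http://www.cs.byu.edu/info/dwembley.html",
    "http://www.cs.byu.edu/info/directory.php",
)
file_pair = url_relations(
    "http://sunsite.unc.edu/javafaq/oldnews.html",
    "http://helios.oit.unc.edu/javafaq/oldnews.html",
)
print(host_pair)  # same_host is True, same_file is False
print(file_pair)  # same_host is False, same_file is True
```

Each boolean could feed one sub-facet confidence matrix; how a match is turned into a confidence value is not stated on the slides.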
Confidence Matrix for Returned Citations that Link Together
[Matrix figure not preserved; surviving labels: 1, 4]
Page Similarity
• Similarity between each pair of returned citations
• Similarity between each pair of citation-referenced documents
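The slides do not say which similarity measure is used. As one plausible stand-in, the sketch below computes cosine similarity over term-frequency vectors and fills a symmetric matrix for a list of texts. The function names, the tokenizer, and the sample snippets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-frequency vectors."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity_matrix(texts):
    """Pairwise similarity for every pair of texts (symmetric matrix)."""
    return [[cosine_similarity(s, t) for t in texts] for s in texts]


# Hypothetical citation snippets; the same code applies to full page texts.
snippets = [
    "Bonnie Lake campground near the highway",
    "Camping at Bonnie Lake campground",
    "Java FAQ old news",
]
matrix = similarity_matrix(snippets)
```

The same function covers both bullets: pass citation snippets for the first, and the text of the referenced documents for the second.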
Confidence Matrix for Similarity between two Citation-Referenced Documents
Modified Confidence Matrix for Similarity between two Citation-Referenced Documents
Citation pairs: (1,4), (3,5), (5,8), (7,8)
Final Matrix
• Resulting partition: {1,4}, {3,5,7,8}, {2}, {6}
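The groups shown for the final matrix are what one obtains by merging the listed pairs transitively: 3 is paired with 5, 5 with 8, and 7 with 8, so 3, 5, 7, and 8 end up together. The sketch below does this with a small union-find structure. It assumes eight citations numbered 1 to 8 by rank, and it orders groups by their best-ranked member to reflect the goal of retaining the best-first ranking; both are assumptions, and the function name `partition_citations` is hypothetical.

```python
def partition_citations(n, same_object_pairs):
    """Group citations 1..n so that paired citations share a group."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_object_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the better-ranked (smaller) citation as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for citation in range(1, n + 1):
        groups.setdefault(find(citation), []).append(citation)
    # Order groups by their best-ranked member.
    return sorted(groups.values(), key=lambda group: group[0])


pairs = [(1, 4), (3, 5), (5, 8), (7, 8)]
result = partition_citations(8, pairs)
print(result)  # [[1, 4], [2], [3, 5, 7, 8], [6]]
```

The groups match the partition on the slide; only the display order differs, since the slide lists the multi-member groups first.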
Measurements
• Classification (percent correctly classified)
• Number of Partitions (precision and recall)
• Each Partition (precision and recall)
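The slides do not define how precision and recall are computed for a partition. One common choice, used here purely as an assumption, is to compare the "same object" pairs implied by the predicted partition with those implied by a hand-labeled one. The function names and the gold partition below are hypothetical.

```python
from itertools import combinations


def same_object_pairs(partition):
    """All unordered citation pairs that a partition places in one group."""
    return {pair
            for group in partition
            for pair in combinations(sorted(group), 2)}


def pairwise_precision_recall(predicted, gold):
    """Precision and recall of predicted same-object pairs against gold."""
    predicted_pairs = same_object_pairs(predicted)
    gold_pairs = same_object_pairs(gold)
    correct = len(predicted_pairs & gold_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 1.0
    recall = correct / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


predicted = [[1, 4], [3, 5, 7, 8], [2], [6]]
gold = [[1, 4], [3, 5, 8], [7], [2, 6]]  # invented labels for illustration
precision, recall = pairwise_precision_recall(predicted, gold)
# 7 predicted pairs, 5 gold pairs, 4 in common: precision 4/7, recall 4/5
```

Percent correctly classified for the classification step is simpler: the number of citations assigned to the right group divided by the total number of citations.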
Current Implementation Status
• Interface
• Google connection
• Citations retrieval
• Page retrieval
Contribution
• Solve one type of object-identity problem
• Provide an additional tool for search engine queries