
Quantitative Comparisons of Search Engine Results

Mike Thelwall, School of Computing and Information Technology, University of Wolverhampton, UK. Journal of the American Society for Information Science and Technology, 2008.


Presentation Transcript


  1. Quantitative Comparisons of Search Engine Results • Mike Thelwall, School of Computing and Information Technology, University of Wolverhampton (Wolverhampton, UK) • Journal of the American Society for Information Science and Technology, 2008

  2. Abstract • Search engines • Used to find information or web sites • Webometrics • Finding and measuring web-based phenomena • Comparing the application programming interfaces (APIs) • Google, Yahoo!, Live Search • Webometric application • hit count, number of URLs, number of domains, number of web sites, number of top-level domains

  3. Search Engine and Web Crawlers • Three key operations: • Crawling: identifying, downloading and storing pages in a database • Results matching: a search engine identifies the pages in its database that match a user query.

  4. Search Engine and Web Crawlers • Results ranking • A search engine arranges the matching URLs to maximize the probability that a relevant result appears on the first or second page of results. • Search term • Occurrence frequency • Number of clicks
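
The three operations on slides 3 and 4 can be illustrated with a toy pipeline. This is only a minimal sketch: the in-memory `TOY_WEB`, the function names, and the ranking rule (term frequency alone) are assumptions made for illustration, not how any of the three search engines actually works.

```python
from collections import Counter

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://a.example/": ("search engines index the web", ["http://b.example/"]),
    "http://b.example/": ("web crawlers download web pages", ["http://c.example/"]),
    "http://c.example/": ("ranking orders matching pages", []),
}

def crawl(seed, web):
    """Crawling: follow links from a seed URL and store each page's word counts."""
    index, queue = {}, [seed]
    while queue:
        url = queue.pop(0)
        if url in index or url not in web:
            continue  # already stored, or not reachable in the toy web
        text, links = web[url]
        index[url] = Counter(text.split())
        queue.extend(links)
    return index

def match(index, term):
    """Results matching: URLs whose stored text contains the query term."""
    return [url for url, words in index.items() if words[term] > 0]

def rank(index, urls, term):
    """Results ranking (assumed rule): most occurrences of the term first."""
    return sorted(urls, key=lambda u: index[u][term], reverse=True)

index = crawl("http://a.example/", TOY_WEB)
results = rank(index, match(index, "web"), "web")
print(results)
```

A real engine would combine many ranking signals, such as the click counts mentioned on slide 4; term frequency is used here only because it is easy to compute from the stored pages.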

  5. Research Objectives • Are there specific anomalies that make the hit count estimates (HCEs) of Google, Live Search or Yahoo! unreliable for particular values? • How consistent are Google, Live Search and Yahoo! in the number of URLs returned for a search, and which of them typically returns the most URLs? • How consistent are the search engines in terms of the spread of results (sites, domains and top-level domains), and which search engine gives the widest spread of results for a search?

  6. Data • 1,587 words • Blogs • Word frequency • http://cybermetrics.wlv.ac.uk/paperdata/ • Three search engines • Google, Yahoo! and Live Search • 1000 pages • Five webometric measures • hit count, number of URLs, number of domains, number of web sites, number of top-level domains
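
Four of the five measures on slide 6 can be counted directly from a list of result URLs; the hit count is an estimate reported by the search engine itself and so is not computed here. The sketch below is illustrative only. It assumes that a "domain" is the full host name and that a "site" is the shorter registrable name (for example `wlv.ac.uk`), and the `SECOND_LEVEL` set is a hypothetical stand-in for a proper suffix list.

```python
from urllib.parse import urlsplit

# Assumption: second-level labels that signal a three-label site name
# under a two-letter country code (e.g. wlv.ac.uk). A real analysis
# would need a much fuller suffix list.
SECOND_LEVEL = {"ac", "co", "com", "edu", "gov", "net", "org"}

def site_of(host):
    """Reduce a host name to its (assumed) site name."""
    labels = host.split(".")
    if len(labels) >= 3 and len(labels[-1]) == 2 and labels[-2] in SECOND_LEVEL:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

def spread(urls):
    """Count unique URLs, domains, sites and top-level domains."""
    unique_urls = set(urls)
    hosts = {urlsplit(u).hostname for u in unique_urls}
    hosts.discard(None)  # drop URLs without a host part
    return {
        "urls": len(unique_urls),
        "domains": len(hosts),
        "sites": len({site_of(h) for h in hosts}),
        "tlds": len({h.rsplit(".", 1)[-1] for h in hosts}),
    }

# Made-up result list for one query.
sample = [
    "http://cybermetrics.wlv.ac.uk/paperdata/",
    "http://www.wlv.ac.uk/",
    "http://www.example.com/a",
    "http://www.example.com/b",
    "http://blog.example.org/",
]
counts = spread(sample)
print(counts)  # {'urls': 5, 'domains': 4, 'sites': 3, 'tlds': 3}
```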

  7. Results - 1 • Hit count estimates Figure 2a,b,c. Hit count estimates of the three search engines compared (logarithmic scales, excluding data with zero values; r=0.80, 0.96, 0.83).
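
The r values quoted in the figure captions compare two engines at a time. The transcript does not say which coefficient was used, so the following sketch assumes a Pearson correlation on log-transformed hit counts with zero values excluded, mirroring the caption's "logarithmic scales, excluding data with zero values". The hit counts are made up.

```python
import math

def log_pearson(xs, ys):
    """Pearson r of log10 values, skipping pairs where either count is zero."""
    pairs = [(math.log10(x), math.log10(y))
             for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)

# Made-up hit counts for five queries; the fourth pair is dropped
# because the first engine reports zero.
engine_a = [10, 100, 1000, 0, 10000]
engine_b = [20, 200, 2000, 500, 20000]
r = log_pearson(engine_a, engine_b)
print(round(r, 3))
```

In this example the second engine always reports twice the first engine's count, so the points fall on a straight line in log-log space and r is 1 up to rounding error.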

  8. Results - 2 • Number of URLs returned Figure 3a,b,c. URLs returned by the three search engines compared (r=0.71, 0.68, 0.84)

  9. Results - 3 • Number of domains returned Figure 4a,b,c. Domains returned by the three search engines compared (r=0.65, 0.69, 0.83).

  10. Results - 4 • Number of sites returned Figure 5a,b,c. Sites returned by the three search engines compared (r=0.66, 0.69, 0.81)

  11. Results - 5 • Number of TLDs returned Figure 6a,b,c. TLDs returned by the three search engines compared (r=0.74, 0.77, 0.84)

  12. Results - 6 • Comparison within results

  13. Conclusion • Google seems to be the most consistent in terms of the relationship between its HCEs and number of URLs returned. • Yahoo! is recommended if the objective is to get results from the widest variety of web sites, domains or TLDs.

  14. Evaluating Search Engine Effects on Web-based Relatedness Measurement

  15. Snippets • Six manifest records • snippets • hit count • number of URLs • number of domains • number of web sites • number of top-level domains

  16. Dataset • WordSimilarity-353 Test Collection (TC-353) • TC353 Full (353 pairs) • TC353 Testing (153 pairs) • Three major search engines • Yahoo! • Google • Live Search • Five domains • general web search (web09) • .com • .edu • .net • .org

  17. The Model • A web-based relatedness measure WebMetric(X, Y) captures the association of two objects X and Y: WebMetric(X, Y) = F(d(X, Y)) • where F is a transfer function and d is a dependency score. • The dependency score d reflects the mutual dependency of X and Y on the web.

  18. The Model • Given a search engine G and two objects X and Y • we employ two double-checking functions, fG(Y@X) and fG(X@Y), to estimate the dependence between X and Y WebMetric(X, Y) =

  19. Figure 8. Behaviors of the Gompertz Curve and a Mapping Example
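
The transcript does not reproduce the formula for the transfer function F or its parameters. Assuming, from the Figure 8 caption, that F is a Gompertz curve, its general shape can be sketched as follows; the parameter values `a`, `b` and `c` are hypothetical and chosen only to show the S-shaped mapping of a dependency score in [0, 1] onto a relatedness value.

```python
import math

def gompertz(d, a=1.0, b=5.0, c=10.0):
    """General Gompertz curve a * exp(-b * exp(-c * d)).

    a is the upper asymptote, b shifts the curve along the d axis,
    and c sets the growth rate. All three defaults are illustrative.
    """
    return a * math.exp(-b * math.exp(-c * d))

# Map a few hypothetical dependency scores to relatedness values.
for d in (0.0, 0.25, 0.5, 1.0):
    print(f"d={d:.2f} -> {gompertz(d):.4f}")
```

With these settings, small dependency scores map to values near 0 and larger ones quickly approach the asymptote `a`, which is the saturating behaviour a transfer function of this kind provides.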

  20. Experiments WebMetric(X, Y) =
