
Informetrics, Webometrics and Web Use metrics

Informetrics, Webometrics and Web Use metrics. Huimin Lu 10/21/2004. Outline. History. Article 1: Bibliometrics & WWW. Article 2: Bibliometrics of the WWW. Article 3: Authoritative Sources. Article 4: ParaSite. Conclusion. History. Term introduced by Pritchard in 1969.

Presentation Transcript


  1. Informetrics, Webometrics and Web Use metrics
     Huimin Lu, 10/21/2004

  2. Outline
     • History
     • Article 1: Bibliometrics & WWW
     • Article 2: Bibliometrics of the WWW
     • Article 3: Authoritative Sources
     • Article 4: ParaSite
     • Conclusion

  3. History
     • The term “bibliometrics” was introduced by Pritchard in 1969.
     • Pritchard’s explanation: “the application of mathematical and statistical methods to books and other media of communication”.

  4. A1: Bibliometrics and the World Wide Web
     By Don Turnbull
     • Bibliometrics
     • Bibliometric laws
     • Applying bibliometrics to the WWW
     • Metrics design

  5. A1: Bibliometrics
     • Classic citation analysis
     • Refined classic bibliometrics
       - Standard formula for impact: number of citations to the journal / number of citable articles published
       - Basic formula for the immediacy index of influence: number of citations received by articles during the year / total number of citable articles published
     • Bibliometric coupling
       - Measures the number of references two papers have in common to test for similarity
     • Cocitation analysis
       - Measures the relations between cited documents
     • Common errors
       - Multiple authors lost, self-citation, similar author names, human error, etc.
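The measures on this slide can be sketched in a few lines of Python. This is only an illustration of the ratios and counts as the slide states them; the function names and the sample figures are hypothetical and do not come from Turnbull's article.

```python
def impact_factor(journal_citations: int, citable_articles: int) -> float:
    """Citations to the journal divided by citable articles published."""
    return journal_citations / citable_articles


def immediacy_index(citations_this_year: int, citable_articles: int) -> float:
    """Citations received during the year divided by citable articles published."""
    return citations_this_year / citable_articles


def coupling_strength(refs_a, refs_b) -> int:
    """Bibliographic coupling: number of references two papers share."""
    return len(set(refs_a) & set(refs_b))


def cocitation_count(doc_x, doc_y, reference_lists) -> int:
    """Cocitation: number of citing papers whose reference list has both documents."""
    return sum(1 for refs in reference_lists if doc_x in refs and doc_y in refs)


# Hypothetical sample data, for illustration only.
paper_1 = ["r1", "r2", "r3"]
paper_2 = ["r2", "r3", "r4"]
print(impact_factor(300, 120))              # 2.5
print(coupling_strength(paper_1, paper_2))  # 2
print(cocitation_count("r2", "r3", [paper_1, paper_2, ["r2"]]))  # 2
```

Coupling looks at what two papers cite, while cocitation looks at who cites two documents together, which is why the two functions take different inputs.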

  6. A1: Bibliometric Laws
     • Bradford’s Law of Scattering
       - Clustering method: terms R·a^n (n = 0, 1, 2, ...; a < 1), which sum to R/(1 - a)
     • Lotka’s Law
       - Inverse square
     • Zipf’s Law
       - Familiar words occur with high frequency (the nth most frequent word occurs about k/n times)
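A small numeric sketch may make the three laws concrete. It assumes the readings given on the slide: a geometric series for Bradford, an inverse-square falloff for Lotka (authors with n papers relative to authors with one paper), and k/n for Zipf. The function names and parameter values are hypothetical.

```python
def bradford_partial_sum(R: float, a: float, terms: int) -> float:
    """Sum of R * a**n for n = 0 .. terms - 1."""
    return sum(R * a**n for n in range(terms))


def bradford_limit(R: float, a: float) -> float:
    """Closed form of the infinite series, valid for 0 <= a < 1."""
    return R / (1 - a)


def lotka_authors(authors_with_one_paper: float, n: int) -> float:
    """Inverse-square estimate of the number of authors with n papers."""
    return authors_with_one_paper / n**2


def zipf_frequency(k: float, rank: int) -> float:
    """Estimated occurrences of the word at the given frequency rank."""
    return k / rank


# Illustrative parameters only.
print(bradford_partial_sum(100, 0.5, 50), bradford_limit(100, 0.5))
print(lotka_authors(100, 2))    # 25.0
print(zipf_frequency(1000, 4))  # 250.0
```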

  7. A1: Applying Bibliometrics to the Web
     • Web surveys
       - Georgia Tech Graphics, Visualization, and Usability Web Surveys
     • Web servers
     • Add programming logic
       - Inaccurate data gathered: standard procedures are skipped, state information between usage hits is missed, and server hits by themselves do not represent true usage.

  8. A1: Metrics Design
     • Configure the Web server to gather comprehensive metrics
     • Manage log files
       - Enhance reliability: back up regularly, store log file analysis results and logs, begin new logs at regular intervals, post results and log information for comparison.
       - Log analysis tools: Analog, WWWStat, GetStats, Perl scripts.
       - Standardization: Extended Log File Format by WWW Consortium Standards Committee
     • Downie’s attempted analyses: user-based, request-based, byte-based
     • Optimal Web content setup & external bibliometric gathering
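As a rough sketch of what a log analysis script does, the following Python parses lines in the widely used Common Log Format and produces one simple figure for each of the three analysis types named above. Mapping "user-based" to distinct hosts, "request-based" to hits per path, and "byte-based" to total bytes sent is an assumption made for illustration, and the sample log lines are invented.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Return distinct hosts, hits per path, and total bytes transferred."""
    hosts = set()
    hits_per_path = Counter()
    total_bytes = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        hosts.add(match["host"])
        hits_per_path[match["path"]] += 1
        if match["size"] != "-":  # "-" means no body was sent
            total_bytes += int(match["size"])
    return {
        "distinct_hosts": len(hosts),
        "hits_per_path": dict(hits_per_path),
        "total_bytes": total_bytes,
    }


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [21/Oct/2004:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.1 - - [21/Oct/2004:10:00:05 -0400] "GET /papers.html HTTP/1.0" 200 2000',
    '192.0.2.7 - - [21/Oct/2004:10:01:00 -0400] "GET /index.html HTTP/1.0" 304 -',
]
print(summarize(sample))
```

Counting distinct hosts illustrates the caveat from the previous slide: proxies and shared machines mean that hosts, like raw hits, only approximate true usage.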

  9. A2: Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace
     By Ray R. Larson
     • Analysis of 30G Web pages collected by Inktomi “Web Crawler”
     • Cocitation analysis using DEC AltaVista search engine

  10. A2: Growth and Usage of Web WWW

  11. A2: Cocitation Analysis of Web
     • Attempt: Map the intellectual structure of the Web
     • Question: Can cocitation techniques be applied to charting the contents of cyberspace?

  12. A2: Methods
     • Selection of a core set of items for study
     • Retrieval of cocitation frequency information
     • Compilation of the raw cocitation frequency matrix
     • Correlation analysis to convert the raw frequencies into correlation coefficients
     • Multivariate analysis of the correlation matrix
     • Interpretation of the resulting “map” and validation
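The middle steps of this method (compiling the raw cocitation matrix and converting it to correlation coefficients) can be sketched as follows. This is a simplified illustration, not Larson's procedure: the diagonal is left at zero and Pearson correlation is computed over whole rows, both of which are assumptions, and the item names and citing pages are made up.

```python
from itertools import combinations
from math import sqrt


def cocitation_matrix(items, citing_pages):
    """Count, for each pair of items, the citing pages that link to both."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for links in citing_pages:
        present = sorted(index[x] for x in set(links) if x in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Correlate every row (cocitation profile) with every other row."""
    return [[pearson(a, b) for b in matrix] for a in matrix]


# Made-up core set and citing pages.
items = ["A", "B", "C"]
citing_pages = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}]
raw = cocitation_matrix(items, citing_pages)
print(raw)  # [[0, 2, 1], [2, 0, 2], [1, 2, 0]]
print(correlation_matrix(raw))
```

The resulting correlation matrix is what the multivariate step (for example clustering or multidimensional scaling) would take as input to produce the "map".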

  13. A2: Results

  14. A3: Authoritative Sources in a Hyperlinked Environment
     By Jon M. Kleinberg
     • A new method for automatically extracting certain types of information about a hypermedia environment from its link structure.

  15. A3: Goal
     • Types of search query and their problems
       - Specific queries: scarcity problem
       - Broad-topic queries: abundance problem
       - Similar-page queries
     • Synthesize the unreliable information carried by individual links to provide a set of authoritative pages relevant to an initial query.

  16. A3: Common Approaches
     • Only S
       - Define S to be the top k pages indexed by AltaVista
       - Rank pages according to their in-degree
     • S -> T
       - Define the same root set S
       - Grow S to a larger base set T
       - Rank pages by their in-degree
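Both baseline approaches can be sketched with a list of (source, target) link pairs. This is a minimal sketch under simplifying assumptions: the base set is grown by adding every page that links to or is linked from the root set (with no cap on how many in-linking pages are added), in-degree is counted only among links inside the set being ranked, and the tiny graph is invented.

```python
def in_degree_ranking(pages, links):
    """Rank pages by the number of links they receive from other pages in the set."""
    pages = set(pages)
    counts = {page: 0 for page in pages}
    for source, target in links:
        if source in pages and target in pages and source != target:
            counts[target] += 1
    # Highest in-degree first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda page: (-counts[page], page))


def grow_base_set(root, links):
    """Expand root set S to base set T with pages linking to or from S."""
    root = set(root)
    base = set(root)
    for source, target in links:
        if source in root or target in root:
            base.update((source, target))
    return base


# Invented link graph.
links = [("a", "b"), ("c", "b"), ("d", "b"), ("b", "e"),
         ("c", "e"), ("d", "e"), ("f", "g")]
root = {"a", "b"}
print(in_degree_ranking(root, links))   # ['b', 'a']
base = grow_base_set(root, links)
print(sorted(base))                     # ['a', 'b', 'c', 'd', 'e']
print(in_degree_ranking(base, links))   # ['b', 'e', 'a', 'c', 'd']
```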

  17. A3: Their Approach
     • Extract small core sets, communities of hubs and authorities, from T
     • Authoritative pages
       - A novel type of quality measure for documents in hypermedia, obtained by algorithmic means.
       - Large in-degree and considerable overlap in the sets of pages that point to them
     • Hub pages
       - Have links to multiple relevant authoritative pages

  18. A3: Algorithm and Output
     • Method: Iteratively propagates “authority weight” and “hub weight” across the links of the web graph, converging simultaneously to steady states for both types of weights
     • Output: a pair of sets (X, Y), where X is a small set of authorities and Y is a small set of hubs, referred to by the author as a community of hubs and authorities
     • Claim: authoritative pages can be identified as belonging to dense bipartite communities in the link graph of the WWW via the algorithm.
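The iterative propagation described above can be sketched as follows. This is a minimal illustration rather than Kleinberg's full procedure: it runs a fixed number of iterations instead of testing for convergence, normalizes the weights so that their squares sum to 1, and uses an invented link list.

```python
from math import sqrt


def normalize(weights):
    """Scale weights so that the sum of their squares is 1."""
    norm = sqrt(sum(value * value for value in weights.values()))
    if norm == 0:
        return dict(weights)
    return {node: value / norm for node, value in weights.items()}


def hubs_and_authorities(links, iterations=50):
    """Propagate hub and authority weights over (source, target) links."""
    nodes = sorted({node for edge in links for node in edge})
    authority = {node: 1.0 for node in nodes}
    hub = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of pages pointing to the page.
        new_authority = {node: 0.0 for node in nodes}
        for source, target in links:
            new_authority[target] += hub[source]
        # Hub weight: sum of authority weights of pages the page points to.
        new_hub = {node: 0.0 for node in nodes}
        for source, target in links:
            new_hub[source] += new_authority[target]
        authority = normalize(new_authority)
        hub = normalize(new_hub)
    return authority, hub


# Invented graph: three hub-like pages pointing at two authority-like pages.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
authority, hub = hubs_and_authorities(links)
print(max(authority, key=authority.get))  # a1
print(sorted(hub, key=hub.get, reverse=True)[:2])
```

Taking the few pages with the largest authority weights and the few with the largest hub weights gives the pair of small sets (X, Y) mentioned on the slide.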

  19. A4: ParaSite: Mining Structural Information on the Web
     By Ellen Spertus
     • Varieties of link information on the Web
     • How the Web differs from conventional hypertext
     • How the links can be exploited to build useful applications

  20. A4: Classical Hypertext vs. Web
     • Classical hypertext
       - Links do not cross site or even document boundaries
       - Documents are limited to a single topic
       - A manual answers each question in exactly one place or not at all
       - Rarely changes
     • Web
       - Links can cross site and document boundaries
       - Multiple topics are permitted in one web page
       - An answer could appear any number of times on the Web
       - Constantly changing

  21. A4: Mining Links
     • Naïve link geometry
       - A useful technique for finding pages on a given set of topics
     • Hypertext links example
       - Categorized into upward, downward, crosswise, and outward
     • Directory links
       - Directory structure relates pages in the absence of hypertext links
     • Structure within a page
       - A page can be considered a tree of nodes, each with attached text and links embedded in the text
     • Other
       - Domain names, relationships between concepts represented by words and phrases, paths traveled through Web sites by visitors
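The four link categories can be illustrated with a small URL classifier. The definitions used here are an assumed reading of the categories (outward: different host; upward: target directory is an ancestor of the source directory; downward: target directory is a descendant; crosswise: same host but neither), and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def directory_of(path: str) -> str:
    """Return the directory part of a URL path, always ending in '/'."""
    return path.rsplit("/", 1)[0] + "/"


def classify_link(source_url: str, target_url: str) -> str:
    """Classify a hyperlink as upward, downward, crosswise, or outward."""
    source, target = urlsplit(source_url), urlsplit(target_url)
    if source.netloc.lower() != target.netloc.lower():
        return "outward"
    source_dir = directory_of(source.path)
    target_dir = directory_of(target.path)
    if source_dir != target_dir and source_dir.startswith(target_dir):
        return "upward"
    if source_dir != target_dir and target_dir.startswith(source_dir):
        return "downward"
    return "crosswise"


# Hypothetical URLs.
page = "http://example.org/cs/ai/index.html"
print(classify_link(page, "http://example.org/cs/"))              # upward
print(classify_link(page, "http://example.org/cs/ai/lab/x.html")) # downward
print(classify_link(page, "http://example.org/math/"))            # crosswise
print(classify_link(page, "http://other.example/"))               # outward
```

The same directory comparison also captures the "directory links" idea: two pages can be related by their paths even when no hyperlink connects them.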

  22. A4: Application
     • Finding moved pages
       - Exploiting hyperlinks
       - Exploiting directory links
     • Finding related pages
       - Collaborative filtering
       - Given pages the user has already found, ParaSite can find the page A that has the most links to those pages and return the other pages referenced by A.
     • A person finder
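The related-page heuristic described above can be sketched directly. The data structure (a dictionary from each page to the pages it links to), the alphabetical tie-breaking, and the sample pages are assumptions for illustration; this is not ParaSite's actual implementation.

```python
def related_pages(known, outlinks):
    """Find the page linking to the most known pages and return its other links.

    known:    pages the user has already found
    outlinks: mapping of page -> list of pages it links to
    """
    known = set(known)
    candidates = sorted(page for page in outlinks if page not in known)
    if not candidates:
        return None, []
    # max() keeps the first maximum, so ties go to the alphabetically first page.
    best = max(candidates, key=lambda page: len(known & set(outlinks[page])))
    return best, sorted(set(outlinks[best]) - known)


# Invented example: page "A" links to both pages the user already has.
outlinks = {
    "A": ["x", "y", "z", "w"],
    "B": ["x", "q"],
    "x": ["y"],
}
print(related_pages({"x", "y"}, outlinks))  # ('A', ['w', 'z'])
```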

  23. Conclusion
     • Information on the World Wide Web is increasing exponentially, and the Internet's architecture is becoming more complicated.
     • Applying bibliometrics to the Web will help us control and manage web information wisely.

  24. Example of Hypertext Link Back to hypertext link
