
The Range of Webometrics: Forms of Digital Social Utility as Tools

Explore the various forms of webometrics and their applications in understanding information resources, structures, and technologies on the web. Learn about link topology, search engine analysis, web mining, and more.


Presentation Transcript


  1. The Range of Webometrics: Forms of Digital Social Utility as Tools • Professor Peter Ingwersen, Ph.D. • Information Interaction & Information Architecture, Royal School of LIS, Denmark • pi@iva.dk - http://www.iva.dk/pi

  2. Table of Contents • Webometrics – Cybermetrics – a framework • Link topology & structural conceptions in webometrics • Overview of potentials • Search engine analysis • Link analyses – Web Impact Factor (Web-IF) • Dataset Usage Indicators • Web mining – Trend analyses (blog contents) • Concluding remarks Ingwersen

  3. Webometrics • The study of quantitative aspects of the construction and use of information resources, structures and technologies on the Web, drawing on bibliometric and informetric methods • search engine performance • link structures, e.g., WIFs, cohesiveness of link topologies, etc. • users’ information behaviour (searching, browsing, etc.) • web page contents – knowledge mining – blog trends • dataset analyses & impact • Cybermetrics: quantitative studies of the whole Internet • i.e. chat, mailing lists, news groups, MUDs, etc. - and WWW • Lennart Björneborn 2001

  4. infor-/biblio-/sciento-/cyber-/webo-/metrics (L. Björneborn & P. Ingwersen 2003) • [Diagram labels: informetrics, bibliometrics, scientometrics, cybermetrics, webometrics]

  5. Corona model (Björneborn 2004) • SCC: Strongest Connected Component • OUT: reachable from SCC • IN: traversable to SCC • Disconnected • IN-Tendrils: connected from IN • Tube: connecting IN to OUT • OUT-Tendrils: connected to OUT
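
The IN / SCC / OUT partition listed on this slide can be sketched with two graph traversals. This is a minimal illustration, not Björneborn's procedure: the function names, the toy edge list and the choice of a seed node known to lie in the core are all assumptions, and tendrils, tubes and disconnected nodes are lumped together as "OTHER".

```python
from collections import defaultdict


def reachable(start_nodes, adjacency):
    """Return every node reachable from start_nodes by following adjacency."""
    seen = set(start_nodes)
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def corona_components(edges, seed):
    """Split nodes into SCC, IN, OUT and OTHER relative to the seed's component.

    edges: iterable of (source, target) links.
    seed:  a node assumed to lie in the core component (hypothetical input).
    """
    forward = defaultdict(set)   # node -> nodes it links to
    backward = defaultdict(set)  # node -> nodes linking to it
    nodes = set()
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)
        nodes.update((src, dst))

    downstream = reachable({seed}, forward)   # seed can reach these
    upstream = reachable({seed}, backward)    # these can reach seed
    scc = downstream & upstream
    out = downstream - scc                    # reachable from SCC
    in_ = upstream - scc                      # traversable to SCC
    other = nodes - scc - out - in_           # tendrils, tubes, disconnected
    return {"SCC": scc, "IN": in_, "OUT": out, "OTHER": other}


# Toy web graph (invented for illustration only)
toy_edges = [
    ("i", "s1"), ("s1", "s2"), ("s2", "s1"),  # i feeds a two-node core
    ("s2", "o"),                              # core links out to o
    ("i", "t"),                               # t hangs off IN (a tendril)
    ("x", "y"),                               # disconnected pair
]
parts = corona_components(toy_edges, seed="s1")
```

Separating tendrils from tubes would need further traversals from IN and towards OUT; that step is left out here to keep the sketch short.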

  6. Source: www.cybergeography.org

  7. Link terminology: basic concepts (L. Björneborn & P. Ingwersen 2003) • [Diagram: nodes A to H] • B has an outlink to C; outlinking: ~ reference • B has an inlink from A; inlinked: ~ citation • B has a selflink; selflinking: ~ self-citation • A has no inlinks; non-linked: ~ non-cited • E and F are reciprocally linked • A is transitively linked with H via B – D • H is reachable from A by a directed link path • A has a transversal link to G: short cut • C and D are co-linked from B, i.e. have co-inlinks or shared inlinks: co-citation • B and E are co-linking to D, i.e. have co-outlinks or shared outlinks: bibliographic coupling • co-links
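
The relations on this slide can be checked mechanically on a small directed graph. The sketch below is illustrative only: the edge list is reconstructed from the bullet points alone (the original diagram may contain further links), and all function names are made up for this example.

```python
# Edges stated or implied by the slide's bullets (an assumption, see above)
EDGES = {
    ("A", "B"),              # B has an inlink from A
    ("B", "B"),              # B has a selflink
    ("B", "C"), ("B", "D"),  # C and D are co-linked from B
    ("D", "H"),              # A reaches H via B and D
    ("E", "D"),              # B and E are co-linking to D
    ("E", "F"), ("F", "E"),  # E and F are reciprocally linked
    ("A", "G"),              # transversal link (short cut)
}


def outlinks(node, edges=EDGES):
    """Targets the node links to, ignoring selflinks (~ references)."""
    return {dst for src, dst in edges if src == node and dst != node}


def inlinks(node, edges=EDGES):
    """Sources linking to the node, ignoring selflinks (~ citations)."""
    return {src for src, dst in edges if dst == node and src != node}


def has_selflink(node, edges=EDGES):
    """True if the node links to itself (~ self-citation)."""
    return (node, node) in edges


def shared_inlinks(x, y, edges=EDGES):
    """Nodes linking to both x and y (~ co-citation)."""
    return inlinks(x, edges) & inlinks(y, edges)


def shared_outlinks(x, y, edges=EDGES):
    """Nodes linked from both x and y (~ bibliographic coupling)."""
    return outlinks(x, edges) & outlinks(y, edges)


def is_reachable(start, goal, edges=EDGES):
    """True if a directed link path leads from start to goal."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in outlinks(node, edges):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

For instance, `shared_inlinks("C", "D")` yields `{"B"}` and `shared_outlinks("B", "E")` yields `{"D"}`, matching the co-citation and coupling bullets.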

  8. Levels of web nodes (Lennart Björneborn 2002) • [Diagram labels: a, b, c, d] • 3 basic levels of web nodes: pages, sites, TLDs • different levels of selflinks and outlinks • a = page selflink • b = page outlink and site selflink • c = site outlink and TLD selflink • d = TLD outlink • more levels: frames (page sections), sub-sites, sub-TLDs ...
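
The four link levels a to d can be told apart by comparing source and target URLs. The following is a rough sketch under stated assumptions: a "site" is taken to be the full host name and the "TLD" its last dot-separated label, which is naive (multi-label suffixes such as co.uk and sub-sites are not handled). The function names and example URLs are illustrative.

```python
from urllib.parse import urlsplit


def _parts(url):
    """Return (page, site, tld) for a URL, using deliberately naive rules."""
    split = urlsplit(url)
    host = (split.hostname or "").lower()
    page = (host, split.path or "/")   # fragment and query are ignored
    tld = host.rsplit(".", 1)[-1]      # last label only (assumption)
    return page, host, tld


def link_level(source_url, target_url):
    """Classify a link as level a, b, c or d as defined on the slide."""
    src_page, src_site, src_tld = _parts(source_url)
    dst_page, dst_site, dst_tld = _parts(target_url)
    if src_page == dst_page:
        return "a: page selflink"
    if src_site == dst_site:
        return "b: page outlink and site selflink"
    if src_tld == dst_tld:
        return "c: site outlink and TLD selflink"
    return "d: TLD outlink"


levels = [
    link_level("http://www.db.dk/pi", "http://www.db.dk/pi#top"),
    link_level("http://www.db.dk/pi", "http://www.db.dk/lb"),
    link_level("http://www.db.dk/pi", "http://www.iva.dk/pi"),
    link_level("http://www.db.dk/pi", "http://www.yahoo.com/"),
]
```

Whether www.db.dk and db.dk count as one site is exactly the kind of decision the later slides on domain levels warn about; here they would be treated as different sites.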

  9. Search engine analyses • See e.g. Judith Bar-Ilan’s excellent longitudinal analyses • Mike Thelwall et al. in several case studies • Scientific material on the Web: • Lawrence & Giles (1999): approx. 6 % of Web sites contain scientific or educational contents • Increasingly: the Web is a web of uncertainty • Allen et al. (1999) – biology topics from 500 Web sites assessed for quality: • 46 % of sites were “informative” – but: • 10-35 % inaccurate; 20-35 % misleading • 48 % unreferenced

  10. http://searchenginewatch.com/3634992

  12. www.internetworldstats.com • Ingwersen, Knoxville 2010

  13. Possible types of Web-IF: • E-journal Web-IF: calculated by in-links, or calculated as traditional JIF (citations) • Scientific web site IF (by link analyses) • National – regional (some URL problems in TLD) • Institutions – single sites • Other entities, e.g. domains • Best denominator: no. of staff, beds – or simply use external inlinks (Thelwall et al., 2002) • Blog IF: no. of external inlinks / blog entries • Twitter IF: no. of external inlinks / Twitter entries (Holmberg, 2009)

  14. The only valid webometric tool: Site Explorer (Yahoo Search) … • If one enters (formerly valid) commands like Link:URL, Domain:topdomain (edu, dk) or Site:URL, one is transferred to: http://siteexplorer.search.yahoo.com/new/ • Or find it via this URL • The same facilities are available in click-mode, starting from a given URL: • Finding ‘all’ web pages in a site • Finding ‘all’ inlinks to that site/those pages • Also without selflinks! – this implies …

  15. … to calculate Web Impact Factors • But one should be prudent in interpretations. • Note that external inlinks are the best indicator of recognition (see sample) • Take care how many sub-domains (and pages) are included in the click analysis. • Results can be downloaded

  16. Consequences for Yahoo Site Explorer • Take care which domain level you are on: www.yahoo.com does not contain sub-domains like maps.yahoo.com – only those directly below its name. yahoo.com will thus contain maps… • Also beware of the path structure • Minor tests show that the inlink no. probably really means inlinks – not inlinking web pages. • Ingwersen, Åbo 2010

  17. Search sample: www.db.dk/pi

  18. Without selflinks …

  19. The Web-Impact Factor (Ingwersen, 1998) • Intuitively (naively?) believed to be similar to the Journal Impact Factor • Demonstrates recognition by other web sites - or simply impact – not necessarily quality • Central issue: are web sites similar to journals and web pages similar to articles? • Are in-links similar to citations – or simply road signs? • What is really calculated? • DEFINE WHAT YOU ARE CALCULATING: site or page IF
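
As a worked illustration of "define what you are calculating", the sketch below computes one possible site-level variant: inlinks to a site divided by the number of pages in that site, with selflinks excluded by default. This is an assumed operationalisation, not necessarily the exact formula of Ingwersen (1998); the function names, example URLs and the choice to count links rather than linking pages are all assumptions.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lower-cased host name of a URL (used here as the 'site')."""
    return (urlsplit(url).hostname or "").lower()


def web_impact_factor(links, site_pages, include_selflinks=False):
    """Inlinks to the site per page of the site.

    links:      iterable of (source_url, target_url) pairs, e.g. from a crawl
    site_pages: collection of page URLs belonging to the site under study
    """
    if not site_pages:
        raise ValueError("site_pages must not be empty")
    site_hosts = {host_of(page) for page in site_pages}

    inlink_count = 0
    for source, target in links:
        if host_of(target) not in site_hosts:
            continue                      # link does not point at the site
        is_selflink = host_of(source) in site_hosts
        if is_selflink and not include_selflinks:
            continue                      # keep external inlinks only
        inlink_count += 1
    return inlink_count / len(site_pages)


# Invented sample data for illustration
pages = {"http://example.org/", "http://example.org/a"}
crawl = [
    ("http://other.net/x", "http://example.org/"),
    ("http://other.net/y", "http://example.org/a"),
    ("http://third.com/", "http://example.org/a"),
    ("http://example.org/", "http://example.org/a"),   # site selflink
    ("http://other.net/x", "http://elsewhere.org/"),   # unrelated link
]
external_wif = web_impact_factor(crawl, pages)
overall_wif = web_impact_factor(crawl, pages, include_selflinks=True)
```

Swapping the denominator for blog entries or staff numbers gives the other variants listed earlier in the deck; the point of the sketch is that each choice yields a different number.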

  20. Web-links like citations? • Kleinberg (1998), between citation weights and Google’s PageRank: • Hubs ~ review articles: have many outlinks (refs) to: • Authority pages ~ influential (highly cited) documents: have many inlinks from Hubs! • Typical: Web index pages = homepage with self-inlinks = Table of contents
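
The mutual reinforcement between hubs and authorities can be sketched as a simple iteration, in the spirit of Kleinberg's approach (commonly known as HITS). This is a simplified illustration with a fixed number of rounds and an invented toy graph, not a reproduction of the published algorithm's details.

```python
import math


def hubs_and_authorities(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed link graph."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {node: v / norm for node, v in new_auth.items()}

        # Hub score: sum of authority scores of pages linked to
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {node: v / norm for node, v in new_hub.items()}

    return hub, auth


# Toy graph: three link lists (h*) pointing at two documents (a*)
toy_links = [
    ("h1", "a1"), ("h1", "a2"),
    ("h2", "a1"), ("h2", "a2"),
    ("h3", "a1"),
]
hub_scores, auth_scores = hubs_and_authorities(toy_links)
```

In this toy graph a1 ends up as the strongest authority because all three hubs link to it, while h3 is the weakest hub because it links to only one document.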

  21. Reasons for outlinking … • Out-links mainly for functional purposes • Navigation – interest spaces… • Pointing to authority in certain domains? (Latour: rhetoric reasons for references-links) • Normative reasons for linking? (Merton) • Do we have negative links? • We do have non-linking (commercial sites)

  22. Some additional reasons for providing links • In part analogous to providing references (recognition) • And, among others: • emphasising one's own position and relationships (professional, collaboration, self-presentation etc.) • sharing knowledge, experience, associations … • acknowledging support, sponsorship, assistance • providing information for various purposes (commercial, scientific, education, entertainment) • drawing attention to questions of individual or common interest and to information provided by others (the navigational purpose)

  23. Other differences between references, citations & links • The time issue: • Aging of sources is different on the Web: • Birth, Maturity & Obsolescence happen faster • Decline & Death of sources occur too – but • Marriages – Divorce – Remarriage – Death & Resurrection … & similar liberal phenomena are found on the Web! (Wolfgang Glänzel)

  24. Dataset usage indicators: a novel webometric approach • Biodiversity datasets are: • Searchable • Downloadable … in • Open access • See e.g. GBIF website and 2009 publication: Vishwas S Chavan and Peter Ingwersen, BMC Bioinformatics, 2009, 10(Suppl 14):S2

  25. Example: Denmark – GBIF dataset providers • DanBioInfoFacility – many datasets • HerbariumUA: only two datasets • Comparable US dataset provider: • OBIS – Ocean Bio Info System

  26. DanBIF distribution of datasets – sample selection sorted by Search Events • Ingwersen, P. & Vishwas, C. (under review): Indicators for a data usage index: an incentive for publishing primary biodiversity data through a global information infrastructure. BMC Bioinformatics.

  27. Sample of Dataset Usage Indicators (DUI)

  28. Issue tracking – Web mining • Adequate sampling requires knowledge of the structure and properties of the population - the Web space to be sampled • Issue tracking of known properties / issues may help • Web mining the unknown is more difficult, due to • the dynamic, distributed & diverse nature • the variety of actors and minimum of standards • the lack of quality control of contents • Web archaeology – study of the past Web

  29. Nielsen BlogPulse – social utility indicator • Observes blogs worldwide by providing: • Trend search – development over time of terms/concepts – user selection! • Featured trends – predefined categories • Conversation tracker – blog conversations • BlogPulse profiles – blog profiles • Look into: http://www.blogpulse.com/tools.html

  30. Home > Tools > Trend Search

  31. Informetric methods useful • Co-occurrence analyses (terms; names…) • Co-link and co-linking analyses • Bradford-like (skewed) distributions of links are probably found in sectors of web space • In order to define the strong ties … between top-frequency web objects in two sectors of topical difference • Weak (low-frequency) ties – Small-Worlds – Serendipity between objects in the two sectors: UNEXPECTED relations may occur

  32. Weak Tie! • Source: www.cybergeography.org

  33. Concluding remarks • One should be somewhat cautious about Web-IF applications without careful sampling via robots, given its lack of comprehensiveness and the uncertainty about what it actually signifies • One might also investigate further the behavioural aspects of providing and receiving links, to understand what the impact might mean and how/why links are made • Better to understand the Web space information structure • Design workable robots, downloading & local analyses • Move into the social media and open access genres with social utility indicators

  34. Concluding remarks - 2 • Issue tracking and web mining are applications of Web IR / Webometrics • Combined IR and informetric methods seem promising: • co-occurrence analyses – mapping - clustering • co-links and co-linking - transversal links • Knowledge discovery and use in diversified web spaces

  35. Additional References • Adamic, L. (1999). The small world Web. Lecture Notes in Computer Science, 1696: 443-452. • Almind, T.C. & Ingwersen, P. (1997). Informetric analyses on the World Wide Web: Methodological approaches to “Webometrics”. Journal of Documentation, 53: 404-426. • Björneborn, L. (2001). Small-world linkage and co-linkage. Proceedings of the 12th ACM Conference on Hypertext, pp. 133-134. • Björneborn, L. & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1): 65-82. • Björneborn, L. & Ingwersen, P. (2004). Towards a basic framework of webometrics. (submitted) • Broder, A. et al. (2000). Graph structure in the Web. Computer Networks, 33(1-6): 309-320. • Chakrabarti, S. et al. (1999). Mining the Web’s link structure. IEEE Computer, 32(8): 60-67. • Chavan, V.S. & Ingwersen, P. (2009). Towards a data publishing framework for primary biodiversity data. BMC Bioinformatics, 10(Suppl. 14): S2. • Granovetter, M.S. (1973). The strength of weak ties. American Journal of Sociology, 78(6): 1360-1380.

  36. References - 2 • Ingwersen, P. (1998). The calculation of Web Impact Factors. Journal of Documentation, 54: 236-243. • Kousha, K. & Thelwall, M. (2007). How is Science cited on the Web? A classification of Google unique Web citations. Journal of American Society for Information Science and Technology, 58(11): 1631-1644. • Matthews, R. (1998). Six degrees of separation. New Scientist, June 6. • Newman, M.E.J. (2001). The structure of scientific collaboration networks. PNAS, 98(2): 404-409. • Rousseau, R. Daily time series of common single word searches in AltaVista and Northern Light. Cybermetrics, 2/3, paper 2. ISSN: 1137-5019. (http://www.cindoc.csic.es/cybermetrics/articles/v2ilp2.html) • Small, H. (1999). A passage through science: crossing disciplinary boundaries. Library Trends, 48(1): 72-108. • Swanson, D.R. (1986). Undiscovered public knowledge. Library Quarterly, 56(2): 103-118. • Thelwall, M. (2000). Web impact factors and search engine coverage. Journal of Documentation, 56: 185-189. • Watts, D.J. & Strogatz, S.H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393 (June 4): 440-442.
