
What is web link mining?

Virtual Knowledge Studio (VKS), Information Studies. Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK.



  1. Virtual Knowledge Studio (VKS), Information Studies. What is web link mining? Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK

  2. 1. Definition and scope
     • Link analysis is:
       • mapping and measuring hyperlink networks for collections of web pages or sites
       • a flexible toolkit of methods and software rather than a field or single technique
     • A new source of information about:
       • relationships between people, organisations and information, via the web
       • the impact of information and ideas
     • Used in:
       • media studies, information science, politics, marketing, sociology

  3. Link Analysis: Motivation
     • Individual hyperlinks reflect concrete creation reasons, such as connections between web page contents or creators
     • Counts of large numbers of hyperlinks may reflect wider underlying social processes
     • Links may reflect phenomena that have previously been difficult to study, e.g.:
       • informal scholarly communication
       • informal news discussions
       • friendship patterns
       • “amateur” politics

  4. But link patterns vary by context…
     • Commercial web sites tend not to link much
     • Academic and government web sites link more
     • Disciplinary differences: e.g., Web use in History is very low, in Chemistry very high
     • Individual projects/resources can have an enormous impact upon web sites
       • E.g., Arts web sites are often for specific exhibitions or for digital media projects
     • Links are often not frequent enough to reliably reveal underlying patterns

  5. Link Type Definitions
     [Diagram: pages A and B]
     • Inlink – a hyperlink to a web page from anywhere
     • Site inlink – a hyperlink to a web page from a different web site
     • Outlink – a hyperlink from a web page to any other page
     • Site outlink – a hyperlink from a web page to a page in a different site
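
The four definitions above can be sketched in a few lines of Python. This is only an illustration: the example URLs are made up, and treating the hostname as the "web site" is a simplifying assumption that the slides do not make.

```python
from urllib.parse import urlsplit

def site_of(url):
    # Assumption: the hostname stands in for the "web site".
    return urlsplit(url).hostname or ""

def inlinks(page, links):
    # All hyperlinks pointing to `page`, from anywhere.
    return [(s, t) for s, t in links if t == page]

def site_inlinks(page, links):
    # Inlinks whose source page is in a different site.
    return [(s, t) for s, t in inlinks(page, links) if site_of(s) != site_of(page)]

def outlinks(page, links):
    # All hyperlinks leaving `page`.
    return [(s, t) for s, t in links if s == page]

def site_outlinks(page, links):
    # Outlinks whose target page is in a different site.
    return [(s, t) for s, t in outlinks(page, links) if site_of(t) != site_of(page)]

# Hypothetical (source page, target page) pairs.
example_links = [
    ("http://a.example/p1", "http://b.example/home"),
    ("http://b.example/about", "http://b.example/home"),
    ("http://b.example/home", "http://c.example/x"),
    ("http://b.example/home", "http://b.example/about"),
]
```

Here `http://b.example/home` has two inlinks but only one site inlink, because the link from `http://b.example/about` stays within the same site.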

  6. Indirect link types - colinks
     • Useful when direct links are rare
     • Indirect connection
     • Co-inlinks
       • B and C co-inlinked
     • Co-outlinks
       • D and E co-outlinked
     [Diagram: pages A, B, C, D, E and F]
     Lennart Björneborn’s terminology
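
A minimal sketch of colink counting, assuming the usual reading of the terms: two pages are co-inlinked when some third page links to both, and co-outlinked when both link to the same third page. The edge list is a guess at the slide's diagram (A links to B and C; D and E link to F), not taken from it.

```python
from collections import defaultdict
from itertools import combinations

def co_inlink_counts(links):
    # For each unordered pair of targets, count the sources that link to both.
    targets_by_source = defaultdict(set)
    for source, target in links:
        if source != target:
            targets_by_source[source].add(target)
    counts = defaultdict(int)
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return dict(counts)

def co_outlink_counts(links):
    # Co-outlinks are co-inlinks of the reversed network:
    # pairs of sources that share at least one target.
    return co_inlink_counts([(t, s) for s, t in links])

# Assumed edges for illustration only.
diagram_links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]
```

With these edges, `co_inlink_counts` reports the pair (B, C) and `co_outlink_counts` reports the pair (D, E), each with a count of one.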

  7. What to count?
     • Links between individual pages
     • Links between entire web sites
       • Site A links to site B if any page in site A links to any page in site B
     [Diagram: sites A and B]
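
The site-level rule above (site A links to site B if any page in A links to any page in B) can be sketched as a collapse of page-level links. The hostname-as-site choice and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def site_of(url):
    # Assumption: the hostname stands in for the "web site".
    return urlsplit(url).hostname or ""

def site_level_links(page_links):
    # Collapse page-to-page links into distinct site-to-site links,
    # ignoring links that stay inside one site.
    pairs = set()
    for source, target in page_links:
        a, b = site_of(source), site_of(target)
        if a and b and a != b:
            pairs.add((a, b))
    return pairs

# Hypothetical page-level links.
page_links = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://a.example/2", "http://b.example/y"),
    ("http://a.example/2", "http://a.example/1"),
    ("http://b.example/x", "http://c.example/"),
]
```

However many pages in `a.example` link to `b.example`, the result contains the pair once, which is the difference between counting page links and counting site links.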

  8. 2. Link Networks – Methods
     • Draw a network diagram
       • LexiURL Searcher, Issue Crawler, SocSciBot (web networks)
       • Pajek, UCINET, NetMiner (generic networks)
       • About 10-50 sites/pages is recommended
       • Diagrams should reveal patterns in the data
     • Social Network Analysis statistics
       • E.g., density, degree centrality
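
The two statistics named above can be computed by hand for a small network. The sketch below uses the standard definitions for a directed network without self-links: density is the number of links divided by n(n-1), and in-degree centrality is a node's inlink count divided by n-1. The node names and edges are invented; the tools listed on the slide are not used here.

```python
def density(nodes, edges):
    # Share of possible directed links that are present.
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0

def in_degree_centrality(nodes, edges):
    # Inlink count per node, normalised by the maximum possible (n - 1).
    n = len(nodes)
    indegree = {node: 0 for node in nodes}
    for _source, target in edges:
        indegree[target] += 1
    if n <= 1:
        return {node: 0.0 for node in nodes}
    return {node: count / (n - 1) for node, count in indegree.items()}

# Hypothetical site-level network: distinct directed links, no self-links.
nodes = ["a", "b", "c"]
edges = {("a", "b"), ("c", "b"), ("b", "a")}
```

In this example three of the six possible links exist, so the density is 0.5, and site `b` has the highest in-degree centrality because both other sites link to it.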

  9. Direct link networks
     • Start with a list of web sites (or pages)
     • Build from many linkdomain:A site:B Yahoo searches, e.g., linkdomain:ox.ac.uk site:pku.edu.cn
       • Powerful and free way to scan the entire web for links!
       • Returns pages in web site B that link to web site A
       • Can be automated with LexiURL Searcher
     • Or use SocSciBot to crawl web sites and get links
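
For a list of sites, one query of the form shown above is needed per ordered pair. The sketch below only builds the query strings in the slide's syntax; it does not submit them, and it makes no assumption that any search service currently accepts these operators. The function name is hypothetical.

```python
from itertools import permutations

def direct_link_queries(domains):
    # One query per ordered pair, keyed by (linking site, linked-to site).
    # "linkdomain:A site:B" asks for pages in site B that link to site A.
    return {
        (b, a): f"linkdomain:{a} site:{b}"
        for a, b in permutations(domains, 2)
    }

queries = direct_link_queries(["ox.ac.uk", "pku.edu.cn"])
```

For n sites this gives n(n-1) queries, which is why automating the step is attractive even for the 10-50 sites recommended for a diagram.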

  10. Direct links example: top ASEAN universities network (with Han Woo Park). Unconnected universities removed; arrows represent > 100 links

  11. Co-inlink networks
      • Start with a list of web sites or pages
      • Build from many linkdomain:A linkdomain:B -site:A -site:B Yahoo searches
        • Can be automated in LexiURL Searcher
      • Suitable for commercial or competitive web sites that do not interlink
        • Normally better than direct link diagrams
      • A web environment (co-inlink) network for a single web site
        • finds web sites that link to it
        • picks the top 50 web sites linked to by these web sites
        • draws a co-inlink diagram of these web sites
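
A rough sketch of the two steps described above, under stated assumptions. `co_inlink_queries` only formats the query string from the slide for each unordered pair of sites. `web_environment` imitates the selection step on link data that has already been collected; how LexiURL Searcher actually ranks the "top 50" is not stated in the slides, so ranking by the number of distinct linking sites is an assumption, as are all names and data here.

```python
from collections import Counter
from itertools import combinations

def co_inlink_queries(domains):
    # One query per unordered pair: pages outside both sites that link to both.
    return {
        (a, b): f"linkdomain:{a} linkdomain:{b} -site:{a} -site:{b}"
        for a, b in combinations(domains, 2)
    }

def web_environment(seed, site_links, top_n=50):
    # Step 1: sites that link to the seed site.
    linkers = {s for s, t in site_links if t == seed and s != seed}
    # Step 2: other sites those linkers point to, ranked by how many
    # distinct linkers point to each (an assumed ranking rule).
    counts = Counter(
        t for s, t in set(site_links)
        if s in linkers and t != seed and t != s
    )
    return [site for site, _ in counts.most_common(top_n)]

# Hypothetical site-level links.
site_links = [
    ("x", "seed"), ("y", "seed"),
    ("x", "p"), ("y", "p"),
    ("x", "q"), ("z", "q"),
]
```

The sites returned by `web_environment` would then be the nodes of the co-inlink diagram, with co-inlink counts between them supplying the connections.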

  12. Indirect links example: the web environment of ZigZagMag

  13. Another example – no patterns but interesting

  14. 3. Link Impact - Methods
      • Inlink counts often used as an impact/visibility indicator
        • Impact = “The effect or impression of one thing on another”, “to have an effect” *
      • Compare links to web sites to assess which site/organisation has the most online impact
      * http://www.thefreedictionary.com/impact, definition 3

  15. Link Impact Reports
      • Standardised comparative analysis of the link impact of web sites
      • Example audit:
        • http://cybermetrics.wlv.ac.uk/audit/101/
      • Similar reports can be created for non-link impact (citation impact)
        • http://cybermetrics.wlv.ac.uk/audit/books/

  16. Total impact example

  17. Impact spread example

  18. 4. Tools • E.g., …

  19. 5. Statistical analyses… Links to UK universities against their research productivity. The reason for the strong correlation is the quantity of Web publication, not its quality
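
The slide does not say which correlation coefficient was used. As one possibility, the sketch below computes a Spearman rank correlation between inlink counts and a productivity measure in plain Python. The numbers are invented purely to show the calculation and are not results from the study.

```python
def ranks(values):
    # Rank from 1, giving tied values the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    # Pearson correlation; assumes neither list is constant.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    # Spearman correlation is Pearson correlation applied to the ranks.
    return pearson(ranks(x), ranks(y))

# Made-up illustrative values, one entry per university.
inlink_counts = [120, 450, 300, 900, 60]
productivity = [1.1, 2.9, 2.5, 3.8, 0.7]
rho = spearman(inlink_counts, productivity)
```

A rank-based coefficient is a cautious choice for link counts, which tend to be highly skewed, but that choice is this sketch's assumption rather than something the slides specify.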

  20. More statistical analyses… Universities tend to link to neighbours

  21. 6. Content analysis
      • Content analysis of a random sample of links is recommended to get context
      • Example of usefulness of content analysis results:
        • 90% of links between UK university sites relate to scholarly activity
        • But less than 1% are equivalent to citations
      • Link counts do not measure research but are a natural by-product of scholarly activity
      • Use link counts to track (an aspect of) communication
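
The sampling step recommended above might look like the following sketch: draw a reproducible random sample of links for manual classification, then tabulate the share of each category once the links have been coded by hand. The function names, category labels, and data are hypothetical.

```python
import random
from collections import Counter

def sample_links(links, k, seed=0):
    # Reproducible simple random sample without replacement.
    rng = random.Random(seed)
    links = list(links)
    return rng.sample(links, min(k, len(links)))

def category_shares(codes):
    # Proportion of sampled links assigned to each category.
    total = len(codes)
    return {category: n / total for category, n in Counter(codes).items()}

# Hypothetical link list and hypothetical manual codes.
all_links = [(f"http://a.example/{i}", f"http://b.example/{i}") for i in range(200)]
to_classify = sample_links(all_links, 20)
shares = category_shares(["scholarly", "scholarly", "other", "scholarly"])
```

Fixing the seed means the same sample can be drawn again later, which helps when a second coder needs to classify the same links.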

  22. 7. Summary
      • Link networks
        • To investigate relationship patterns within collections of web sites
      • Link impact
        • Compare impact of web sites using inlinks
      • Methods
        • Toolkit of visual and statistical methods
        • Specialist software like LexiURL Searcher & Issue Crawler
      • Use to investigate web phenomena or offline phenomena reflected online in web sites

  23. Books
      • Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
      • Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
      • Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.
      • http://lexiurl.wlv.ac.uk
      • http://webometrics.wlv.ac.uk
      • http://www.issuecrawler.net
