WEB MINING by NINI P SURESH

Presentation Transcript


  1. WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan

  2. OUTLINE • Introduction • Data mining Vs Web mining • Web mining subtasks • Challenges • Taxonomy • Web content mining • Web structure mining • Web usage mining • Applications

  3. INTRODUCTION Users now need automated tools to find, extract, filter & evaluate the information & resources they want. Search engines aim only to discover resources on the web.

  4. INTRODUCTION • Need for Web Mining • Narrow search scope • Low precision

  5. INTRODUCTION • Other Approaches • Database approach (DB) • Information retrieval • Natural language processing (NLP) • Web document community

  6. WEB MINING DEFINITION Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from Web data.

  7. DATA MINING VS WEB MINING • Data mining: extraction of useful patterns from data sources such as databases, texts, the web, images, etc. • Web mining: extraction of relevant information hidden in Web-related data, such as hypertext documents on the web

  8. WEB MINING SUBTASKS • Resource finding • Information selection & preprocessing • Generalization • Analysis

  9. CHALLENGES • Search for relevant information on the web • Create knowledge • Personalization of information • Learn patterns • Uniformity & standardisation

  10. CHALLENGES • Redundant Information • Noisy web • Monitoring changes • Sites providing Services • Privacy

  11. TAXONOMY • Web Mining: Web Content Mining, Web Structure Mining, Web Usage Mining • Web Content Mining: Web Text Mining, Web Multimedia Mining • Web Structure Mining: Link Mining, Internal Structure Mining, URL Mining • Web Usage Mining: General Access Pattern Tracking, Personalized Usage Tracking

  12. WEB CONTENT MINING • Discovers useful information & analyses the content • Automatic process beyond keyword extraction • Approaches to restructure document content • Two groups of mining strategies

  13. WEB CONTENT MINING • Agent based Approach • Intelligent search agents • Information filtering/categorization • Personalized web agents

  14. WEB CONTENT MINING • Database Approach • Multilevel databases • Web query system

  15. WEB STRUCTURE MINING • Discovering structure information from web • Web graph : web pages as nodes & hyperlinks as edges
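
The web graph described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's method: the `LinkExtractor` class, the `build_web_graph` function and the three-page `pages` collection are hypothetical names and data introduced here. Pages become nodes and each hyperlink between two pages in the collection becomes a directed edge.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def build_web_graph(pages):
    """pages: dict URL -> HTML text. Returns dict URL -> set of linked URLs."""
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        # Keep only edges between pages in the collection; drop self-links.
        graph[url] = {t for t in parser.links if t in pages and t != url}
    return graph


# Hypothetical three-page site, used only for illustration.
pages = {
    "http://example.org/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.org/b": '<a href="/c">C</a>',
    "http://example.org/c": '<a href="/a">A</a>',
}
web_graph = build_web_graph(pages)
```

The adjacency-set representation is an assumption of this sketch; it keeps each edge once even when a page links to the same target several times.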

  16. WEB STRUCTURE MINING • Two algorithms for handling of links • PageRank • HITS

  17. WEB STRUCTURE MINING • PageRank • Metric for ranking hypertext documents • Depends on the rank of the pages pointing to it • Iterative process

  18. WEB STRUCTURE MINING • PageRank symbols • n : number of nodes in the graph • Outdegree(q) : number of hyperlinks on page q • d : damping factor
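
The formula these symbols belong to is not in the transcript. One widely used form consistent with them is PR(p) = (1 - d)/n + d * (sum over pages q linking to p of PR(q)/Outdegree(q)), where d is the probability of following a link; some texts use d for the opposite probability, which swaps d and 1 - d. The sketch below assumes the first convention with d = 0.85 and a fixed number of iterations; the `pagerank` function name, the handling of pages without outgoing links, and the small example graph are assumptions made for illustration.

```python
def pagerank(graph, d=0.85, iterations=50):
    """graph: dict page -> list of pages it links to (all targets are keys).

    Returns dict page -> rank; the ranks sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - d) / n for p in nodes}
        for q in nodes:
            out_links = graph[q]
            if out_links:
                # q passes its rank on equally to the pages it links to.
                share = d * rank[q] / len(out_links)
                for p in out_links:
                    new_rank[p] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                for p in nodes:
                    new_rank[p] += d * rank[q] / n
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this example B ends up with the lowest rank, since its only incoming link is half of A's share.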

  19. WEB STRUCTURE MINING • HITS • Iterative algorithm • Identify topic hubs & authorities • Input : search results returned by traditional text indexing technique

  20. WEB STRUCTURE MINING • Assigns weight to a hub based on the authoritativeness of the pages it points to • Outputs pages with largest hub & authority weights
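
A minimal sketch of the HITS iteration follows, assuming the input pages and their links are already available as a small graph (in practice they would come from the search results mentioned above). The `hits` function name, the fixed iteration count, the Euclidean normalization and the example graph are assumptions of this sketch.

```python
import math


def hits(graph, iterations=50):
    """graph: dict page -> list of pages it links to (all targets are keys).

    Returns (hub, authority) dicts of normalized weights.
    """
    nodes = list(graph)
    hub = {p: 1.0 for p in nodes}
    auth = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages linking to it.
        new_auth = {p: 0.0 for p in nodes}
        for q in nodes:
            for p in graph[q]:
                new_auth[p] += hub[q]
        # Hub weight: sum of authority weights of the pages it links to.
        new_hub = {q: sum(new_auth[p] for p in graph[q]) for q in nodes}
        # Normalize so the weights stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return hub, auth


# Hypothetical graph: two pages that link out, two pages that are linked to.
graph = {"portal": ["news", "docs"], "blog": ["docs"], "news": [], "docs": []}
hub, auth = hits(graph)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

Here `portal` comes out as the strongest hub and `docs` as the strongest authority, because `docs` is linked from both hubs.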

  21. WEB USAGE MINING • Extracting information from server logs • Discover user access patterns of Web pages • Decomposed into 3 subtasks: preprocessing, pattern discovery (mining algorithms), pattern analysis • [Figure: raw logs & site files → preprocessing → user session file → mining algorithms → rules, patterns & statistics → pattern analysis → interesting rules, patterns & statistics]

  22. WEB USAGE MINING • Preprocessing • Data cleaning • User identification • User sessions identification • Access path supplement • Transaction identification
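
The first three preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the presenter's procedure: the logs are assumed to be in Common Log Format, a user is assumed to be identified by IP address alone, and a session is assumed to end after 30 minutes of inactivity. The names `parse_and_clean` and `sessionize`, the ignored file suffixes and the sample log lines are hypothetical. Access path supplement and transaction identification are not covered.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def parse_and_clean(lines):
    """Data cleaning: keep successful page requests, drop everything else."""
    records = []
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # malformed line
        if not m["status"].startswith("2"):
            continue  # failed or redirected request
        if m["path"].lower().endswith(IGNORED_SUFFIXES):
            continue  # embedded resource, not a page view
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        records.append((m["ip"], when, m["path"]))
    return records


def sessionize(records):
    """User and session identification: one user per IP, split on timeout."""
    sessions = []
    last_seen = {}  # ip -> (time of last request, index of open session)
    for ip, when, path in sorted(records, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(ip)
        if previous is None or when - previous[0] > SESSION_TIMEOUT:
            sessions.append((ip, [path]))
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(path)
        last_seen[ip] = (when, index)
    return sessions


# Hypothetical raw log lines.
raw_logs = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:57:00 -0700] "GET /products.html HTTP/1.0" 200 1045',
    '10.0.0.1 - - [10/Oct/2000:15:10:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:00 -0700] "GET /missing.html HTTP/1.0" 404 209',
]
sessions = sessionize(parse_and_clean(raw_logs))
```

Identifying users by IP address is the weakest assumption here, since proxies and shared machines put several users behind one address.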

  23. WEB USAGE MINING • Pattern discovery • Statistical Analysis • Association Rules • Clustering analysis

  24. WEB USAGE MINING • Classification analysis • Sequential Pattern • Dependency Modeling
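
Of the pattern discovery techniques listed, association rules are the simplest to illustrate. The sketch below counts how often pairs of pages occur in the same session and keeps the rules that pass a support and a confidence threshold. The `pair_rules` function name, the thresholds and the example sessions are assumptions; only rules between single pages are considered, which is a simplification of full association rule mining.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.7):
    """sessions: list of lists of page paths.

    Returns sorted (lhs, rhs, support, confidence) tuples for rules lhs -> rhs.
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # each page counts once per session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the sessions with lhs, how many also have rhs.
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical user sessions.
example_sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(example_sessions)
```

With these thresholds the only surviving rule is `/cart` -> `/products`: both sessions that contain the cart page also contain the products page.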

  25. WEB USAGE MINING • Pattern Analysis • Eliminates irrelevant rules or patterns • Extracts interesting patterns

  26. APPLICATIONS • Personalized Services • Improve website design • System Improvement • Predicting trends • Carry out intelligent business

  27. PROS • High trade volumes • Classify threats & fight against Terrorism • Establish better customer relationship • Increase profitability

  28. CONS • Invasion of Privacy • Discrimination by controversial attributes

  29. CONCLUSION • Rapidly growing area • Promising area of future research

  31. WEB MINING Thank You
