1 / 29

Chapter 1 Introduction to Data Mining

Chapter 1 Introduction to Data Mining. Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010. Data Mining Class. This class is an introduction to a young and promising filed called data mining and knowledge discovery from data. Outline. What Motivated Data Mining?

Télécharger la présentation

Chapter 1 Introduction to Data Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 1 Introduction to Data Mining Dr. Bernard Chen Ph.D. University of Central Arkansas Fall 2010

  2. Data Mining Class • This class is an introduction to a young and promising filed called data mining and knowledge discovery from data

  3. Outline What Motivated Data Mining? So, What Is Data Mining? What kind of patterns can we mined?

  4. What Motivated Data Mining? • Necessity is the mother of invention – Plato • The Explosive Growth of Data: from terabytes to petabytes • Data collection and data availability • Major sources of abundant data

  5. What Motivated Data Mining? • Data collection and data availability • Automated data collection tools, database systems, Web, computerized society • Major sources of abundant data • Business: Web, e-commerce, transactions, stocks, … • Science: Remote sensing, bioinformatics, scientific simulation, … • Society and everyone: news, digital cameras, YouTube

  6. What Motivated Data Mining? • We are drowning in data, but starving for knowledge!

  7. Evolution of Database Technology • 1960s: • Data collection, database creation, IMS and network DBMS • 1970s: • Relational data model, relational DBMS implementation • 1980s: • RDBMS, advanced data models (extended-relational, OO, deductive, etc.) • Application-oriented DBMS (spatial, scientific, engineering, etc.)

  8. Evolution of Database Technology • 1990s: • Data mining, data warehousing, multimedia databases, and Web databases • 2000s • Stream data management and mining • Data mining and its applications • Web technology (XML, data integration) and global information systems

  9. Outline • What Motivated Data Mining? • So, What Is Data Mining? • What kind of patterns can we mined?

  10. So, What Is Data Mining? • Data mining (knowledge discovery from data) • Extraction of interesting(non-trivial,implicit, previously unknown and potentially useful)patterns or knowledge from huge amount of data • Data mining: a misnomer?

  11. So, What Is Data Mining? • Alternative names • Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

  12. Knowledge Discovery (KDD) Process Knowledge Pattern Evaluation • Data mining—core of knowledge discovery process Data Mining Task-relevant Data Selection Data Warehouse Data Cleaning Data Integration Databases

  13. Knowledge Process • Data cleaning – to remove noise and inconsistent data • Data integration – to combine multiple source • Data selection – to retrieve relevant data for analysis • Data transformation – to transform data into appropriate form for data mining • Data mining • Evaluation • Knowledge presentation

  14. Knowledge Process • Step 1 to 4 are different forms of data preprocessing • Although data mining is only one step in the entire process, it is an essential one since it uncovers hidden patterns for evaluation

  15. Knowledge Process • Based on this view, the architecture of a typical data mining system may have the following major components: • Database, data warehouse, world wide web, or other information repository • Database or data warehouse server • Data mining engine • Pattern evaluation model • User interface

  16. Data Mining and Business Intelligence Increasing potential to support business decisions End User DecisionMaking Business Analyst Data Presentation Visualization Techniques Data Mining Data Analyst Information Discovery Data Exploration Statistical Summary, Querying, and Reporting Data Preprocessing/Integration, Data Warehouses DBA Data Sources Paper, Files, Web documents, Scientific experiments, Database Systems

  17. Database Technology Statistics Data Mining Visualization Machine Learning Pattern Recognition Other Disciplines Algorithm Data Mining: Confluence of Multiple Disciplines

  18. Data Mining – on what kind of data? • Relational Database

  19. Data Mining – on what kind of data? Data Warehouses

  20. Data Mining – on what kind of data? • Transactional Databases • Advanced data and information systems • Object-oriented database • Temporal DB, Sequence DB and Time serious DB • Spatial DB • Text DB and Multimedia DB • … and WWW

  21. Outline • What Motivated Data Mining? • So, What Is Data Mining? • What kind of patterns can we mined?

  22. What kind of patterns can we mined? • In general, data mining tasks can be classified into two categories: descriptive and predictive • Descriptive mining tasks characterize the general properties of the data in database • Predictive mining tasks performs inference on the current data in order to make predictions

  23. Mining frequent patterns, Associations, and Correlations (Ch4) • Frequent patterns are patterns that occur frequently in data • Association analysis: • Example: buys(X,”computer”) => buys(X,”software”) [support = 1%, confidence = 50%]

  24. Classification and Prediction (Ch 5) • Classification is the process of finding a MODEL that describes and distinguish data classes or concepts

  25. Cluster analysis (Ch 6) • In general, the class label are not present in the training data simply they are not known to begin with • The objects are clustered or grouped based on the principle of maximizing the intra-cluster similarity and minimizing the inter-cluster similarity

  26. Cluster analysis

  27. Outlier Analysis (Ch 7) • Most data mining methods discard outliers as noise or exceptions. • However, in some application such as fraud detection, the rare event can be more interesting than regularly occurring ones

More Related