
Introduction to Data Mining

Group Members: Karim C. El-Khazen, Pascal Suria, Lin Gui, Philsou Lee, Xiaoting Niu. Outline: Definition, General Concept, Foundations, Evolution, Applications, Challenges, Algorithms (Classical, Next Generations).


Presentation Transcript


  1. Introduction to Data Mining • Group Members: • Karim C. El-Khazen • Pascal Suria • Lin Gui • Philsou Lee • Xiaoting Niu

  2. Introduction to Data Mining • Definition • General Concept • Foundations • Evolution • Applications • Challenges • Algorithms: Classical, Next Generations

  3. Introduction to Data Mining • What is Data Mining? • Data mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data stored in repositories, using pattern recognition technologies as well as statistical and mathematical methods.

  4. Introduction to Data Mining • Foundations • Massive data collection • Powerful multiprocessor computers • Data mining algorithms

  5. Introduction to Data Mining • Evolution

  6. Introduction to Data Mining • Applications • Industry • Retail • Health maintenance groups • Telecommunications • Credit cards • Web mining • Sports and entertainment solutions

  7. Introduction to Data Mining • Challenges • Ability to handle different types of data • Graceful degradation of data mining algorithms • Valuable data mining results • Representation of data mining requests and results • Mining at different abstraction levels • Mining information from different sources of data • Protection of privacy and data security

  8. Introduction to Data Mining • Hierarchy of Choices and Decisions • Business goal • Collecting, cleaning and preparing data • Prediction • Model type and algorithms

  9. Introduction to Data Mining • Data Description • Descriptions of data characteristics in elementary and aggregated form • Summarization • Visualization
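
The summarization step above can be sketched in a few lines. This is a minimal illustration rather than the presenters' method; the `purchases` values and the `describe` helper are hypothetical names and data invented for the example.

```python
import statistics

# Hypothetical purchase amounts, made up for illustration.
purchases = [12.0, 15.0, 15.0, 20.0, 38.0]


def describe(values):
    """Return elementary summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


summary = describe(purchases)
print(summary)
```

The mean and the median differ here because one large purchase pulls the mean upward, which is the kind of characteristic a data description is meant to surface before any modeling starts.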

  10. Introduction to Data Mining • Predictive Data Mining • Predictive modeling is a term used to describe the process of mathematically or mentally representing a phenomenon or occurrence with a series of equations or relationships.

  11. Introduction to Data Mining • Prediction: Classification • Classification predicts class membership • Pre-classify (using classification algorithms) • Test to determine the quality of the model • Predict (using effective classifier)
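
The three steps above (pre-classify, test, predict) can be sketched with a deliberately simple classifier. This is an assumption-laden toy, not the algorithm used in the presentation: the income figures, the class labels, and the midpoint-threshold rule in `fit_threshold` are all invented for illustration.

```python
# Hypothetical labeled records: (annual income in thousands, class label).
train = [(20, "reject"), (25, "reject"), (30, "reject"),
         (50, "approve"), (60, "approve"), (70, "approve")]
held_out = [(22, "reject"), (35, "reject"), (55, "approve"), (65, "approve")]


def fit_threshold(records):
    """Step 1: learn a cut-off halfway between the two class means."""
    low = [x for x, label in records if label == "reject"]
    high = [x for x, label in records if label == "approve"]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2


def predict(threshold, x):
    """Step 3: assign a class to a single value."""
    return "approve" if x >= threshold else "reject"


def accuracy(threshold, records):
    """Step 2: fraction of held-out records classified correctly."""
    hits = sum(predict(threshold, x) == label for x, label in records)
    return hits / len(records)


threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, held_out)
new_prediction = predict(threshold, 48)
print(threshold, test_accuracy, new_prediction)
```

Keeping the held-out records separate from the training records is the point of the test step: accuracy measured on the training data alone would overstate the quality of the model.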

  12. Introduction to Data Mining • Prediction: Regression • Regression takes a numerical dataset and develops a mathematical formula that fits the data. • To predict future behavior, plug the new data into the fitted formula to get a prediction.
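
As a minimal sketch of fitting a formula and then plugging new data into it, the code below uses ordinary least squares for a straight line. The slide does not name a specific regression method, so the choice of a linear fit is an assumption, and the data points and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)

# Prediction step: plug a new x value into the fitted formula.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

Real data rarely lies exactly on a line, so in practice the fitted formula gives an estimate with some error rather than an exact value.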

  13. Introduction to Data Mining • Algorithms • Classical Techniques: Statistics, Neighborhoods, Clustering • Next Generations: Decision Tree, Neural Network, Rule Induction

  14. Introduction to Data Mining • Statistics • Classical statistics: concerned with the collection and description of data; assumes that an underlying pattern of data distribution exists; objective is to find the best guess • Data mining: employs statistical methods; needs to analyze huge amounts of data; goes beyond traditional statistics

  15. Introduction to Data Mining • Neighborhoods • Basic idea: for a new problem, look for similar problems (neighbors) that have already been solved • Key point: finding the neighborhood • Calculate the distance: how close must a case be to count as a neighbor? • Which class does the new problem belong to? • Large computational load: a new calculation is needed for each new case
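
The neighborhood idea can be sketched as a k-nearest-neighbor classifier. This is one common way to realize it, offered as an assumption rather than the presenters' exact technique: the Euclidean distance, the value `k=3`, the majority vote, and the `solved` cases are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical solved cases: ((age, income in thousands), class).
solved = [((25, 30), "no"), ((30, 35), "no"), ((28, 32), "no"),
          ((45, 80), "yes"), ((50, 90), "yes"), ((48, 85), "yes")]


def knn_classify(cases, query, k=3):
    """Classify query by majority vote among its k closest solved cases."""
    # The distance to every stored case is recomputed for each query,
    # which is the computational load the slide refers to.
    by_distance = sorted(cases, key=lambda case: math.dist(case[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


label = knn_classify(solved, (47, 82))
print(label)
```

Choosing `k` answers the question of how many neighbors count, and the distance function answers how similarity is measured; both are design decisions that depend on the data.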

  16. Introduction to Data Mining • Clustering • Elements are grouped together according to their characteristics • Members of each cluster share similar values (homogeneous) • Problem: controlling the number of clusters • Hierarchical clustering: flexible number of clusters • Non-hierarchical clustering: number of clusters given by the user • Used most frequently for: consolidating data into a high-level view; grouping records by likely behavior
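
A minimal sketch of non-hierarchical clustering, where the user fixes the number of clusters, is the k-means procedure below. The slide does not name k-means, so treating it as the example is an assumption; the one-dimensional `values`, the starting `centers`, and the `kmeans_1d` helper are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around len(centers) centers with plain k-means."""
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical data with two obvious groups; the user supplies two centers.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(values, centers=[1.0, 12.0])
print(centers, clusters)
```

The number of starting centers is the user-supplied cluster count; a hierarchical method would instead build a tree of merges and let the user cut it at any level afterwards.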

  17. Introduction to Data Mining • Decision Tree • A way of representing a series of rules that lead to a class or value • Structure: • Decision node, branches, leaves • Example: A loan officer wants to determine the creditworthiness of applicants

  18. Introduction to Data Mining • Decision Tree (continued) • Help to induce the tree and its rules to make predictions
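
The structure of decision nodes, branches, and leaves can be sketched as nested dictionaries. The tree below is hand-written, not induced from data, and it is not the tree from the presentation: the attributes `income` and `years_employed`, their thresholds, and the class names are hypothetical.

```python
# Hypothetical loan tree; attributes and thresholds are made up.
tree = {
    "test": ("income", 40),            # decision node: income >= 40 (thousands)?
    "yes": {                           # branch taken when the test passes
        "test": ("years_employed", 2),
        "yes": "good credit",          # leaf
        "no": "poor credit",           # leaf
    },
    "no": "poor credit",               # leaf
}


def classify(node, applicant):
    """Follow branches from the root until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        attribute, threshold = node["test"]
        branch = "yes" if applicant[attribute] >= threshold else "no"
        node = node[branch]
    return node


result = classify(tree, {"income": 55, "years_employed": 5})
print(result)
```

Each path from the root to a leaf reads as one rule, for example "income at least 40 and at least 2 years employed implies good credit", which is why a tree is a compact way to represent a series of rules.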

  19. Introduction to Data Mining • Neural Networks • Efficiently model large and complex problems with hundreds of predictor variables • Structure: • Input layer, hidden layer, output layer • Activation function between nodes • Requires training and testing of relations
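
The layered structure can be sketched as a forward pass through a tiny network with two inputs, two hidden nodes, and one output. The weights below are set by hand so that the network computes exclusive-or; a real network would learn them during training. The sigmoid activation, the weight values, and the helper names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Activation function applied at each node."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Compute one layer: weighted sum plus bias, then activation, per node."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked (not trained) weights: hidden node 1 acts like OR,
# hidden node 2 like NAND, and the output node like AND.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, round(forward(pair), 3))
```

The hidden layer is what lets the network represent a relation such as exclusive-or, which no single weighted sum of the inputs can separate.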

  20. Introduction to Data Mining • Neural Networks (continued) • Example:

  21. Introduction to Data Mining • Rule Induction • A method to derive a set of rules to classify cases • For example, rule induction can be used to discover patterns relating to decisions (e.g., credit card applications) • Rules may not cover all possible situations
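
A minimal sketch of rule induction is shown below. It keeps only single-condition rules that held without exception in the data, which is a simplification chosen for brevity and not a specific published algorithm; the credit card `cases` and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical credit card applications; all values are made up.
cases = [
    ({"employed": "yes", "owns_home": "yes"}, "approve"),
    ({"employed": "yes", "owns_home": "no"}, "approve"),
    ({"employed": "no", "owns_home": "no"}, "reject"),
    ({"employed": "no", "owns_home": "yes"}, "approve"),
]


def induce_rules(records):
    """Derive rules of the form (attribute, value) -> class."""
    outcomes = defaultdict(set)
    for attributes, label in records:
        for pair in attributes.items():
            outcomes[pair].add(label)
    # Keep only conditions that always led to the same class.
    return {pair: labels.pop() for pair, labels in outcomes.items()
            if len(labels) == 1}


def apply_rules(rules, attributes):
    """Return the class of the first matching rule, or None if none applies."""
    for pair in attributes.items():
        if pair in rules:
            return rules[pair]
    return None


rules = induce_rules(cases)
uncovered = apply_rules(rules, {"employed": "no", "owns_home": "no"})
print(rules, uncovered)
```

The applicant who is neither employed nor a homeowner matches no induced rule, so the result is `None`: an instance of rules not covering every possible situation.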

  22. Introduction to Data Mining • Questions
