Welcome! Knowledge Discovery and Data Mining
Qiang Yang
Hong Kong University of Science and Technology
qyang@cs.ust.hk
http://www.cs.ust.hk
Course Introduction
Data Mining: An Example
• You are a marketing manager for a brokerage company
• Problem: churn (also known as attrition) is too high
• Turnover after the six-month introductory period ends is 40%
• Customers receive incentives (average cost: $160) when an account is opened
• Giving new incentives to everyone who might leave is very expensive (as well as wasteful)
• Bringing back a customer after they leave is both difficult and costly
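To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The 40% turnover and $160 incentive come from the slide; the customer count of 10,000 and the idea that a retention incentive costs the same as the account-opening incentive are assumptions made only for illustration.

```python
# Figures from the slide: 40% turnover, $160 average incentive cost.
# Hypothetical assumptions: 10,000 customers, and a retention incentive
# that costs the same as the account-opening incentive.
CUSTOMERS = 10_000
CHURN_RATE = 0.40
INCENTIVE_COST = 160


def blanket_incentive_costs(customers, churn_rate, incentive_cost):
    """Return (total cost, cost spent on customers who would have stayed)."""
    stayers = round(customers * (1 - churn_rate))
    total = customers * incentive_cost
    wasted = stayers * incentive_cost
    return total, wasted


total, wasted = blanket_incentive_costs(CUSTOMERS, CHURN_RATE, INCENTIVE_COST)
print(f"Incentives for every customer: ${total:,}")
print(f"Spent on customers who would have stayed anyway: ${wasted:,}")
```

Under these assumptions, more than half of a blanket incentive budget goes to customers who were never going to leave, which is the waste the slide refers to.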
…A Solution
• One month before the introductory period ends, predict which customers will leave
• If you want to keep a customer who is predicted to churn, offer them something based on their predicted value
• Customers who are not predicted to churn need no attention
• If you don’t want to keep the customer, do nothing
• How can you predict future behavior?
  • Build models
  • Test models
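The decision rule described above might be sketched as follows. The function name `choose_action`, the churn threshold, the minimum value worth keeping, and the offer fraction are all hypothetical; the slide only says that the offer should depend on predicted value.

```python
def choose_action(churn_probability, predicted_value,
                  churn_threshold=0.5, min_value_to_keep=500.0,
                  offer_fraction=0.10):
    """Return (action, offer_amount) for one customer.

    All thresholds are hypothetical illustration values.
    """
    # Customers not predicted to churn need no attention.
    if churn_probability < churn_threshold:
        return ("no attention needed", 0.0)
    # Predicted to churn, but not worth keeping: do nothing.
    if predicted_value < min_value_to_keep:
        return ("do nothing", 0.0)
    # Predicted to churn and worth keeping: offer scales with value.
    return ("make offer", round(predicted_value * offer_fraction, 2))


# Hypothetical customers: (name, churn probability, predicted value in $)
customers = [("A", 0.10, 2000.0), ("B", 0.80, 2000.0), ("C", 0.90, 100.0)]
for name, probability, value in customers:
    print(name, choose_action(probability, value))
```

In this sketch the churn probability and predicted value are simply given; in practice they would come from the models the last bullet asks you to build and test.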
Convergence of Three Technologies
Why Now?
1. Increasing Computing Power
• Moore’s law: computing power doubles roughly every 18 months
• Powerful workstations became common
• Cost-effective servers (SMPs) provide parallel processing to the mass market
2. Improved Data Collection
• Data Collection → Access → Navigation → Mining
• The more data the better (usually)
3. Improved Algorithms (AI + Database)
• Techniques have often been waiting for computing technology to catch up
• Statisticians were already doing “manual data mining”
• Good machine learning = intelligent application of statistical processes
• Much data mining research has focused on tweaking existing techniques to get small percentage gains
Definition: Predictive Model
• A “black box” that makes predictions about the future based on information from the past and present
• A large number of inputs is usually available
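As a minimal sketch of the "black box" idea, the class below is fitted on past records and then asked about new ones. It uses a 1-nearest-neighbor rule purely as an example; the class name, the two input columns, and all data values are hypothetical.

```python
import math


class NearestNeighborModel:
    """A black box: fit on past records, then predict for a new record."""

    def fit(self, inputs, outcomes):
        self._inputs = [tuple(x) for x in inputs]
        self._outcomes = list(outcomes)
        return self

    def predict(self, x):
        # Return the outcome of the closest past record (Euclidean distance).
        distances = [math.dist(x, past) for past in self._inputs]
        nearest = distances.index(min(distances))
        return self._outcomes[nearest]


# Hypothetical past customers: (trades per month, balance in $1000s).
past_inputs = [(1, 2), (2, 1), (1, 1), (9, 20), (8, 25), (10, 22)]
past_outcomes = ["left", "left", "left", "stayed", "stayed", "stayed"]
model = NearestNeighborModel().fit(past_inputs, past_outcomes)

# Held-out records the model has not seen, used to test it.
test_inputs = [(2, 2), (9, 21)]
test_outcomes = ["left", "stayed"]
predictions = [model.predict(x) for x in test_inputs]
accuracy = sum(p == t for p, t in zip(predictions, test_outcomes)) / len(test_outcomes)
print(predictions, accuracy)
```

The caller only sees `fit` and `predict`; what happens inside could be swapped for any other technique without changing how the model is used.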
How are Models Built and Used?
• View from 20,000 feet [diagram omitted]
The Data Mining Process
What the Real World Looks Like
Predictive Models are…
• Decision Trees
• Nearest Neighbor Classification
• Neural Networks
• Rule Induction
• K-means Clustering
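To give a feel for the first item in the list, here is a hedged sketch of the simplest possible decision tree: a single split on one numeric input (often called a decision stump). The function names and the toy data are hypothetical, and real decision-tree learners choose splits with criteria such as information gain rather than raw error counts.

```python
def fit_stump(values, churned):
    """Return (threshold, churn_if_below) with the fewest training errors.

    Assumes `values` contains at least two distinct numbers.
    """
    points = sorted(set(values))
    # Candidate thresholds: midpoints between neighboring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for threshold in candidates:
        for churn_if_below in (True, False):
            errors = sum(
                ((v <= threshold) == churn_if_below) != c
                for v, c in zip(values, churned)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, churn_if_below)
    return best[1], best[2]


def predict_stump(stump, value):
    """Predict True (will churn) or False for a single input value."""
    threshold, churn_if_below = stump
    return (value <= threshold) == churn_if_below


# Hypothetical training data: trades per month and whether the customer left.
trades = [1, 2, 3, 8, 9, 10]
left = [True, True, True, False, False, False]
stump = fit_stump(trades, left)
print(stump, predict_stump(stump, 4), predict_stump(stump, 7))
```

A full decision tree repeats this kind of split recursively on each resulting subset of the data.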
Data Mining is Not ...
• Data warehousing
• SQL / Ad Hoc Queries / Reporting
• Software Agents
• Online Analytical Processing (OLAP)
• Data Visualization
Common Uses of Data Mining
• Marketing
  • Direct mail marketing
  • Web site personalization
• Fraud Detection
  • Credit card fraud detection
• Science
  • Bioinformatics
  • Gene analysis
• Web & Text analysis
  • Google
Course Description
• Data Mining and Knowledge Discovery
• Focus:
  • Focus 1: Theoretical foundations in Pattern Recognition and Machine Learning
    • Algorithms:
      • Differences?
      • Where they apply?
  • Focus 2: Broad survey of recent research
  • Focus 3: Hands-on, apply algorithms to KDD data sets
Topic 1: Foundations
• Classification algorithms
• Clustering algorithms
• Association algorithms
• Sequential Data Mining
• Novel Applications
  • Web
  • Customer Relationship Management
  • Biological Data
Topic 2: Hands On
• Apply learned algorithms to selected data sets
• Get familiar with existing software packages and libraries
• Final Project will involve working with some data sets
Prerequisites
• Statistics and Probability would help, but are not necessary
• Pattern Recognition would help, but is not necessary
• Databases
  • Knowledge of SQL and relational algebra would help, but is not necessary
• One programming language
  • One of Java, C++, Perl, Matlab, etc.
  • Will need to read a Java library
Grading
• Grade Distribution:
  • Assignments: 30%
  • Midterm Exam: 30%
  • Paper Presentation and Presentation: 40%