
Data Mining & Machine Learning Introduction






Presentation Transcript


  1. Data Mining & Machine Learning: Introduction. Intelligent Systems Lab., Soongsil University. Thanks to Raymond J. Mooney at the University of Texas at Austin, and Isabelle Guyon.

  2. Artificial Intelligence (AI): Research Areas
  • Research: Learning Algorithms, Inference Mechanisms, Knowledge Representation, Intelligent System Architecture
  • Application: Intelligent Agents, Information Retrieval, Electronic Commerce, Data Mining, Bioinformatics, Natural Language Processing, Expert Systems
  • Paradigm: Rationalism (Logical), Empiricism (Statistical), Connectionism (Neural), Evolutionary (Genetic), Biological (Molecular)

  3. Artificial Intelligence (AI): Paradigms
  • Symbolic AI: Rule-Based Systems
  • Connectionist AI: Neural Networks
  • Evolutionary AI: Genetic Algorithms
  • Molecular AI: DNA Computing

  4. What is Machine Learning? (Diagram: TRAINING DATA feeds a learning algorithm, which produces a trained machine; at use time, a query goes into the trained machine and an answer comes out.)

  5. Definition of learning • Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Diagram: a learning program plus experience E for task T yields a learned program whose performance P on the task improves.)

  6. What is Learning? • Herbert Simon: “Learning is any process by which a system improves performance from experience.”

  7. Machine Learning
  • Supervised Learning
  • Estimate an unknown mapping from known input-output pairs
  • Learn f_w from a training set D = {(x, y)} such that f_w(x) ≈ y
  • Classification: y is discrete
  • Regression: y is continuous
  • Unsupervised Learning
  • Only input values are provided
  • Learn f_w from D = {(x)} such that f_w captures the structure of the inputs
  • Clustering
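The supervised setting can be sketched concretely. The following is a minimal, illustrative one-variable least-squares regression (not code from the slides): the learned weights (w, b) play the role of f_w, fitted so that f_w(x) ≈ y on the training set D.

```python
def fit_linear(data):
    """Learn weights (w, b) for f_w(x) = w*x + b from training pairs
    D = [(x, y), ...] via closed-form ordinary least squares."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return w, b

# Training set D = {(x, y)} where the true mapping is y = 2x + 1
D = [(0, 1), (1, 3), (2, 5), (3, 7)]
w, b = fit_linear(D)
print(w, b)  # 2.0 1.0
```

Here y is continuous, so this is regression; the same setup with a discrete y would be classification.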

  8. Why Machine Learning? • Recent progress in algorithms and theory • Growing flood of online data • Computational power is available • Knowledge engineering bottleneck: systems that are too difficult or expensive to construct manually, because they require specific detailed skills or knowledge tuned to a particular task • Budding industry

  9. Niches using machine learning • Data mining from large databases • Market basket analysis (e.g. diapers and beer) • Medical records → medical knowledge • Software applications we can’t program by hand • Autonomous driving • Speech recognition • Self-customizing programs for individual users • Spam mail filter • Personalized tutoring • Newsreader that learns user interests

  10. Trends Leading to Data Flood • More data is generated: bank, telecom, and other business transactions • Scientific data: astronomy, biology, etc. • Web, text, and e-commerce

  11. Big Data Examples • Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session • storage and analysis are a big problem • AT&T handles billions of calls per day • so much data that it cannot all be stored -- analysis has to be done “on the fly”, on streaming data

  12. Largest databases in 2007 • Commercial databases: • AT&T: 312 TB • World Data Centre for Climate: 220 TB • YouTube: 45 TB of videos • Amazon: 42 TB (250,000 full textbooks) • Central Intelligence Agency (CIA): ?

  13. Data Growth In 2 years, the size of the largest database TRIPLED!

  14. Machine Learning / Data Mining Application areas • Science • astronomy, bioinformatics, drug discovery, … • Business • CRM (Customer Relationship management), fraud detection, e-commerce, manufacturing, sports/entertainment, telecom, targeted marketing, health care, … • Web: • search engines, advertising, web and text mining, … • Government • surveillance, crime detection, profiling tax cheaters, …

  15. Data Mining for Customer Modeling • Customer Tasks: • attrition prediction • targeted marketing: • cross-sell, customer acquisition • credit-risk • fraud detection • Industries • banking, telecom, retail sales, …

  16. Customer Attrition: Case Study • Situation: The attrition rate for mobile phone customers is around 25-30% a year! • With this in mind, what is our task? • Assume we have customer information for the past N months.

  17. Customer Attrition: Case Study Task: • Predict who is likely to attrite next month. • Estimate customer value and what is the cost-effective offer to be made to this customer.

  18. Customer Attrition Results • Verizon Wireless built a customer data warehouse • Identified potential attriters • Developed multiple, regional models • Targeted customers with high propensity to accept the offer • Reduced attrition rate from over 2%/month to under 1.5%/month (huge impact, with >30 M subscribers)

  19. Assessing Credit Risk: Case Study • Situation: A person applies for a loan • Task: Should the bank approve the loan? • Note: People who have the best credit don’t need the loans, and people with the worst credit are not likely to repay. The bank’s best customers are in the middle.

  20. Credit Risk - Results • Banks develop credit models using a variety of machine learning methods. • Mortgage and credit card proliferation are the results of being able to successfully predict if a person is likely to default on a loan • Widely deployed in many countries

  21. Successful e-commerce – Case Study • Task: Recommend other books (products) this person is likely to buy • Amazon does clustering based on books bought: • customers who bought “Advances in Knowledge Discovery and Data Mining”, also bought “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations” • Recommendation program is quite successful
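As an illustrative sketch of the “customers who bought X also bought Y” idea (Amazon's actual system is not described in the slides; the item names below are hypothetical):

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

def recommend(item, baskets):
    """Rank other items by how often they were bought together with `item`."""
    scores = Counter()
    for (a, b), count in co_purchase_counts(baskets).items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common()]

baskets = [
    ["kdd_book", "weka_book"],
    ["kdd_book", "weka_book", "stats_book"],
    ["weka_book", "kdd_book"],
]
print(recommend("kdd_book", baskets))  # ['weka_book', 'stats_book']
```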

  22. Security and Fraud Detection - Case Study • Credit Card Fraud Detection • Detection of Money laundering • FAIS (US Treasury) • Securities Fraud • NASDAQ KDD system • Phone fraud • AT&T, Bell Atlantic, British Telecom/MCI • Bio-terrorism detection at Salt Lake Olympics 2002

  23. Example Problem: Handwritten Digit Recognition • Handcrafted rules will result in a large number of rules and exceptions • Better to have a machine that learns from a large training set (Figure: wide variability of the same numeral.)

  24. Chess Game In 1997, Deep Blue (IBM) beat Garry Kasparov (Russia). – It raised IBM’s stock value by $18 billion that year

  25. Some Successful Applications of Machine Learning • Learning to drive an autonomous vehicle – Train computer-controlled vehicles to steer correctly – Drive at 70 mph for 90 miles on public highways – Associate steering commands with image sequences – 1,200 computer-generated images as training examples • Half-hour training • Additional information from the previous image indicates the darkness or lightness of the road

  26. Some Successful Applications of Machine Learning • Learning to recognize spoken words – Speech recognition/synthesis – Natural language understanding/generation – Machine translation

  27. Example 1: visual object categorization • A classification problem: predict category y based on image x. • Little chance to “hand-craft” a solution without learning. • Applications: robotics, HCI, web search (e.g. real-image search at Google)

  28. Face Recognition - 1 Given multiple angles/ views of a person, learn to identify them. Learn to distinguish male from female faces.

  29. Face Recognition - 2 Learn to recognize emotions and gestures (Li, Ye, Kambhametta, 2003)

  30. Robot • Sony AIBO robot – Available on June 1, 1999 – Weight: 1.6 kg – Adaptive learning and growth capabilities – Simulates emotions such as happiness and anger

  31. Robot • Honda ASIMO (Advanced Step in Innovative MObility) – Born on 31 October, 2001 – Height: 120 cm, Weight: 52 kg http://blog.makezine.com/archive/2009/08/asimo_avoids_moving_obstacles.html?CMP=OTC-0D6B48984890

  32. Biomedical / Biometrics • Medicine: • Screening • Diagnosis and prognosis • Drug discovery • Security: • Face recognition • Signature / fingerprint • DNA fingerprinting

  33. Computer / Internet • Computer interfaces: • Troubleshooting wizards • Handwriting and speech • Brain waves • Internet • Spam filtering • Text categorization • Text translation • Recommendation

  34. Classification • Assign object/event to one of a given finite set of categories. • Medical diagnosis • Credit card applications or transactions • Fraud detection in e-commerce • Worm detection in network packets • Spam filtering in email • Recommended articles in a newspaper • Recommended books, movies, music, or jokes • Financial investments • DNA sequences • Spoken words • Handwritten letters • Astronomical images
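One simple way to realize such category assignment is a nearest-neighbor rule, sketched below (illustrative only; the applications above are not tied to this particular method, and the features are hypothetical):

```python
def nearest_neighbor(train, query):
    """Assign `query` the label of its closest training example.
    train: list of (feature_vector, label) pairs."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, label = min((sq_dist(x, query), y) for x, y in train)
    return label

# Hypothetical 2-D features for two email classes
train = [((0.0, 0.0), "ham"), ((5.0, 5.0), "spam")]
print(nearest_neighbor(train, (1.0, 0.0)))  # ham
```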

  35. Problem Solving / Planning / Control • Performing actions in an environment in order to achieve a goal. • Solving calculus problems • Playing checkers, chess, or backgammon • Driving a car or a jeep • Flying a plane, helicopter, or rocket • Controlling an elevator • Controlling a character in a video game • Controlling a mobile robot

  36. Applications (Figure: application areas plotted by number of training examples versus number of inputs, both on log scales from 10 to 10^5: market analysis, ecology, OCR/HWR, machine vision, text categorization, system diagnosis, bioinformatics.)

  37. Disciplines Related to Machine Learning
  • Artificial intelligence: learning symbolic representations, search problems, problem solving, exploiting prior knowledge
  • Bayesian methods: the basis for computing hypothesis probabilities, the naïve Bayes classifier, estimating values of unobserved variables
  • Computational complexity theory: theoretical grounds for measuring computational efficiency, required training-data size, number of errors, etc.
  • Control theory: learning control processes that optimize a predefined objective and predict the next state

  38. Disciplines Related to Machine Learning (2)
  • Information theory: measuring entropy and information content; Minimum Description Length; the relation between optimal codes and optimal training
  • Philosophy: Occam’s Razor; analyzing what justifies generalization
  • Psychology and neurobiology: neural network models
  • Statistics: characterizing the errors that arise when estimating hypothesis accuracy; confidence intervals; statistical tests
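The entropy measure mentioned under information theory can be computed directly; a small sketch:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)) in bits, skipping p = 0 terms."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -> a fair coin carries one bit
print(entropy([1.0]))       # 0.0 -> a certain outcome carries no information
```

Lower entropy means shorter optimal codes, which is the link to Minimum Description Length.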

  39. Definition of learning • Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. (Diagram: a learning program plus experience E for task T yields a learned program whose performance P on the task improves.)

  40. Example: checkers Task T: Playing checkers. Performance measure P: % of games won. Training experience E: Practice games by playing against itself.

  41. Example: Recognizing handwritten letters Task T: Recognizing and classifying handwritten words within images. Performance measure P: % of words correctly classified. Training experience E: A database of handwritten words with given classifications.

  42. Example: Robot driving Task T: Driving on a public four-lane highway using vision sensors. Performance measure P: Average distance traveled before an error (as judged by a human overseer). Training experience E: A sequence of images and steering commands recorded while observing a human driver.

  43. Designing a learning system Task T: Playing checkers. Performance measure P: % of games won. Training experience E: Practice games by playing against itself. – What does this mean? – and what can we learn from it?

  44. Measuring Performance • Classification Accuracy • Solution correctness • Solution quality (length, efficiency) • Speed of performance
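Classification accuracy, the first measure listed, is simply the fraction of correct predictions; a minimal sketch:

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true one."""
    assert len(predicted) == len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

print(accuracy(["spam", "ham", "ham"], ["spam", "spam", "ham"]))  # 2 of 3 correct
```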

  45. Designing a Learning System 1. Choose the training experience. 2. Choose exactly what is to be learned, i.e. the target function. 3. Choose how to represent the target function. 4. Choose a learning algorithm to infer the target function from the experience. (Diagram: the environment/experience supplies training examples to the learner, which produces knowledge used by the performance element on test examples.)

  46. Designing a Learning System 1. Choosing the Training Experience
  • Key attributes:
  • Does it provide direct or indirect feedback?
  • Direct feedback: checkers board states and the correct move
  • Indirect feedback: move sequences and the final outcomes of various games
  • Credit assignment problem
  • Degree of control over the sequence of training examples
  • the learner’s autonomy, and how much help it gets from a teacher when obtaining training information
  • Distribution of examples
  • the distribution of training examples versus the distribution of test examples
  • the training data should reflect the distribution of examples on which the system’s performance will be evaluated
  • (will the unusual cases faced by the Checkers World Champion be reflected?)

  47. Training Experience • Direct experience: Given sample input and output pairs for a useful target function. • Checker boards labeled with the correct move, • e.g. extracted from record of expert play • Indirect experience: Given feedback which is not direct I/O pairs for a useful target function. • Potentially arbitrary sequences of game moves and their final game results. • Credit/Blame Assignment Problem: How to assign credit/ blame to individual moves given only indirect feedback?

  48. Source of Training Data
  • Random examples provided outside of the learner’s control
  • Are negative examples available, or only positive ones?
  • Good training examples selected by a “benevolent teacher”
  • The learner can construct an arbitrary example and query an oracle for its label
  • The learner can design and run experiments directly in the environment without any human guidance
  • The learner can query an oracle about the class of an unlabeled example in the environment, i.e. an input (x, ___) whose output value is missing

  49. Training vs. Test Distribution
  • Generally assume that the training and test examples are independently drawn from the same overall distribution of data.
  • IID: independently and identically distributed
  • If examples are not independent, collective classification is required (e.g. when building relational models over communication networks, financial transaction networks, or social networks).
  • If the test distribution is different, transfer learning is required, i.e. achieving cumulative learning.
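The IID assumption is what justifies the usual practice of shuffling one dataset and splitting it into training and test portions; a minimal sketch (the fixed seed is only for reproducibility):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle, then split: under the IID assumption both parts are
    draws from the same underlying distribution."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_test = round(len(data) * test_fraction)
    return data[n_test:], data[:n_test]

train, test = train_test_split(range(10), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

When the test data instead come from a different distribution, such a split is no longer meaningful and transfer learning is needed, as noted above.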
