
A Brief Survey of Machine Learning


Presentation Transcript


  1. A Brief Survey of Machine Learning

  2. Example: Various Fonts Classification

  3. Example: Handwritten Digits Recognition

  4. Example: Abstract Images

  5. Training Examples: Class 1 • Training Examples: Class 2 • Test Example: Class = ?

  6. Machine Learning Lectures Outline: What will we discuss?

  7. ML Lectures Outline: What will we discuss? • We already know several methods of machine learning. • What are the general principles? • Can we create improved methods? • What are examples of applications of Machine Learning?

  8. ML Lectures Outline: What will we discuss? • Why machine learning? • Brief Tour of Machine Learning • A case study • A taxonomy of learning • Intelligent systems engineering: specification of learning problems • Issues in Machine Learning • Design choices • The performance element: intelligent systems • Some Applications of Learning • Database mining, reasoning (inference/decision support), acting • Industrial usage of intelligent systems • Robotics

  9. What is Learning?

  10. What is Learning? definitions • “Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time.” -- Herbert Simon • “Learning is constructing or modifying representations of what is being experienced.” -- Ryszard Michalski • “Learning is making useful changes in our minds.” -- Marvin Minsky

  11. Why Machine Learning? What can ML do? • Discovers new things or structures that are unknown to humans • Examples: • Data mining, • Knowledge Discovery in Databases • Fills in skeletal or incomplete specifications about a domain • Large, complex AI systems cannot be completely derived by hand • They require dynamic updating to incorporate new information. • Learning new characteristics: • 1. expands the domain of expertise • 2. lessens the "brittleness" of the system • Using learning, software agents can adapt: • to their users, • to other software agents, • to the changing environment.

  12. Why Machine Learning? • New Computational Capability • Database mining: • converting (technical) records into knowledge • Self-customizing programs: • learning news filters, • adaptive monitors • Learning to act: • robot planning, • control optimization, • decision support • Applications that are hard to program: • automated driving, • speech recognition

  13. Why Machine Learning? • Better Understanding of Human Learning and Teaching • Understand and improve the efficiency of human learning • Use it to improve methods for teaching and tutoring people • e.g., better computer-aided instruction (can our robot head teach English?) • Cognitive science: theories of knowledge acquisition (e.g., through practice) • Performance elements: reasoning (inference) and recommender systems • The Time is Right • Recent progress in algorithms and theory • Rapidly growing volume of online data from various sources • Available computational power • Growth and interest of learning-based industries (e.g., data mining/KDD)

  14. A General Model of Learning Agents

  15. Disciplines relevant to Machine Learning • Artificial Intelligence • Bayesian Methods • Cognitive Science • Computational Complexity Theory • Control Theory • Information Theory • Neuroscience • Philosophy • Psychology • Statistics • [Diagram: each discipline annotated with the ideas it contributes to Machine Learning, e.g. Bayes's Theorem and missing-data estimators; entropy measures, MDL approaches and optimal codes; PAC formalism and mistake bounds; bias/variance formalism, confidence intervals and hypothesis testing; Occam's Razor and inductive generalization; power law of practice and heuristic learning; ANN models and modular learning; symbolic representation, planning/problem solving and knowledge-guided learning; optimization, learning predictors and meta-learning; language learning and learning to reason.]

  16. Forms of Machine Learning • Supervised • Learns from examples which provide desired outputs for given inputs • Unsupervised • Learns patterns in input data when no specific output values are given • Reinforcement • Learns from an indication of correctness at the end of some reasoning • So far we have covered only Supervised Learning, but this knowledge will also be useful in Unsupervised and Reinforcement Learning

  17. Supervised Learning • Must have training data including • Inputs (features) to be considered in decision • Outputs (correct decisions) for those inputs • Inductive reasoning • Given a collection of examples of function f, return a function h that approximates f • Difficulty: many functions h may be possible • Hope to pick function h that generalizes well • Tradeoff between the complexity of the hypothesis and the degree of fit to the data • Consider data modeling
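
To make the inductive-reasoning idea on this slide concrete, here is a minimal Python sketch (not from the slides) that fits hypotheses h of increasing complexity to noisy samples of an unknown function f, illustrating the tradeoff between hypothesis complexity and degree of fit. The target function, sample sizes, and polynomial hypothesis family are invented for illustration.

```python
# Minimal sketch: hypotheses h of different complexity fit to noisy samples of
# an "unknown" target function f, to illustrate the complexity/fit tradeoff.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)             # the "unknown" target function
x_train = rng.uniform(0, 1, 10)
y_train = f(x_train) + rng.normal(0, 0.1, 10)   # noisy training outputs

x_test = np.linspace(0, 1, 200)
for degree in (1, 3, 9):                        # increasing hypothesis complexity
    coeffs = np.polyfit(x_train, y_train, degree)   # pick h from the polynomial family
    h = np.poly1d(coeffs)
    train_err = np.mean((h(x_train) - y_train) ** 2)
    test_err = np.mean((h(x_test) - f(x_test)) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The high-degree hypothesis typically fits the training data best but generalizes worst, which is exactly the tradeoff the slide refers to.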

  18. Decision Trees, Decision Diagrams, Decompositions

  19. Decision Trees or Decision Diagrams (DAG) • Map features of a situation to a decision • Example: a classification of unsafe acts • In general, branches of a tree can be combined to create a DAG (Directed Acyclic Graph)

  20. Decision Trees: not only binary attributes and decisions • The actions can be more than just the yes/no decisions of a classifier. • Relation to rule-based reasoning • Features of an element are used to classify the element • Features of a situation are used to select an action • Used as the basis for many "how to" books • How to identify a type of snake? • Observable features of the snake • How to fix an automobile? • Features related to the problem and the state of the automobile • If the features are understandable, the decision tree can be used to explain the decision
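
As a toy illustration of the rule-based, explainable flavor of decision trees described above, here is a small hand-written tree in the spirit of the "how to identify a snake" example. The feature names, thresholds, and outcomes are invented for illustration and are not from the slides.

```python
# Minimal sketch: a hand-built decision tree as nested rules. Each branch tests
# an observable feature, so the path taken also explains the decision.
def classify_snake(features):
    """features: dict of observable attributes of the snake (illustrative only)."""
    if features["head_shape"] == "triangular":
        if features["has_heat_pits"]:
            return "likely pit viper"
        return "possibly venomous"
    else:
        if features["pupil_shape"] == "round":
            return "likely non-venomous"
        return "uncertain"

print(classify_snake({"head_shape": "triangular", "has_heat_pits": True,
                      "pupil_shape": "vertical"}))
```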

  21. Learning Decision Trees: classification versus regression • Types of Decision Trees • Learning a discrete-valued function is classification learning • Learning a continuous-valued function is regression • Assumption: use of Ockham’s razor will result in more general function • Want the smallest decision tree, but that is not tractable • Will be satisfied with smallish tree

  22. Algorithm for Decision Tree Learning • Basic idea • Recursively select feature that splits data (most) unevenly • No need to use all features • Can be generalized to decomposition methods • Can be generalized to other types of trees, DAGs. • Heuristic approach (applied not only to trees) • Compare features for their ability to meaningfully split data • Feature-value = greatest difference in average output value(s) * size of smaller subset • Avoids splitting out individuals too early
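
A minimal sketch of the splitting heuristic described on this slide, for a binary feature and a numeric output. Reading the slide's formula as the literal product "difference in average output values times size of the smaller subset" is an assumption, and the function names are my own.

```python
# Minimal sketch of the heuristic: score each feature by how meaningfully it
# splits the data, weighting by the smaller subset to avoid splitting out
# individuals too early.
def split_score(examples, feature):
    """examples: list of (features_dict, output_value) pairs; feature is binary."""
    left = [y for x, y in examples if x[feature]]
    right = [y for x, y in examples if not x[feature]]
    if not left or not right:
        return 0.0
    mean = lambda ys: sum(ys) / len(ys)
    # Large difference in mean output, weighted by the size of the smaller side.
    return abs(mean(left) - mean(right)) * min(len(left), len(right))

def best_feature(examples, features):
    """Select the feature to split on; recursion on the resulting subsets follows."""
    return max(features, key=lambda f: split_score(examples, f))
```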

  23. MAIN FORMAT OF DATA for classification

  24. Classification Terms • Data: a set of N vectors x • Features are parameters of x; x lives in feature space • Features may be • whole, raw images; • parts of images; • filtered images; • statistics of images; • or something else entirely • Labels: C categories; each x belongs to some ci • Classifier: create formula(s) or rule(s) that will assign unlabeled data to the correct category • An equivalent definition is to parametrize a decision surface in feature space separating category members, keeping in mind that there is plenty of data out there that the classifier has yet to encounter
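
The following sketch ties the vocabulary above together: N feature vectors x, C category labels ci, and a classifier rule that assigns an unlabeled x to a category. The nearest-class-mean rule and all names are illustrative assumptions, not something prescribed by the slides.

```python
# Minimal sketch: a classifier as a rule over feature space. The decision
# surface is implicit: the set of points equidistant from two class means.
import numpy as np

def fit_nearest_mean(X, labels):
    """X: (N, d) array of feature vectors; labels: length-N list of categories."""
    labels = np.array(labels)
    return {c: X[labels == c].mean(axis=0) for c in set(labels)}

def classify(x, class_means):
    """Assign unlabeled x to the category whose mean vector is closest."""
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

X = np.array([[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]])
labels = ["road", "road", "not_road", "not_road"]
means = fit_nearest_mean(X, labels)
print(classify(np.array([0.2, 0.1]), means))   # -> "road"
```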

  25. Example: Road Classification

  26. Features and Labels for Road Classification • Feature vectors: 410 features/point over a 32 x 20 grid • Color histogram [Swain & Ballard, 1991] (24 features) • 8 bins per RGB channel over the surrounding 31 x 31 camera subimage • Gabor wavelets [Lee, 1996] (384 features) • Characterize texture with an 8-bin histogram of filter responses for • 2 phases, • 3 scales, • 8 angles over a 15 x 15 camera subimage • Ground height, smoothness (2 features) • Mean, variance of laser height values projecting to the 31 x 31 camera subimage • Labels are derived from the inside/outside relationship of a feature point to the road-delimiting polygon. • We want to classify, for each pixel, whether it belongs to a road or not.
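
As an illustration of the first feature group listed above, this sketch computes an 8-bins-per-RGB-channel histogram over a 31 x 31 subimage, giving the 24 color features. The array layout and per-channel normalization are assumptions for the sketch, not the original implementation.

```python
# Minimal sketch of the 24-feature color histogram: 8 bins per R, G, B channel
# over a 31 x 31 camera subimage.
import numpy as np

def color_histogram(subimage, bins=8):
    """subimage: (31, 31, 3) uint8 RGB patch -> length-24 feature vector."""
    feats = []
    for channel in range(3):
        hist, _ = np.histogram(subimage[:, :, channel], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())       # normalize counts per channel
    return np.concatenate(feats)              # 3 channels x 8 bins = 24 features

patch = np.random.randint(0, 256, (31, 31, 3), dtype=np.uint8)
print(color_histogram(patch).shape)           # (24,)
```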

  27. Features and Labels for Road Classification • [Figure: Gabor filter responses at 0° even, 45° even, and 0° odd phase; from Rasmussen, 2001.] • For classification the term "features" is commonly used, but you can think of these as the road analogs of attributes. • Features are computed on a grid every 10 pixels horizontally and vertically. • The three Gabor filter scales correspond to kernel sizes of 6 x 6, 12 x 12, and 25 x 25.

  28. Key Classification Problems • What features to use? How do we extract them from the image? • Do we even have labels (i.e., examples from each category)? • What do we know about the structure of the categories in feature space?

  29. Theoretical Aspects of Learning Systems

  30. Three Aspects of Learning Systems • 1. Models: • decision trees, • linear threshold units (winnow, weighted majority), • neural networks, • Bayesian networks (polytrees, belief networks, influence diagrams, HMMs), • genetic algorithms, • instance-based (nearest-neighbor) • 2. Algorithms (e.g., for decision trees): • ID3, • C4.5, • CART, • OC1 • 3. Methodologies: • supervised, • unsupervised, • reinforcement; • knowledge-guided

  31. What are the aspects of research on Learning? • 1. Theory of Learning • Computational learning theory (COLT): complexity, limitations of learning • Probably Approximately Correct (PAC) learning • Probabilistic, statistical, information theoretic results • 2. Multistrategy Learning: • Combining Techniques, • Knowledge Sources • 3. Create and collect Data: • Time Series, • Very Large Databases (VLDB), • Text Corpora • 4. Select good applications • Performance element: • classification, • decision support, • planning, • control • Database mining and knowledge discovery in databases (KDD) • Computer inference: learning to reason

  32. Some Issues in Machine Learning • What Algorithms Can Approximate Functions Well? When? • How Do Learning System Design Factors Influence Accuracy? • Number of training examples • Complexity of hypothesis representation • How Do Learning Problem Characteristics Influence Accuracy? • Noisy data • Multiple data sources • What Are The Theoretical Limits of Learnability? • How Can Prior Knowledge of Learner Help? • What Clues Can We Get From Biological Learning Systems? • How Can Systems Alter Their Own Representation?

  33. Major Paradigms of Machine Learning

  34. Major Paradigms of Machine Learning • Rote Learning • One-to-one mapping from inputs to stored representation. • "Learning by memorization." • Association-based storage and retrieval. • Clustering • Analogy • Determine correspondence between two different representations • Induction • Use specific examples to reach general conclusions • Discovery • Unsupervised; a specific goal is not given • Genetic Algorithms

  35. Major Paradigms of Machine Learning • Neural Networks • Reinforcement • Feedback is given at the end of a sequence of steps. • Feedback can be a positive or negative reward. • Assign reward to steps by solving the credit assignment problem: which steps should receive credit or blame for a final result?

  36. The Inductive Learning Problem

  37. The Inductive Learning Problem • Induce rules that extrapolate from a given set of examples • These rules should make "accurate" predictions about future examples. • Supervised versus Unsupervised learning • Learn an unknown function f(X) = Y, where: • X is an input example and • Y is the desired output. • Supervised learning implies we are given a training set of (X, Y) pairs by a "teacher." • Unsupervised learning means we are only given the Xs and some (ultimate) feedback function on our performance.

  38. The Inductive Learning Problem • Concept learning • Also called Classification • Given a set of examples of some concept/class/category, determine if a given example is an instance of the concept or not. • If it is an instance, we call it a positive example. • If it is not, it is called a negative example.

  39. Inductive Learning Framework • Raw input data from sensors are preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples. • Each X is a list of (attribute, value) pairs. For example, X = [Person:Sue, EyeColor:Brown, Age:Young, Sex:Female] • The number and names of attributes (aka features) are fixed (positive, finite). • Each attribute has a fixed, finite number of possible values. • Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.
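
A small sketch of the framework above: an (attribute, value) example is mapped to a point in a fixed n-dimensional feature space. The attribute value sets below are invented for illustration, and the identifier-like Person attribute is left out since it is not a classifying feature.

```python
# Minimal sketch: turning an (attribute, value) example such as
# [EyeColor:Brown, Age:Young, Sex:Female] into a point in a fixed
# n-dimensional feature space via one value index per attribute.
ATTRIBUTE_VALUES = {                       # fixed, finite value sets (assumed)
    "EyeColor": ["Brown", "Blue", "Green"],
    "Age": ["Young", "Old"],
    "Sex": ["Female", "Male"],
}

def to_point(example):
    """example: dict of attribute -> value; returns a tuple of value indices."""
    return tuple(ATTRIBUTE_VALUES[a].index(example[a]) for a in ATTRIBUTE_VALUES)

x = {"EyeColor": "Brown", "Age": "Young", "Sex": "Female"}
print(to_point(x))   # one point in a 3-dimensional feature space: (0, 0, 0)
```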

  40. Example: Learning to Play Checkers

  41. Specifying A Learning Problem • Learning = Improving with Experience at Some Task • Improve over task T, • with respect to performance measure P, • based on experience E. • Example: Learning to Play Checkers • T: play games of checkers • P: percent of games won in world tournament • E: opportunity to play against self • Refining the Problem Specification: Issues • What experience? • What exactly should be learned? • How shall it be represented? • What specific algorithm to learn it? • Defining the Problem Milieu • Performance element: • How shall the results of learning be applied? • How shall the performance element be evaluated? The learning system?

  42. Learning to Play Checkers: Type of Training Experience • Direct or indirect? • Teacher or not? • Knowledge about the game (e.g., openings/endgames)? • Problem: Is the training experience representative (of the performance goal)? • Software Design • Assumption of the learning system: a legal move generator exists • Software requirements: • generator, • evaluator(s), • parametric target function • Choosing a Target Function • ChooseMove: Board → Move (action selection function, or policy) • V: Board → R (board evaluation function) • Ideal target V; approximated target V̂ • Goal of the learning process: an operational description (approximation) of V

  43. A Target Function for Learning to Play Checkers • Possible definition: • If b is a final board state that is won, then V(b) = 100 • If b is a final board state that is lost, then V(b) = -100 • If b is a final board state that is drawn, then V(b) = 0 • If b is not a final board state, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game • These values are correct, but not operational • Choosing a Representation for the Target Function • Collection of rules? • Neural network? • Polynomial function (e.g., linear or quadratic combination) of board features? • Other? • A Representation for the Learned Function • bp/rp = number of black/red pieces; bk/rk = number of black/red kings; bt/rt = number of black/red pieces threatened (can be taken on the next turn)
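
Given the six board features listed above, here is a minimal sketch of a linear representation of the learned evaluation function. The w0..w6 weight notation follows the usual presentation of this checkers example; treat it as an assumption, since the slide's own formula did not survive extraction.

```python
# Minimal sketch of a linear learned function over the six board features:
# V_hat(b) = w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt
def v_hat(board_features, w):
    """board_features: (bp, rp, bk, rk, bt, rt); w: (w0, ..., w6)."""
    bp, rp, bk, rk, bt, rt = board_features
    w0, w1, w2, w3, w4, w5, w6 = w
    return w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt
```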

  44. A Training Procedure for Learning to Play Checkers • Obtaining Training Examples • the target function V • the learned function V̂ • the training value Vtrain(b) • One rule for estimating training values: Vtrain(b) ← V̂(Successor(b)) • Choose a Weight Tuning Rule • Least Mean Square (LMS) weight update rule: REPEAT • Select a training example b at random • Compute error(b) = Vtrain(b) - V̂(b) for this training example • For each board feature fi, update weight wi as follows: wi ← wi + c · fi · error(b), where c is a small, constant factor to adjust the learning rate
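
A minimal Python sketch of one iteration of the LMS weight-tuning rule as reconstructed above; the function and variable names are my own, and the training values are assumed to be supplied with the examples.

```python
# Minimal sketch of one LMS training step: pick a training example at random,
# compute error(b) = V_train(b) - V_hat(b), then nudge each weight by
# w_i <- w_i + c * f_i * error(b).
import random

def lms_training_step(examples, w, c=0.1):
    """examples: list of (features, v_train) pairs; w: weight list (w0..wn)."""
    features, v_train = random.choice(examples)      # select a training example b
    v_pred = w[0] + sum(wi * fi for wi, fi in zip(w[1:], features))
    error = v_train - v_pred                         # error(b)
    w[0] += c * error                                # constant term (f0 = 1)
    for i, fi in enumerate(features, start=1):
        w[i] += c * fi * error                       # w_i <- w_i + c * f_i * error(b)
    return w
```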

  45. Design Choices for Learning to Play Checkers: Completed Design • Determine type of training experience: games against experts, games against self, table of correct moves • Determine target function: Board → Move, Board → Value • Determine representation of learned function: polynomial, linear function of six features, artificial neural network • Determine learning algorithm: linear programming, gradient descent

  46. Supervised Learning

  47. Evaluating Supervised Learning Algorithms

  48. Evaluating Supervised Learning Algorithms • Collect a large set of examples (input/output pairs) • Divide into two disjoint sets • Training data • Testing data • Apply learning algorithm to training data, generating a hypothesis h • Measure % of examples in the testing data that are successfully classified by h (or amount of error for continuously valued outputs) • Repeat above steps for different sizes of training sets and different randomly selected training sets
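
A minimal sketch of the evaluation procedure on this slide, producing a simple learning curve over different training-set sizes and random splits. Here `learn` is a placeholder for any supervised learning algorithm that returns a hypothesis h; it is not a specific algorithm from the slides.

```python
# Minimal sketch: split examples into disjoint training and testing sets,
# learn a hypothesis h, measure the fraction of test examples classified
# correctly, and repeat for different training-set sizes and random splits.
import random

def accuracy(h, test_set):
    return sum(h(x) == y for x, y in test_set) / len(test_set)

def learning_curve(examples, learn, sizes, trials=10):
    """examples: list of (input, output) pairs; sizes: training-set sizes to try."""
    results = {}
    for n in sizes:
        scores = []
        for _ in range(trials):                       # different random training sets
            random.shuffle(examples)
            train, test = examples[:n], examples[n:]  # disjoint train / test split
            h = learn(train)                          # hypothesis from training data
            scores.append(accuracy(h, test))
        results[n] = sum(scores) / trials
    return results
```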
