
Problems with Learning



  1. Problems with Learning
  - Concept spaces are very large.
  - Training sets represent a very small percentage of instances.
  - Generalization is not (in general) truth preserving.
  - The same training set may allow for different generalizations.
  - Heuristics may be necessary to guide search and to constrain the space.

  2. Inductive Bias
  Inductive bias is a way to constrain choice. This could include:
  - Heuristic constraints on the search space
  - Heuristics to guide search
  - Bias towards simplicity
  - Syntactic constraints on the representation of learned concepts

  3. Representational Biases
  - Conjunctive biases: only allow conjuncts, or limit the number of disjuncts
  - Feature vectors: specify the allowed features and the ranges of their values (a sketch follows below)
  - Decision trees
  - Horn clauses
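
A minimal sketch of a conjunctive bias over feature vectors, in Python. The feature names and the matches function are invented for illustration: a concept is a conjunction of feature = value tests, with "?" standing for "any value".

  # A conjunctive concept over feature vectors: a dict of required
  # feature values; "?" means any value is acceptable.
  def matches(concept, instance):
      return all(want == "?" or instance.get(feat) == want
                 for feat, want in concept.items())

  # Hypothetical features and concept, for illustration only.
  concept = {"size": "small", "shape": "round", "color": "?"}
  instance = {"size": "small", "shape": "round", "color": "red"}
  print(matches(concept, instance))  # True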

  4. Theory of Learnability
  Goals: restrict the set of target concepts so that we can search the space efficiently and still find high-quality concepts, where quality is measured by effectiveness in classifying objects. Efficiency and correctness may depend not just upon the learning algorithm but also upon the language for expressing concepts, which in turn determines the search space.

  5. Example
  Given 1000 balls of various types, the concept of 'ball' would probably be learnable. Given 1000 random objects, it would be difficult to find an appropriate generalization. This difference is independent of the learning algorithm.

  6. PAC Learnability (Valiant)
  A class of concepts is PAC (probably approximately correct) learnable if there is an algorithm that executes efficiently and has a high probability of finding an approximately correct concept. Let C be a set of concepts and X a set of training instances, with n = |X|. C is PAC learnable if, for an error bound ε and a failure probability δ, there is an algorithm which, trained on X, produces a concept c of C such that the probability that c has generalization error greater than ε is less than δ.

  7. PAC Learnability (cont'd)
  That is, for y drawn from the same distribution from which the samples in X were drawn:
    P[ P[y is misclassified by c] > ε ] ≤ δ
  The running time of the algorithm must be polynomial in n = |X|, 1/ε, and 1/δ.
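
The definition itself does not say how many examples are enough. For a finite concept language C and a learner that always outputs a concept consistent with the training set, a standard bound (added here for concreteness; it is not on the slide) is that

  m ≥ (1/ε) (ln|C| + ln(1/δ))

training examples suffice: with probability at least 1 − δ, any consistent concept then has generalization error at most ε. Richer languages (larger |C|) demand more data, which is the learnability side of representational bias.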

  8. Prior Knowledge
  Some learning algorithms use prior domain knowledge. This is not unusual, as people are believed to learn more efficiently when they can relate new knowledge to old. In Explanation-Based Learning, a domain theory is used to explain an example; generalization is then based on the explanation rather than the example itself.

  9. Explanation-Based Learning
  There are four components:
  - A target concept – this is the goal
  - A training example (positive)
  - A domain theory – a set of rules and facts that explain how the training example is an example of the target
  - Operationality criteria – restrictions on the form of the concepts developed (inductive bias)

  10. EBL Example
  target concept: premise(X) -> cup(X), where premise is a conjunctive expression containing X.
  domain theory:
    liftable(X) ^ holds_liquid(X) -> cup(X)
    part(Z,W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z)
    light(Y) ^ part(Y,handle) -> liftable(Y)
    small(A) -> light(A)
    made_of(A,feathers) -> light(A)

  11. Example (cont'd)
  training example:
    cup(obj1), small(obj1), part(obj1,handle), owns(bob,obj1),
    part(obj1,bottom), part(obj1,bowl), points_up(bowl),
    concave(bowl), color(obj1,red)
  operationality criteria: target concepts must be defined in terms of observable, structural properties of objects.

  12. Explanation

  13. Generalization
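
Slides 12 and 13 presumably contained figures; a worked version of what they depict, using the rules above, is as follows. The explanation is the proof of cup(obj1) from the domain theory and the training facts:

  cup(obj1)          <- liftable(obj1) ^ holds_liquid(obj1)
  liftable(obj1)     <- light(obj1) ^ part(obj1,handle)
  light(obj1)        <- small(obj1)
  holds_liquid(obj1) <- part(obj1,bowl) ^ concave(bowl) ^ points_up(bowl)

Generalizing the proof (replacing constants by variables wherever the rules allow; handle stays a constant because the liftable rule names it explicitly) yields an operational rule:

  small(X) ^ part(X,handle) ^ part(X,W) ^ concave(W) ^ points_up(W) -> cup(X)

Note that owns(bob,obj1) and color(obj1,red) play no role in the proof and are dropped, which is exactly the "ignores irrelevant information" advantage on the next slide.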

  14. Advantages of EBL
  - Ignores irrelevant information
  - Generalizations are relevant because they are consistent with the domain theory
  - Can learn from a single training example
  - Allows one to hypothesize unstated relationships between goals and experience

  15. Limitations of EBL
  - Can only learn rules that are within the deductive closure of its domain theory
  - Such rules could, in principle, be deduced without any training examples
  - EBL can therefore be seen as a way to speed up learning
  - However, EBL does not require the domain theory to be complete

  16. Reasoning by Analogy
  - If two situations are similar in certain respects, we can construct a mapping from one to the other and then use that mapping to reason from the first situation to the second
  - We must be able to identify key features in both and ignore extraneous features
  - Selection of the source situation is critical

  17. Analogy (cont'd)
  Necessary steps:
  - Retrieval: retrieve a potential source case
  - Elaboration: derive additional features and relationships in the source case
  - Mapping: map the source attributes to the target (sketched below)
  - Justification: determine that the mapping is valid
  - Learning: apply what you know from the source case to the target; store the knowledge for future use
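
A toy sketch of the mapping, justification, and learning steps in Python. The cases, relation names, and mapping here are all invented for illustration:

  # Source and target cases as sets of (relation, arg1, arg2) facts.
  source = {("attracts", "sun", "planet"),
            ("more_massive", "sun", "planet"),
            ("revolves_around", "planet", "sun")}
  target = {("attracts", "nucleus", "electron"),
            ("more_massive", "nucleus", "electron")}

  mapping = {"sun": "nucleus", "planet": "electron"}  # from retrieval/elaboration

  def translate(fact):
      rel, a, b = fact
      return (rel, mapping.get(a, a), mapping.get(b, b))

  # Justification: mapped source facts already present in the target.
  supported = [f for f in source if translate(f) in target]
  # Learning: carry over what the target is missing.
  conjectured = [translate(f) for f in source if translate(f) not in target]
  print(conjectured)  # [('revolves_around', 'electron', 'nucleus')]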

  18. Uses of Analogy
  - Case-based reasoning: law, medicine
  - Mathematical theorem proving
  - Physical models
  - Games
  - Diagnosis

  19. Unsupervised Learning
  The system forms and evaluates concepts on its own. Examples:
  - Automated discovery
  - Conceptual clustering

  20. AM (Lenat)
  AM (Automated Mathematician) was a system for automatically generating "interesting" concepts in mathematics, primarily number theory. The system began with a set of basic concepts (such as a bag, or multiset) and operators, and then used generalization, specialization, and inversion of operators to define new concepts. AM could generate instances of the concepts and test them; a frequently occurring concept was deemed interesting.

  21. AM (cont'd)
  Heuristics were used to guide the search. Concepts were represented as small pieces of LISP code which could be mutated. This compact representation was key to the program's power to discover new concepts.

  22. AM Discoveries
  - Numbers
  - Even and odd numbers
  - Factors
  - Primes
  - Goldbach's Conjecture
  - The Fundamental Theorem of Arithmetic

  23. Conceptual Clustering
  The clustering problem is to take a collection of objects and group them in a meaningful way. There is some measurable standard of quality, used to maximize the similarity of objects in the same group (cluster).

  24. Clustering Algorithm
  A simple (agglomerative) clustering algorithm, sketched in code below:
  - Choose the pair of objects with the highest degree of similarity and make them a cluster.
  - Define the features of the cluster as the average of the features of its members, and replace the members by the cluster.
  - Repeat until a single cluster is formed.
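
A minimal sketch of this algorithm in Python, assuming objects are numeric feature vectors and similarity is inverse Euclidean distance (both assumptions; the slide does not fix them):

  import math

  def dist(a, b):
      return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

  def agglomerate(objects):
      """Merge the two most similar items into an averaged cluster,
      repeating until a single cluster remains."""
      items = [list(o) for o in objects]
      while len(items) > 1:
          # Find the closest (most similar) pair of items.
          i, j = min(((i, j) for i in range(len(items))
                      for j in range(i + 1, len(items))),
                     key=lambda p: dist(items[p[0]], items[p[1]]))
          # The new cluster's features are the average of the pair's.
          merged = [(x + y) / 2 for x, y in zip(items[i], items[j])]
          items = [it for k, it in enumerate(items) if k not in (i, j)]
          items.append(merged)
      return items[0]

  print(agglomerate([(0, 0), (0, 1), (5, 5)]))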

  25. Clustering (cont'd)
  - Often there is a measure of closeness between objects, or a list of features that can be compared; different features may carry different weights.
  - Traditional clustering algorithms don't produce meaningful semantic explanations: clusters are represented extensionally (by listing their members) rather than intensionally (by giving criteria for membership).

  26. CLUSTER/2
  1. Select k seeds from the set.
  2. For each seed, use that seed as a positive example and the other seeds as negative examples, and produce a maximally general definition.
  3. Classify all the non-seed objects using these definitions, so that every object is categorized.
  4. Find a specific description for each category.

  27. CLUSTER/2 (cont'd)
  5. Adjust for overlapping definitions.
  6. Using a distance metric, select the element closest to the center of each category.
  7. Repeat steps 1-5 using these new elements as seeds. Stop when the result is satisfactory. If there is no improvement after several iterations, try seeds near the edges of the clusters. (A simplified sketch of the loop follows below.)
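
A loose sketch of the seed-and-refine loop in Python. The symbolic heart of CLUSTER/2 (maximally general and specific descriptions, overlap adjustment) is simplified here to nearest-seed assignment, so this shows only the control structure, not the real algorithm:

  import math, random

  def dist(a, b):
      return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

  def cluster2_sketch(objects, k, iterations=10):
      seeds = random.sample(objects, k)
      for _ in range(iterations):
          # Steps 2-3, simplified: categorize every object by nearest seed.
          cats = [[] for _ in seeds]
          for o in objects:
              cats[min(range(len(seeds)), key=lambda i: dist(o, seeds[i]))].append(o)
          cats = [c for c in cats if c]
          # Step 6: new seed = member closest to each category's center.
          centers = [[sum(col) / len(c) for col in zip(*c)] for c in cats]
          seeds = [min(c, key=lambda o: dist(o, ctr))
                   for c, ctr in zip(cats, centers)]
      return cats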

  28. Reinforcement Learning
  The idea is to interact with the environment and gain feedback (possibly both positive and negative) to adjust behavior. There is a trade-off between exploiting what you already know and what you might gain by further exploration. Key components (sketched in code below):
  - policy
  - reward
  - value mapping
  - model
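
A minimal sketch of these components in Python, using tabular Q-learning on an invented five-state corridor (the environment, rewards, and parameters are all illustrative assumptions, not from the slide):

  import random

  # Model/environment: states 0..4, actions -1/+1, reward only at state 4.
  def step(s, a):
      s2 = max(0, min(4, s + a))
      return s2, (1.0 if s2 == 4 else 0.0)

  Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}  # value mapping
  alpha, gamma, epsilon = 0.5, 0.9, 0.1

  for _ in range(500):
      s = 0
      for _ in range(20):
          # Epsilon-greedy policy: the explore/exploit trade-off.
          if random.random() < epsilon:
              a = random.choice((-1, 1))
          else:
              a = max((-1, 1), key=lambda a: Q[(s, a)])
          s2, r = step(s, a)  # reward signal as feedback
          best_next = max(Q[(s2, b)] for b in (-1, 1))
          Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
          s = s2

  # The learned policy moves right (+1) from every state.
  print([max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(5)])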
