
## Bayesian Models of Human Learning and Inference

Josh Tenenbaum
MIT Department of Brain and Cognitive Sciences

## Shiffrin Says

“Progress in science is driven by new tools, not great insights.”

## Outline

- Part I. Brief survey of Bayesian modeling in cognitive science.
- Part II. Bayesian models of everyday inductive leaps.

## Collaborators

Tom Griffiths, Neville Sanjana, Charles Kemp, Mark Steyvers, Tevye Krynski, Sean Stromsten, Sourabh Niyogi, Fei Xu, Dave Sobel, Wheeler Ruml, Alison Gopnik

## Outline

- Part I. Brief survey of Bayesian modeling in cognitive science.
  - Rational benchmark for descriptive models of probability judgment.
  - Rational analysis of cognition.
  - Rational tools for fitting cognitive models.

## Normative benchmark for descriptive models

- How does human probability judgment compare to the Bayesian ideal?
  - Peterson & Beach; Edwards; Tversky & Kahneman; . . .
- Explicit probability judgment tasks:
  - Drawing balls from an urn, rolling dice, medical diagnosis, . . .
- Alternative descriptive models:
  - Heuristics and Biases, Support Theory, . . .

## Rational analysis of cognition

- Develop Bayesian models for core aspects of cognition not traditionally thought of in terms of statistical inference.
- Examples:
  - Memory retrieval: Anderson; Shiffrin et al.; . . .
  - Reasoning with rules: Oaksford & Chater; . . .
- Often can explain a wider range of phenomena than previous models, with fewer free parameters:
  - Spacing effects on retention.
  - Power laws of practice and retention.

Anderson’s rational analysis of memory: for each item in memory, estimate the probability that it will be useful in the present context.
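Anderson’s analysis ties this estimate to environmental statistics: the log odds that an item will be needed fall off roughly linearly in the log of the time since it was last used, i.e. a power law. A minimal sketch of that functional form (the decay parameters here are illustrative, not Anderson & Schooler’s fitted values):

```python
import math

def log_need_odds(days_since_last, decay=1.0, log_a=0.0):
    """Power-law need odds: odds(t) = A * t**(-decay),
    so log odds is linear in log t (days since last use)."""
    return log_a - decay * math.log(days_since_last)

# Spacing effect, sketched as a shallower decay after longer lags
# between uses (slope values are made up for illustration):
short_lag = [log_need_odds(t, decay=1.2) for t in (1, 7, 30)]
long_lag = [log_need_odds(t, decay=0.8) for t in (1, 7, 30)]
```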
The model of need probability was inspired by library book access, and corresponds to statistics of natural information sources:

[Figure: log need odds versus log days since last occurrence, plotted separately for short and long lags.]

## Rational analysis of cognition

- Often can show that apparently irrational behavior is actually rational.
- Which cards do you have to turn over to test this rule? “If there is an A on one side, then there is a 2 on the other side.”
- Oaksford & Chater’s rational analysis:
  - Optimal data selection based on maximizing expected information gain.
  - Test the rule “If p, then q” against the null hypothesis that p and q are independent.
  - Assuming p and q are rare predicts people’s choices.

## Rational tools for fitting cognitive models

- Use the Bayesian Occam’s Razor to solve the problem of model selection: trade off fit to the data against model complexity.
- Examples:
  - Comparing alternative cognitive models: Myung, Pitt, . . .
  - Fitting nested families of models of mental representation: Lee, Navarro, . . .
- Comparing alternative cognitive models via an MDL approximation to the Bayesian Occam’s Razor takes into account a model’s functional form as well as its number of free parameters.

Fit models of mental representation to similarity data, e.g. additive clustering, additive trees, common and distinctive feature models.
We want to choose the complexity of the model (number of features, depth of tree) in a principled way, and to search efficiently through the space of nested models, using the Bayesian Occam’s Razor.

## Outline

- Part I. Brief survey of Bayesian modeling in cognitive science.
- Part II. Bayesian models of everyday inductive leaps: rational models of cognition in which Bayesian model selection and the Bayesian Occam’s Razor play a central explanatory role.

## Everyday inductive leaps

How can we learn so much about . . .

- Properties of natural kinds
- Meanings of words
- Future outcomes of a dynamic process
- Hidden causal properties of an object
- Causes of a person’s action (beliefs, goals)
- Causal laws governing a domain

. . . from such limited data?

## Learning concepts and words

Given three examples labeled “tufa”, can you pick out the other tufas?

## Inductive reasoning

Input:

- Premises: Cows can get Hick’s disease. Gorillas can get Hick’s disease.
- Conclusion: All mammals can get Hick’s disease.

Task: judge how likely the conclusion is to be true, given that the premises are true.

## Inferring causal relations

Input:

| Day | Took vitamin B23 | Headache |
| --- | --- | --- |
| 1 | yes | no |
| 2 | yes | yes |
| 3 | no | yes |
| 4 | yes | no |
| . . . | . . . | . . . |

Does vitamin B23 cause headaches? Task: judge the probability of a causal link given several joint observations.

## The Challenge

- How do we generalize successfully from very limited data?
  - Just one or a few examples.
  - Often only positive examples.
- Philosophy: induction is a “problem”, a “riddle”, a “paradox”, a “scandal”, or a “myth”.
- Machine learning and statistics: focus on generalization from many examples, both positive and negative.

## Rational statistical inference (Bayes, Laplace)

The posterior probability of a hypothesis combines its prior probability with the likelihood of the data:

P(h | d) = P(d | h) P(h) / Σ_h′ P(d | h′) P(h′)

## History of Bayesian Approaches to Human Inductive Learning

- Hunt
- Suppes: “Observable changes of hypotheses under positive reinforcement”, Science (1965), w/ M. Schlag-Rey:
“A tentative interpretation is that, when the set of hypotheses is large, the subject ‘samples’ or attends to several hypotheses simultaneously. . . . It is also conceivable that a subject might sample spontaneously, at any time, or under stimulations other than those planned by the experimenter. A more detailed exploration of these ideas, including a test of Bayesian approaches to information processing, is now being made.”

- Shepard: analysis of one-shot stimulus generalization, to explain the universal exponential law.
- Anderson: rational analysis of categorization.

## Theory-Based Bayesian Models

- Explain the success of everyday inductive leaps as rational statistical inference (Bayes) constrained by domain theories well matched to the structure of the world.
- Domain theories generate the necessary ingredients: a hypothesis space H and priors p(h).

## Questions about theories

- What is a theory? Working definition: an ontology and a system of abstract (causal) principles that generates a hypothesis space of candidate world structures (e.g., Newton’s laws).
- How is a theory used to learn about the structure of the world?
- How is a theory acquired? As statistical learning over a probabilistic generative model.

## Alternative approaches to inductive generalization

- Associative learning
- Connectionist networks
- Similarity to examples
- Toolkit of simple heuristics
- Constraint satisfaction

## Marr’s Three Levels of Analysis

- Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
- Representation and algorithm: cognitive psychology.
- Implementation: neurobiology.

## Descriptive Goals

- Principled mathematical models, with a minimum of arbitrary assumptions.
- Close quantitative fits to behavioral data.
- Unified models of cognition across domains.

## Explanatory Goals

- How do we reliably acquire knowledge about the structure of the world from such limited experience?
- Which processing models work, and why?
- New views on classic questions in cognitive science:
  - Symbols (rules, logic, hierarchies, relations) versus statistics.
  - Theory-based inference versus similarity-based inference.
  - Domain-specific knowledge versus domain-general mechanisms.
- Provides a route to studying people’s hidden (implicit or unconscious) knowledge about the world.

## The plan

- Basic causal learning
- Inferring number concepts
- Reasoning with biological properties
- Acquisition of domain theories:
  - Intuitive biology: taxonomic structure.
  - Intuitive physics: causal law.

## Learning a single causal relation

Given a random sample of mice:

|  | Injected with X | Not injected with X |
| --- | --- | --- |
| Expressed Y | 30 | 45 |
| Did not express Y | 30 | 15 |

“To what extent does chemical X cause gene Y to be expressed?” Or, “What is the probability that X causes Y?”

## Associative models of causal strength judgment

Label the contingency counts:

|  | c+ (injected with X) | c− (not injected with X) |
| --- | --- | --- |
| e+ (expressed Y) | a | c |
| e− (did not express Y) | b | d |

- Delta-P (or asymptotic Rescorla-Wagner): ΔP = P(e+ | c+) − P(e+ | c−) = a/(a+b) − c/(c+d).
- Power PC (Cheng, 1997): causal power = ΔP / (1 − P(e+ | c−)).

## Some behavioral data

[Figure: Buehner & Cheng (1997): mean human judgments for contingencies spanning ΔP = 0, 0.25, 0.5, 0.75, and 1, alongside the predictions of ΔP and Power PC.]

The data show independent effects of both causal power and ΔP.
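Both estimators are simple functions of the 2×2 contingency counts; a quick sketch (the counts in the example calls are made up for illustration):

```python
def delta_p(a, b, c, d):
    """Delta-P from contingency counts:
    a: e+,c+   b: e-,c+   c: e+,c-   d: e-,c-"""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Cheng's (1997) causal power for a generative cause:
    power = Delta-P / (1 - P(e+ | c-))."""
    return delta_p(a, b, c, d) / (1 - c / (c + d))

# Hypothetical sample: 8/10 effects with the cause, 2/10 without.
print(delta_p(8, 2, 2, 8))       # ~0.6
print(causal_power(8, 2, 2, 8))  # ~0.75
```

Causal power exceeds ΔP whenever the effect also occurs at some base rate without the cause; the two agree only when P(e+ | c−) = 0.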
Neither theory explains the observed trend across the ΔP = 0 conditions.

## Bayesian causal inference

- Hypotheses (causal graphs):
  - h1: B → E and C → E (a causal link from C exists).
  - h0: B → E only (no link from C).
- w0, w1: strength parameters for B and C.
- The background cause B is unobserved and always present (B = 1).
- Probabilistic model: “noisy-OR”. P(e+ | C, B) under each hypothesis:

| C | B | h1 | h0 |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |
| 1 | 0 | w1 | 0 |
| 0 | 1 | w0 | w0 |
| 1 | 1 | w1 + w0 − w1 w0 | w0 |

## Inferring structure versus estimating strength

- Both causal power and ΔP correspond to maximum-likelihood estimates of the strength parameter w1, under different parameterizations of P(E | B, C): linear for ΔP, noisy-OR for causal power.
- Causal support model: people are judging the probability that a causal link exists, rather than assuming it exists and estimating its strength.

## Role of domain theory

(c.f. PRMs, ILP, knowledge-based model construction)

The domain theory generates the hypothesis space of causal graphical models:

- Causally relevant attributes of objects constrain the random variables (nodes).
- Causally relevant relations between attributes constrain the dependence structure of the variables (arcs).
- Causal mechanisms (how effects depend functionally on their causes) constrain the local probability distribution for each variable conditioned on its direct causes (parents).

In the gene-expression task:

- Injections may or may not cause gene expression, but gene expression does not cause injections: no hypotheses with E → C.
- Other naturally occurring processes may also cause gene expression: all hypotheses include an always-present background cause B → E.
- Causes are probabilistically sufficient and independent (Cheng): each cause independently produces the effect in some proportion of cases.
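The sufficiency-and-independence assumption has a compact closed form: the effect is absent only if every present cause independently fails, each with probability 1 − w. A minimal sketch (the strength values are arbitrary):

```python
def noisy_or(w0, w1, b, c):
    """P(E=1 | B=b, C=c) under a noisy-OR parameterization:
    each present cause independently fails with prob (1 - w)."""
    p_all_fail = (1 - w0) ** b * (1 - w1) ** c
    return 1 - p_all_fail

# h1 rows of the table, with illustrative strengths w0=0.3, w1=0.5:
print(noisy_or(0.3, 0.5, b=1, c=0))  # w0 -> ~0.3
print(noisy_or(0.3, 0.5, b=1, c=1))  # w1 + w0 - w1*w0 -> ~0.65
# h0 is the same model with w1 fixed at 0:
print(noisy_or(0.3, 0.0, b=1, c=1))  # ~0.3
```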
This assumption corresponds to the “noisy-OR” causal mechanism.

## Bayesian causal inference

With hypotheses h1 (B → E and C → E) and h0 (B → E only), assume uniform priors on the strength parameters w0 and w1.

## Bayesian Occam’s Razor

[Figure: P(data | model) for h0 and h1, plotted over all possible data sets ordered by increasing ΔP, for low versus high values of w1.]

[Figure: Buehner & Cheng (1997) judgments across ΔP = 0, 0.25, 0.5, 0.75, and 1, compared with the predictions of ΔP, Power PC, and the Bayesian model.]
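Causal support and the Bayesian Occam’s Razor can be made concrete by comparing marginal likelihoods: h0, with one free parameter, concentrates its probability on fewer data sets than h1, so uninformative contingencies actually favor h0. A numerical sketch using a grid approximation to the integrals over uniform priors (the counts in the example calls are illustrative):

```python
import numpy as np

def log_marginal_h1(k_c, n_c, k_nc, n_nc, grid=101):
    """log P(D | h1): noisy-OR likelihood averaged over uniform
    priors on w0 (background) and w1 (cause), on a grid.
    k_c of n_c trials show the effect with C present;
    k_nc of n_nc trials show it with C absent."""
    w = np.linspace(0.001, 0.999, grid)
    w0, w1 = np.meshgrid(w, w)
    p_c = w0 + w1 - w0 * w1          # P(e+ | c+), noisy-OR
    p_nc = w0                        # P(e+ | c-): background only
    ll = (k_c * np.log(p_c) + (n_c - k_c) * np.log(1 - p_c)
          + k_nc * np.log(p_nc) + (n_nc - k_nc) * np.log(1 - p_nc))
    return ll.max() + np.log(np.exp(ll - ll.max()).mean())

def log_marginal_h0(k_c, n_c, k_nc, n_nc, grid=101):
    """log P(D | h0): same data, but w1 = 0 (no C -> E link)."""
    w0 = np.linspace(0.001, 0.999, grid)
    k, n = k_c + k_nc, n_c + n_nc
    ll = k * np.log(w0) + (n - k) * np.log(1 - w0)
    return ll.max() + np.log(np.exp(ll - ll.max()).mean())

def causal_support(k_c, n_c, k_nc, n_nc):
    """log P(D | h1) - log P(D | h0): evidence for a C -> E link."""
    return (log_marginal_h1(k_c, n_c, k_nc, n_nc)
            - log_marginal_h0(k_c, n_c, k_nc, n_nc))

# Strong contingency: support is positive (favors h1).
print(causal_support(8, 8, 0, 8) > 0)   # True
# Zero contingency: the simpler h0 wins (Occam's Razor).
print(causal_support(4, 8, 4, 8) < 0)   # True
```

Unlike ΔP or causal power, this quantity is a model comparison, so it remains well defined and graded even when ΔP = 0.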