Bayesian models of human inference

Bayesian models of human inference Josh Tenenbaum MIT

The Bayesian revolution in AI • Principled and effective solutions for inductive inference from ambiguous data: • Vision • Robotics • Machine learning • Expert systems / reasoning • Natural language processing • Standard view in AI: no necessary connection to how the human brain solves these problems. • Heuristics & Biases program in the background (“We know people aren’t Bayesian, but…”).

Bayesian models of cognition Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...] Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …] Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr,…] Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …] Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …] Attention [Mozer, Huber, Torralba, Oliva, Geisler, Yu, Itti, Baldi, …] Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …] Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …] Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …] Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]

How to meet up with mainstream JDM research (i.e., heuristics & biases)? • How to reconcile apparently contradictory messages of H&B and Bayesian models? Are people Bayesian or aren’t they? When are they, when aren’t they, and why? • How to integrate the H&B and Bayesian research approaches?

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • People are Bayesian in low-level input or output processes that have a long evolutionary history shared with other species, e.g. vision, motor control, memory retrieval.

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • Information format hypothesis (Gigerenzer) • Higher-level cognition can be Bayesian when information is presented in formats that we have evolved to process, and that support simple heuristic algorithms, e.g., base-rate neglect disappears with “natural frequencies”. [Figure: explicit probabilities vs. natural frequencies]
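The contrast between the two formats can be made concrete with a small worked example. This is a minimal sketch with hypothetical numbers (a 1% base rate, an 80% hit rate, and a 10% false-alarm rate, none of which come from the slides); the function names are illustrative. It shows that Bayes' rule on explicit probabilities and simple counting on natural frequencies give the same answer, but the frequency version needs only one division over two counts.

```python
from fractions import Fraction

# Hypothetical diagnosis problem (numbers chosen for illustration only).
BASE_RATE = Fraction(1, 100)         # P(disease)
HIT_RATE = Fraction(80, 100)         # P(positive | disease)
FALSE_ALARM_RATE = Fraction(10, 100) # P(positive | no disease)


def posterior_from_probabilities(prior, hit, false_alarm):
    """Explicit-probability format: apply Bayes' rule directly."""
    return prior * hit / (prior * hit + (1 - prior) * false_alarm)


def posterior_from_frequencies(population, prior, hit, false_alarm):
    """Natural-frequency format: count cases in a concrete population."""
    sick = population * prior                      # e.g. 10 of 1000
    sick_positive = sick * hit                     # e.g. 8 of those 10
    healthy_positive = (population - sick) * false_alarm  # e.g. 99 of 990
    return sick_positive / (sick_positive + healthy_positive)


by_probability = posterior_from_probabilities(BASE_RATE, HIT_RATE, FALSE_ALARM_RATE)
by_frequency = posterior_from_frequencies(1000, BASE_RATE, HIT_RATE, FALSE_ALARM_RATE)
print(by_probability, by_frequency)  # both formats give the same fraction
```

With these numbers the frequency version reads "8 of the 107 people who test positive are sick", which makes the role of the base rate hard to overlook.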

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • Information format hypothesis (Gigerenzer) • Core capacities hypothesis • Bayes can illuminate distinctively human cognitive capacities for inductive inference – learning words and concepts, projecting properties of objects, causal inference, or action understanding: problems we solve effortlessly, unconsciously, and successfully in natural contexts, which a five-year-old solves better than any animal or computer.

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • Information format hypothesis (Gigerenzer) • Core capacities hypothesis • Causal induction [Figure: variables A, B, and E; panels labeled “AB Trial” and “A Trial”] (Sobel, Griffiths, Tenenbaum, & Gopnik)

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • Information format hypothesis (Gigerenzer) • Core capacities hypothesis • Word learning [Figure: hypothesis space and data] (Tenenbaum & Xu)

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • Information format hypothesis (Gigerenzer) • Core capacities hypothesis • Bayes can illuminate distinctively human cognitive capacities for inductive inference – learning words and concepts, projecting properties of objects, causal inference, or action understanding: problems we solve effortlessly, unconsciously, and successfully in natural contexts, which a five-year-old solves better than any animal or computer. • The mind is not good at explicit Bayesian reasoning about verbally or symbolically presented statistics, unless core capacities can be engaged.

When are people Bayesian, and why? • Low level hypothesis (Shiffrin, Maloney, etc.) • Information format hypothesis (Gigerenzer) • Core capacities hypothesis [Figure: statistical version vs. causal version of the diagnosis problem; response categories “Correct” and “Base-rate neglect”] (Krynski & Tenenbaum)

How to meet up with mainstream JDM research (i.e., heuristics & biases)? • How to reconcile apparently contradictory messages of H&B and Bayesian models? Are people Bayesian or aren’t they? When are they, when aren’t they, and why? • How to integrate the H&B and Bayesian research approaches?

Reverse engineering • Goal is to reverse-engineer human inference. • A computational understanding of how the mind works and why it works as it does. • Even for core inferential capacities, we are likely to observe behavior that deviates from any ideal Bayesian analysis. • These deviations are likely to be informative about how the mind works.

Analogy to visual illusions (Adelson) (Shepard) • Highlight the problems the visual system is designed to solve: inferring world structure from images, not judging properties of the images themselves. • Reveal the visual system’s implicit assumptions about the physical world and the processes of image formation that are needed to solve these problems.

How do we interpret deviations from a Bayesian analysis? • H&B: People aren’t Bayesian, but use some other means of inference. • Base-rate neglect: representativeness heuristic • Recency bias: availability heuristic • Order of evidence effects: anchoring and adjustment • … • Not so compelling as reverse engineering. • What engineer would want to design a system based on “representativeness”, without knowing how it is computed, why it is computed that way, what problem it attempts to solve, when it works, or how its accuracy and efficiency compare to some ideal computation or other heuristics?

How do we interpret deviations from a Bayesian analysis? Multiple levels of analysis (Marr) • Computational theory • What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed? • Representation and algorithm • How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources? • Hardware implementation

How do we interpret deviations from a Bayesian analysis? Multiple levels of analysis (Marr) • Computational theory • What is the goal of the computation – the outputs and available inputs? What is the logic by which the inference can be performed? What constraints (prior knowledge) do people assume to make the solution well-posed? • Representation and algorithm • How is the information represented? How is the computation carried out algorithmically, approximating the ideal computational theory with realistic time & space resources? • Hardware implementation [Slide label: Bayes]

Different philosophies • H&B • One canonical Bayesian analysis of any given task, and we know what it is. • Ideal Bayesian solution can be computed. • The question “Are people Bayesian?” is empirically meaningful on any given task. • Bayes+Marr • Many possible Bayesian analyses of any given task, and we need to discover which best characterize cognition. • Ideal Bayesian solution can only be approximately computed. • The question “Are people Bayesian?” is not an empirical one, at least not for an individual task. Bayes is a framework-level assumption, like distributed representations in connectionism or condition-action rules in ACT-R.

The centrality of causal inference • In visual perception: • Judge P(scene|image features) rather than P(image features|scene) or P(image features|other image features). • Coin flipping: Which sequence is more likely to come from flipping a fair coin, HHTHT or HHHHH? • Coincidences: How likely is it that 2 people in a random party of 25 have the same birthday? 3 in a party of 10? (Griffiths & Tenenbaum)
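The coin-flipping question can be worked through numerically. Under a fair coin both sequences have probability 1/32, so any difference in intuition has to come from comparing the fair coin against some alternative generating process. The sketch below assumes one such alternative purely for illustration (a coin of unknown bias with a uniform prior on that bias); it is not taken from the slides, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, log


def p_fair(sequence):
    """P(sequence | fair coin): every sequence of length n has probability (1/2)^n."""
    return Fraction(1, 2) ** len(sequence)


def p_unknown_bias(sequence):
    """P(sequence | coin with unknown bias, uniform prior on the bias).

    Integrating theta^h * (1 - theta)^t over theta in [0, 1]
    gives h! * t! / (h + t + 1)!.
    """
    heads = sequence.count("H")
    tails = sequence.count("T")
    return Fraction(factorial(heads) * factorial(tails), factorial(heads + tails + 1))


def log_support_for_fair(sequence):
    """Log-likelihood ratio in favor of the fair coin over the assumed alternative."""
    return log(float(p_fair(sequence) / p_unknown_bias(sequence)))


for seq in ("HHTHT", "HHHHH"):
    print(seq, p_fair(seq), p_unknown_bias(seq), round(log_support_for_fair(seq), 3))
```

Under these assumptions HHTHT supports the fair coin (positive log ratio) while HHHHH supports the biased alternative (negative log ratio), even though the two sequences are equally probable under the fair coin itself.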

Judgments of randomness: Judgments of coincidence: Rational measure of evidential support: (Griffiths & Tenenbaum)
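The equations for this slide are not preserved in the transcript. As a hedged sketch, one common way to formalize the three labeled quantities is as log-likelihood ratios between a chance ("random") hypothesis and a structured ("regular") hypothesis. The symbols d, h_random, h_regular, h_0, and h_1 below are assumed names, and the exact form on the original slide may differ.

```latex
\begin{align*}
\text{random}(d) &= \log \frac{P(d \mid h_{\text{random}})}{P(d \mid h_{\text{regular}})} \\[4pt]
\text{coincidence}(d) &= \log \frac{P(d \mid h_{\text{regular}})}{P(d \mid h_{\text{random}})} = -\,\text{random}(d) \\[4pt]
\underbrace{\log \frac{P(h_1 \mid d)}{P(h_0 \mid d)}}_{\text{posterior log odds}}
  &= \underbrace{\log \frac{P(d \mid h_1)}{P(d \mid h_0)}}_{\text{evidential support}}
   + \underbrace{\log \frac{P(h_1)}{P(h_0)}}_{\text{prior log odds}}
\end{align*}
```

On this reading, randomness and coincidence judgments are the same measure of evidential support viewed from opposite directions.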

Assuming the world is simple • In visual perception: • “Slow and smooth” prior on visual motion • Causal induction: • P(blicket) = 1/6, “Activation law” • P(A is a blicket|data) = 1, P(B is a blicket|data) ~ 1/6 • P(A is a blicket|data) ~ 3/4, P(B is a blicket|data) ~ 1/4 [Figure: two conditions, one with panels “AB Trial” and “A Trial” over objects A and B, one with panels “AB Trial” and “AC Trial” over objects A, B, and C]
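The first pair of values can be reproduced by exact enumeration. The sketch below assumes that each object is independently a blicket with prior probability 1/6, and that the "activation law" is deterministic: the detector activates if and only if at least one blicket is on it. Both assumptions are my reading of the slide, and the function name and trial encoding are hypothetical.

```python
from fractions import Fraction
from itertools import product


def blicket_posterior(objects, trials, prior=Fraction(1, 6)):
    """Posterior P(object is a blicket | trials) by enumerating all hypotheses.

    trials is a list of (objects_on_detector, detector_activated) pairs.
    Assumed activation law: the detector fires iff some placed object is a blicket.
    """
    mass = {}
    for assignment in product([False, True], repeat=len(objects)):
        is_blicket = dict(zip(objects, assignment))
        # Independent prior over which objects are blickets.
        weight = Fraction(1)
        for flag in assignment:
            weight *= prior if flag else 1 - prior
        # Deterministic likelihood: keep only hypotheses consistent with every trial.
        consistent = all(
            any(is_blicket[obj] for obj in placed) == activated
            for placed, activated in trials
        )
        if consistent:
            mass[assignment] = weight
    total = sum(mass.values())
    return {
        obj: sum(w for a, w in mass.items() if a[i]) / total
        for i, obj in enumerate(objects)
    }


# One AB trial and one A trial, both activating the detector.
backward_blocking = blicket_posterior(
    ["A", "B"],
    [({"A", "B"}, True), ({"A"}, True)],
)
print(backward_blocking)
```

Because A alone activates the detector, A must be a blicket, and the AB trial then carries no further information about B, so B falls back to its prior of 1/6.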

Recognizing the world is complex • In visual perception: • Need uncertainty about coherence ratio and velocity of coherent motion. (Lu & Yuille) • Property induction: • Properties should be distributed stochastically over tree structure, not just focused on single branches. • Example argument: “Gorillas have T9 cells. Seals have T9 cells. Horses have T9 cells.” [Figure: r = 0.50, Bayes: single branch prior] (Kemp & Tenenbaum)

Recognizing the world is complex • In visual perception: • Need uncertainty about coherence ratio and velocity of coherent motion. (Lu & Yuille) • Property induction: • Properties should be distributed stochastically over tree structure, not just focused on single branches. • Example argument: “Gorillas have T9 cells. Seals have T9 cells. Horses have T9 cells.” [Figure: r = 0.92, Bayes: “mutation” prior] (Kemp & Tenenbaum)

[Figure panels: “has T9 hormones”, “can bite through wire”, “is found near Minneapolis”, “carry E. Spirus bacteria”] (Kemp & Tenenbaum)

Sampling-based approximate inference • In visual perception: • Temporal dynamics of bi-stability due to fast sampling-based approximation of a bimodal posterior (Schrater & Sundareswara). • Order effects in category learning • Particle filter (sequential Monte Carlo), an online approximate inference algorithm assuming stationarity. • Probability matching in classification decisions • Sampling-based approximations with guarantees of near optimal generalization performance. (Griffiths et al., Goodman et al.)
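The link between sampling and probability matching can be illustrated with a toy simulation. This is a minimal sketch, not any of the cited models: it assumes a learner who draws a single sample from the posterior over categories and responds with it, and compares that with a learner who always picks the most probable category. The posterior values and all names are hypothetical.

```python
import random


def choose_by_single_sample(posterior, rng):
    """Respond with one sample drawn from the posterior over categories."""
    labels = list(posterior)
    weights = [posterior[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def choose_by_maximizing(posterior):
    """Respond with the single most probable category."""
    return max(posterior, key=posterior.get)


# Hypothetical posterior over two categories for one stimulus.
posterior = {"category_1": 0.7, "category_2": 0.3}

rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 10_000
sample_choices = [choose_by_single_sample(posterior, rng) for _ in range(n_trials)]
matching_rate = sample_choices.count("category_1") / n_trials

print("single-sample responder picks category_1 on", matching_rate, "of trials")
print("maximizing responder always picks", choose_by_maximizing(posterior))
```

The single-sample responder chooses each category at a rate close to its posterior probability, which is the probability-matching pattern, while the maximizing responder never chooses the less probable category.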

Conclusions • “Are people Bayesian?”, “When are they Bayesian?” • Maybe not the most interesting questions in the long run…. • What is the best way to reverse engineer cognition at multiple levels of analysis? Assuming core inductive capacities are approximately Bayesian at the computational-theory level offers several benefits: • Explanatory power: why does cognition work? • Fewer degrees of freedom in modeling • A bridge to state-of-the-art AI and machine learning • Tools to study the big questions: What are the goals of cognition? What does the mind know about the world? How is that knowledge represented? What are the processing mechanisms and why do they work as they do?

Coincidences (Griffiths & Tenenbaum, in press) • The birthday problem • How many people do you need to have in the room before the probability exceeds 50% that two of them have the same birthday? 23. • The bombing of London
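The birthday problem can be checked directly. The sketch below assumes 365 equally likely birthdays and independent people (no leap years, no twins); the function names are illustrative.

```python
def p_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose shared-birthday probability exceeds threshold."""
    n_people = 1
    while p_shared_birthday(n_people, days) <= threshold:
        n_people += 1
    return n_people


print(smallest_group())                   # group size needed to exceed 50%
print(round(p_shared_birthday(25), 4))    # probability for a party of 25
```

Under these assumptions the threshold is crossed between 22 and 23 people, which is far fewer than most people's intuitive guess.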

How much of a coincidence?

Bayesian coincidence factor [Figure: chance vs. latent common cause C generating observations x; label “August”] • Alternative hypotheses: proximity in date, matching days of the month, matching month, ....

How much of a coincidence?

Bayesian coincidence factor [Figure: latent common cause C generating observations x (uniform + regularity) vs. chance (uniform)]
