
Bayesian models of inductive learning and reasoning Josh Tenenbaum MIT

This talk explores how people learn so much about the world from such limited evidence through everyday inductive leaps: learning concepts, object properties, word meanings, cause-effect relations, the beliefs of other people, and social structures. It asks how background knowledge guides learning from sparse data, what form that background knowledge takes, and how the background knowledge is itself acquired.


Presentation Transcript


  1. Bayesian models of inductive learning and reasoning Josh Tenenbaum MIT Department of Brain and Cognitive Sciences Computer Science and AI Lab (CSAIL)

  2. Collaborators Charles Kemp Noah Goodman Chris Baker Tom Griffiths Amy Perfors Vikash Mansinghka Lauren Schmidt Pat Shafto

  3. Everyday inductive leaps How can people learn so much about the world from such limited evidence? • Learning concepts from examples “horse” “horse” “horse”

  4. “tufa” “tufa” “tufa” Learning concepts from examples

  5. Everyday inductive leaps How can people learn so much about the world from such limited evidence? • Kinds of objects and their properties • The meanings of words, phrases, and sentences • Cause-effect relations • The beliefs, goals and plans of other people • Social structures, conventions, and rules

  6. The solution Prior knowledge (inductive bias).

  7. The solution Prior knowledge (inductive bias). • How does background knowledge guide learning from sparsely observed data? • What form does background knowledge take, across different domains and tasks? • How is background knowledge itself acquired? The challenge: Can we answer these questions in precise computational terms?

  8. Modeling goals • Principled quantitative models of human inductive inferences, with broad coverage and a minimum of free parameters and ad hoc assumptions. • An understanding of how and why human learning and reasoning works, as a species of rational (approximately optimal) statistical inference given the structure of natural environments. • A two-way bridge to artificial intelligence and machine learning.

  9. Bayesian inference • Bayes’ rule: P(h|d) = P(d|h) P(h) / Σh′ P(d|h′) P(h′) • An example • Data: John is coughing • Some hypotheses: (1) John has a cold (2) John has lung cancer (3) John has a stomach flu • Likelihood P(d|h) favors 1 and 2 over 3 • Prior probability P(h) favors 1 and 3 over 2 • Posterior probability P(h|d) favors 1 over 2 and 3
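The arithmetic on this slide is easy to check directly. Below is a minimal numerical sketch of the coughing example; the prior and likelihood values are invented for illustration, not numbers from the talk.

```python
# Bayes' rule on the coughing example. The probabilities below are
# illustrative assumptions chosen to match the slide's qualitative ordering.
hypotheses = ["cold", "lung cancer", "stomach flu"]
prior      = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.49}  # P(h)
likelihood = {"cold": 0.80, "lung cancer": 0.70, "stomach flu": 0.05}  # P(d|h), d = coughing

# P(h|d) = P(d|h) P(h) / sum over h' of P(d|h') P(h')
evidence = sum(likelihood[h] * prior[h] for h in hypotheses)
posterior = {h: likelihood[h] * prior[h] / evidence for h in hypotheses}

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")
# "cold" comes out on top: it is the only hypothesis favored by both
# the likelihood and the prior.
```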

  10. The Bayesian modeling toolkit 1. How does background knowledge guide learning from sparsely observed data? Bayesian inference. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. 3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction; flexible nonparametric models in which complexity grows with the data.

  11. A case study: learning about objects and their properties. “Property induction” or “category-based induction” (Rips, 1975; Osherson, Smith et al., 1990), where argument strength tracks “similarity”, “typicality”, and “diversity”. Example arguments: (a) Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore, flies have T9 hormones. (b) Gorillas have T9 hormones. Seals have T9 hormones. Squirrels have T9 hormones. Therefore, horses have T9 hormones. (c) Gorillas have T9 hormones. Chimps have T9 hormones. Monkeys have T9 hormones. Baboons have T9 hormones. Therefore, horses have T9 hormones.

  12. Experiments on property induction(Osherson, Smith, Wilkie, Lopez, Shafir, 1990) • 20 subjects rated the strength of 45 arguments: X1 have property P. (e.g., Cows have T4 hormones.) X2 have property P. X3 have property P. All mammals have property P. [General argument] • 20 subjects rated the strength of 36 arguments: X1 have property P. X2 have property P. Horses have property P. [Specific argument]

  13. Property induction as a computational problem. [Matrix: species (Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, ...) × features, with a new-property column full of ?s.] Features: 85 features for 50 animals (Osherson & Wilkie feature rating task), e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, ...

  14. Similarity-based models. [Scatter plot of human ratings (“Data”) against model predictions (“Model”) for arguments of the form “X1 have property P. X2 have property P. X3 have property P. Therefore, all mammals have property P.” Each point represents one argument.]
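For concreteness, here is a sketch of one classic similarity-based measure, the coverage idea behind the similarity-coverage model (Osherson et al., 1990). The similarity values and species are invented stand-ins; a real application would use the full similarity matrix and conclusion category.

```python
import numpy as np

# Coverage: how well do the premise categories "cover" the conclusion set?
# Each conclusion category contributes its similarity to the closest premise.
# The similarity values below are invented for illustration.
sim = {
    ("gorilla", "horse"): 0.3, ("seal", "horse"): 0.4, ("squirrel", "horse"): 0.5,
    ("gorilla", "mouse"): 0.3, ("seal", "mouse"): 0.2, ("squirrel", "mouse"): 0.8,
}

def coverage(premises, conclusion_set):
    return np.mean([max(sim[(x, y)] for x in premises) for y in conclusion_set])

mammals = ["horse", "mouse"]              # toy stand-in for "all mammals"
print(coverage(["gorilla", "seal", "squirrel"], mammals))   # 0.65
```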

  15. Beyond similarity in induction • Reasoning based on dimensional thresholds (Smith et al., 1993): compare “Poodles can bite through wire. Therefore, German shepherds can bite through wire.” with “Dobermans can bite through wire. Therefore, German shepherds can bite through wire.” • Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003): compare “Salmon carry E. Spirus bacteria. Therefore, grizzly bears carry E. Spirus bacteria.” with “Grizzly bears carry E. Spirus bacteria. Therefore, salmon carry E. Spirus bacteria.”

  16. The Bayesian modeling toolkit (recap) 1. How does background knowledge guide learning from sparsely observed data? Bayesian inference. 2. What form does background knowledge take, across different domains and tasks? Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories. 3. How is background knowledge itself acquired? Hierarchical probabilistic models, with inference at multiple levels of abstraction; flexible nonparametric models in which complexity grows with the data.

  17. Model overview. [Diagram of a three-level generative model: F: form (here, a tree with species at the leaf nodes), drawn from P(form); S: structure (a particular tree over mouse, squirrel, chimp, gorilla), drawn from P(structure | form); D: data (observed features F1, F2, F3, F4, plus the partially observed new property “Has T9 hormones” with ?s), drawn from P(data | structure).]

  18. Model overview. [The same three-level diagram: F: form (tree with species at leaf nodes); S: structure over mouse, squirrel, chimp, gorilla; D: data (features F1–F4 and the partially observed property “Has T9 hormones”).]

  19. [Hypothesis space for property induction: the premises (“Horses have T9 hormones. Rhinos have T9 hormones. Cows have T9 hormones.”) form the observed set X; Y is a queried species. Hypotheses h are candidate extensions of the property over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, ..., each with prior probability P(h); unobserved species are marked ?.]

  20. [The same hypothesis space, now with the prediction P(Y | X): the probability that the property extends to species Y given the premises X, obtained by averaging over the hypotheses h, weighted by the prior P(h).]
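A minimal sketch of this prediction step, for a toy species set. The prior below is uniform purely for illustration (which is exactly what the next slide questions); the talk’s models replace it with a structured P(h).

```python
import itertools

# Hypotheses are candidate extensions of the novel property: subsets of species.
species = ["horse", "cow", "rhino", "elephant", "mouse"]
hypotheses = [frozenset(s) for r in range(len(species) + 1)
              for s in itertools.combinations(species, r)]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}     # placeholder P(h)

def predict(premises, query):
    """P(query has P | premises have P): prior mass on hypotheses containing
    the query and all premises, normalized by the mass on hypotheses
    containing all premises."""
    consistent = [h for h in hypotheses if set(premises) <= h]
    num = sum(prior[h] for h in consistent if query in h)
    return num / sum(prior[h] for h in consistent)

# Under a uniform prior every query comes out at 0.5, however many premises
# we observe; the prior has to carry real knowledge to do better.
print(predict(["horse", "rhino", "cow"], "elephant"))      # 0.5
```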

  21. Where does the prior come from? [Figure: candidate property extensions over Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant, ..., with prior P(h).] Why not just enumerate all logically possible hypotheses along with their relative prior probabilities?

  22. Knowledge-based priors. Different kinds of knowledge support different inductions: taxonomic similarity (“Chimps have T9 hormones. Therefore, gorillas have T9 hormones.”); thresholds on a dimension such as jaw strength (“Poodles can bite through wire. Therefore, Dobermans can bite through wire.”); food web relations (“Salmon carry E. Spirus bacteria. Therefore, grizzly bears carry E. Spirus bacteria.”).

  23. Model overview. [The three-level diagram again: F: form; S: structure (tree over mouse, squirrel, chimp, gorilla); D: data (features F1–F4 and the partially observed property “Has T9 hormones”).]

  24. P(D|S): How the structure constrains the data of experience • Define a stochastic process over structure S that generates candidate property extensions h. • Intuition: properties should vary smoothly over the structure. [Illustrations: an extension that varies smoothly over the tree gets high P(h); one that does not gets low P(h).]

  25. P(D|S): How the structure constrains the data of experience. Let d_ij be the length of the edge between nodes i and j (d_ij = ∞ if i and j are not connected). A continuous property y is drawn from a Gaussian prior, y ~ N(0, Σ), whose inverse covariance is the graph Laplacian Δ defined by these edge lengths: Δ_ij = −1/d_ij for i ≠ j, and Δ_ii = Σ_{j≠i} 1/d_ij (Zhu, Lafferty & Ghahramani, 2003). The discrete extension h is then derived from y (e.g., by thresholding).
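Here is a small numerical sketch of that prior, assuming the Laplacian construction above; the toy graph and the small ridge term (added only so the covariance matrix inverts) are my own choices.

```python
import numpy as np

# Smoothness prior over a graph (after Zhu, Lafferty & Ghahramani, 2003):
# short edges couple nodes strongly, so sampled properties vary smoothly.
def laplacian(edges, n):
    W = np.zeros((n, n))
    for i, j, d in edges:                    # edge weight = 1 / edge length
        W[i, j] = W[j, i] = 1.0 / d
    return np.diag(W.sum(axis=1)) - W

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 2.0)]               # toy 4-node chain
cov = np.linalg.inv(laplacian(edges, 4) + 1e-2 * np.eye(4))   # Sigma

y = np.random.default_rng(0).multivariate_normal(np.zeros(4), cov)
print(np.round(y, 2))    # neighboring nodes receive similar values
# Thresholding y (say at 0) gives a binary property extension h.
```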

  26. [Illustration: a structure S (tree over Species 1–10) alongside its data D, the species × features matrix.] Features: 85 features for 50 animals (Osherson et al.), e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’, ...

  27. Modeling feature covariance based on distance in a graph (Zhu et al., 2003; cf. Sattath & Tversky, 1977)

  28. Modeling feature covariance based on distance in two-dimensional space (Lawrence, 2004; Smola & Kondor, 2003; cf. Shepard, 1987)
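For contrast with the graph version, here is a sketch of a spatial covariance, assuming a standard squared-exponential (RBF) kernel over 2D coordinates; the points and length scale are invented.

```python
import numpy as np

# Covariance that decays with Euclidean distance in a 2D embedding,
# in the spirit of slides 27-28: nearby points covary strongly.
def rbf_cov(points, length_scale=1.0):
    pts = np.asarray(points)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale**2))

points = [(0.0, 0.0), (0.5, 0.0), (2.0, 1.0), (2.2, 1.1)]
print(np.round(rbf_cov(points), 2))
```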

  29. [The same structure and data picture, now with a new, partially observed property column (?s) added alongside the 85 features for 50 animals (Osherson et al.).]

  30. [Results: human ratings vs. model predictions under tree and 2D priors, for specific arguments (e.g., “Cows have property P. Elephants have property P. Therefore, horses have property P.”) and general arguments (e.g., “Gorillas have property P. Mice have property P. Seals have property P. Therefore, all mammals have property P.”).]

  31. Testing different priors. [Results: model fits to human judgments under four inductive biases: the correct bias, a wrong bias, a too-weak bias, and a too-strong bias.]

  32. Spatially varying properties. Geographic inference task: “Given that a certain kind of native American artifact has been found in sites near city X, how likely is the same artifact to be found near city Y?” [Results shown for 2D and tree priors.]

  33. Theory-based priors for three property types. “has T9 hormones”: structure = taxonomic tree, stochastic process = diffusion. “can bite through wire”: structure = directed chain, stochastic process = drift. “carry E. Spirus bacteria”: structure = directed network (food web), stochastic process = noisy transmission. [Illustrations: sample hypotheses over Classes A–G under each theory.]
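As a toy illustration of two of these processes (the diffusion process over a tree works like the Gaussian smoothness sketch after slide 25), here is a sketch in which the chains, food web, and transmission probability are all invented stand-ins:

```python
import random

rng = random.Random(0)

# Drift along a directed chain (e.g., species ordered by jaw strength):
# the property holds for every class at or above a sampled threshold.
chain = ["poodle", "collie", "doberman", "german shepherd"]
cut = rng.randrange(len(chain))
drift_property = {c: i >= cut for i, c in enumerate(chain)}

# Noisy transmission over a directed food web: a disease in a prey species
# passes to each of its predators with some probability.
food_web = {"herring": ["tuna", "sand shark"], "tuna": ["mako shark"]}
disease = {"herring": True, "tuna": False, "sand shark": False, "mako shark": False}
for prey, predators in food_web.items():
    for pred in predators:
        if disease[prey] and rng.random() < 0.8:
            disease[pred] = True

print(drift_property)
print(disease)
```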

  34. Biological vs. disease properties. “Given that A has property P, how likely is it that B does?” For a biological property (e.g., P = “has X cells”), inductions follow a taxonomic tree over Sand shark, Mako shark, Human, Herring, Tuna, Kelp, Dolphin; for a disease property (e.g., P = “has X disease”), they follow the food web over the same species.

  35. Summary so far • A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations • Qualitatively different priors are appropriate for different domains of property induction. • In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors. • A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph. • Remaining question: How can we learn appropriate structures for different domains?

  36. Model overview. [The three-level diagram, now with multiple candidate forms: F: form (chain, tree, or two-dimensional space); S: structure (a particular chain, tree, or spatial configuration over mouse, squirrel, chimp, gorilla); D: data (features F1–F4).]

  37. Discovering structural forms. [The species Snake, Turtle, Crocodile, Robin, Ostrich, Bat, Orangutan arranged under different candidate structures.]

  38. Discovering structural forms. [Two historical organizations of the same species: the “great chain of being”, a linear order running from Rock and Plant through the animals up to Angel and God; and Linnaeus’s tree-structured taxonomy.]

  39. People can discover structural forms. Scientific discoveries: tree structure for biological species; periodic structure for chemical elements. [Images: the “great chain of being” (1579), Linnaeus’s Systema Naturae (1735), an early evolutionary tree diagram (1837).] Children’s cognitive development: hierarchical structure of category labels (e.g., Kingdom Animalia > Phylum Chordata > Class Mammalia > Order Primates > Family Hominidae > Genus Homo > Species Homo sapiens); clique structure of social groups; cyclical structure of seasons or days of the week; transitive structure for value.

  40. Typical structure learning algorithms assume a fixed structural form. Flat clusters: K-Means, mixture models, competitive learning. Line: Guttman scaling, ideal point models. Circle: circumplex models. Grid: Self-Organizing Map, generative topographic mapping. Tree: hierarchical clustering, Bayesian phylogenetics. Euclidean space: MDS, PCA, factor analysis.

  41. The ultimate goal: a “Universal Structure Learner” that takes data in and returns the right representation, subsuming K-Means, hierarchical clustering, factor analysis, Guttman scaling, circumplex models, self-organizing maps, and more. [Diagram: Data → Universal Structure Learner → Representation.]

  42. A “universal grammar” for structural forms. [Table: each structural form paired with the generative process (graph grammar) that grows it.]

  43–45. Node-replacement graph grammars. [Three animation steps showing the production for the Line form and a sample derivation: starting from a single node, each application of the production replaces a node with two nodes joined by an edge, so the chain grows one node at a time.]
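A toy rendering of that Line production, with an invented list-based representation of the chain:

```python
import random

# Node replacement for the Line form: swap one node for two linked nodes.
# Every derivation step lengthens the chain by one while staying a line.
def grow_line(n_steps, seed=0):
    rng = random.Random(seed)
    chain = [0]                          # derivation starts from a one-node seed
    next_id = 1
    for _ in range(n_steps):
        i = rng.randrange(len(chain))    # pick the node to replace
        chain.insert(i + 1, next_id)     # node -> node--new_node
        next_id += 1
    return chain

print(grow_line(3))                      # a 4-node chain such as [0, 3, 1, 2]
```

Analogous productions generate the other forms, splitting nodes while preserving tree, grid, or ring connectivity.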

  46. [The full model: F: form (linear, tree, or grid); S: structure over mouse, squirrel, chimp, gorilla, where P(S | F) favors simplicity; D: data (features F1–F4), where P(D | S) favors properties that vary smoothly over the structure (Zhu et al., 2003).]

  47. Learning algorithm • Evaluate each form in parallel • For each form, heuristic search over structures based on greedy growth from a one-node seed (sketched below).
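The sketch below is a runnable toy of that search for a single form, flat clusters: grow from a one-node seed, accept whichever split most improves the score, stop when nothing improves. The scoring function is an invented stand-in for the model’s posterior score, not the talk’s actual objective.

```python
import itertools
import numpy as np

def score(partition, X):
    # Invented stand-in for log P(S, F | D): within-cluster fit,
    # minus a simplicity penalty for each extra cluster.
    fit = -sum(((X[list(c)] - X[list(c)].mean(0)) ** 2).sum() for c in partition)
    return fit - 2.0 * len(partition)

def splits(cluster):
    items = sorted(cluster)
    for r in range(1, len(items)):
        for a in itertools.combinations(items, r):
            yield frozenset(a), cluster - frozenset(a)

def fit_structure(X):
    partition = [frozenset(range(len(X)))]     # one-node seed: one big cluster
    best = score(partition, X)
    while True:
        move = None
        for c in (c for c in partition if len(c) > 1):
            for a, b in splits(c):             # greedy growth: try every split
                cand = [p for p in partition if p != c] + [a, b]
                if score(cand, X) > best:
                    best, move = score(cand, X), cand
        if move is None:
            return partition
        partition = move

X = np.array([[0.0], [0.1], [5.0], [5.2]])
print(fit_structure(X))                        # two clusters: {0, 1} and {2, 3}
```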

  48. [Further datasets: animals × features matrices; cases × judges matrices.]

  49. [Similarity data: objects × objects similarity matrices.]

  50. Structural forms from relational data. Dominance hierarchy: “x beats y” (a primate troop). Tree: “x told y” (the Bush administration). Cliques: “x likes y” (prison inmates). Ring: “x trades with y” (the Kula islands).
