
Analyzing cultural evolution by iterated learning


  1. Analyzing cultural evolution by iterated learning Tom Griffiths Department of Psychology Cognitive Science Program UC Berkeley

  2. Inductive problems • Learning languages from utterances: e.g. "blicket toma", "dax wug", "blicket wug", generated by the grammar S → X Y, X → {blicket, dax}, Y → {toma, wug} • Learning functions from (x, y) pairs • Learning categories from instances of their members
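
A minimal sketch of the toy grammar on this slide, purely for illustration (the enumeration code and variable names are not from the presentation):

```python
import itertools

# Toy grammar from the slide: S -> X Y, X -> {blicket, dax}, Y -> {toma, wug}
X = ["blicket", "dax"]
Y = ["toma", "wug"]

# Enumerate every utterance the grammar can generate.
utterances = [f"{x} {y}" for x, y in itertools.product(X, Y)]
print(utterances)  # ['blicket toma', 'blicket wug', 'dax toma', 'dax wug']
```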

  3. Learning data

  4. Iterated learning (Kirby, 2001) What are the consequences of learners learning from other learners?

  5. Outline Part I: Formal analysis of iterated learning Part II: Iterated learning in the lab

  6. Outline Part I: Formal analysis of iterated learning Part II: Iterated learning in the lab

  7. Objects of iterated learning How do constraints on learning (inductive biases) influence cultural universals?

  8. Language • The languages spoken by humans are typically viewed as the result of two factors • individual learning • innate constraints (biological evolution) • This limits the possible explanations for different kinds of linguistic phenomena

  9. Linguistic universals • Human languages possess universal properties • e.g. compositionality (Comrie, 1981; Greenberg, 1963; Hawkins, 1988) • Traditional explanation: • linguistic universals reflect strong innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)

  10. Cultural evolution • Languages are also subject to change via cultural evolution (through iterated learning) • Alternative explanation: • linguistic universals emerge as the result of the fact that language is learned anew by each generation (using general-purpose learning mechanisms, expressing weak constraints on languages) (e.g., Briscoe, 1998; Kirby, 2001)

  11. Analyzing iterated learning: each learner infers a hypothesis from the data produced by the previous learner, then generates data for the next • PL(h|d): probability of inferring hypothesis h from data d • PP(d|h): probability of generating data d from hypothesis h

  12. Markov chains • A sequence of variables with transition matrix P(x(t+1)|x(t)) • x(t+1) is independent of the history given x(t) • Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
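
As a small illustration of convergence to a stationary distribution (the three-state transition matrix below is hypothetical, not from the presentation):

```python
import numpy as np

# Hypothetical 3-state transition matrix: P[i, j] = P(x(t+1) = j | x(t) = i).
P = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.20, 0.20, 0.60]])

# An ergodic chain forgets its starting point: iterating from any initial
# distribution converges to the stationary distribution pi satisfying pi = pi P.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    pi = pi @ P
print(pi)  # approximately [0.571, 0.286, 0.143]
```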

  13. Analyzing iterated learning • Iterated learning defines a Markov chain on hypotheses (h1 → h2 → h3 → …) with transition probability Σd PP(d|h) PL(h′|d) • It equally defines a Markov chain on data (d0 → d1 → d2 → …) with transition probability Σh PL(h|d) PP(d′|h)

  14. Bayesian inference Reverend Thomas Bayes

  15. Bayes’ theorem: P(h|d) = P(d|h) P(h) / Σh′ P(d|h′) P(h′), where h is a hypothesis and d is data; P(d|h) is the likelihood, P(h) the prior probability, P(h|d) the posterior probability, and the denominator sums over the space of hypotheses.

  16. A note on hypotheses and priors • No commitment to the nature of hypotheses • neural networks (Rumelhart & McClelland, 1986) • discrete parameters (Gibson & Wexler, 1994) • Priors do not necessarily represent innate constraints specific to language acquisition • not innate: can reflect independent sources of data • not specific: general-purpose learning algorithms also have inductive biases expressible as priors

  17. Iterated Bayesian learning: assume learners sample hypotheses from their posterior distribution, so PL(h|d) is the posterior obtained from PP(d|h) and the prior P(h) by Bayes’ rule.

  18. Stationary distributions • Markov chain on h converges to the prior, P(h) • Markov chain on d converges to the “prior predictive distribution” (Griffiths & Kalish, 2005)
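
A minimal simulation of this result, using a made-up hypothesis space, prior, and likelihoods (none of these numbers come from the presentation); when each learner samples from its posterior, the chain of hypotheses visits each h with frequency close to the prior P(h):

```python
import numpy as np

rng = np.random.default_rng(0)

prior = np.array([0.6, 0.3, 0.1])        # P(h) for three hypothetical hypotheses
like = np.array([[0.7, 0.1, 0.1, 0.1],   # P(d | h): rows are hypotheses,
                 [0.1, 0.7, 0.1, 0.1],   # columns are four possible data points
                 [0.1, 0.1, 0.4, 0.4]])

def posterior(d):
    """P(h | d) by Bayes' rule."""
    p = prior * like[:, d]
    return p / p.sum()

h = 2                                     # arbitrary starting hypothesis
counts = np.zeros(3)
for _ in range(100_000):
    d = rng.choice(4, p=like[h])          # teacher generates data from PP(d | h)
    h = rng.choice(3, p=posterior(d))     # learner samples h from PL(h | d)
    counts[h] += 1

print(counts / counts.sum())              # close to the prior [0.6, 0.3, 0.1]
```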

  19. Explaining convergence to the prior • Intuitively: data acts once, prior many times • Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d,h) (Griffiths & Kalish, 2007)

  20. Gibbs sampling • For variables x = x1, x2, …, xn, draw xi(t+1) from P(xi | x-i), where x-i = x1(t+1), x2(t+1), …, xi-1(t+1), xi+1(t), …, xn(t) • Converges to P(x1, x2, …, xn) (Geman & Geman, 1984) (a.k.a. the heat bath algorithm in statistical physics)

  21. Gibbs sampling (MacKay, 2003)

  22. Explaining convergence to the prior: when the target distribution is P(d,h) = PP(d|h) P(h), the conditional distributions are PL(h|d) and PP(d|h)

  23. Implications for linguistic universals • When learners sample from P(h|d), the distribution over languages converges to the prior • identifies a one-to-one correspondence between inductive biases and linguistic universals

  24. Iterated Bayesian learning (recap): assume learners sample hypotheses from their posterior distribution.

  25. From sampling to maximizing: raise the posterior to a power r, moving from r = 1 (sampling) through r = 2 toward r = ∞ (maximizing).

  26. From sampling to maximizing • General analytic results are hard to obtain • (r = ∞ is Monte Carlo EM with a single sample) • For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r • the ordering identified by the prior is preserved, but not the corresponding probabilities (Kirby, Dowman, & Griffiths, 2007)
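
One way to write the rule being varied on these two slides (the symbol π for the stationary distribution is mine, not the presentation’s):

```latex
% Learners choose h with probability proportional to the posterior raised to r:
%   r = 1 is sampling, r -> infinity is maximizing (MAP).
P_L(h \mid d) \;\propto\; \bigl[\, P_P(d \mid h)\, P(h) \,\bigr]^{r}

% For the language classes analyzed by Kirby, Dowman, & Griffiths (2007),
% the stationary distribution over hypotheses then satisfies
\pi(h) \;\propto\; P(h)^{r}
```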

  27. Implications for linguistic universals • When learners sample from P(h|d), the distribution over languages converges to the prior • identifies a one-to-one correspondence between inductive biases and linguistic universals • As learners move towards maximizing, the influence of the prior is exaggerated • weak biases can produce strong universals • cultural evolution is a viable alternative to traditional explanations for linguistic universals

  28. Infinite populations in continuous time • “Language dynamical equation” • “Neutral model” (fj(x) constant) • Stable equilibrium at first eigenvector of Q, which is our stationary distribution (Nowak, Komarova, & Niyogi, 2001) (Komarova & Nowak, 2003)
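
For reference, the language dynamical equation as given by Nowak, Komarova, & Niyogi (2001), which the slide names but does not reproduce; xi is the fraction of the population speaking language i, fi(x) its fitness, Qji the probability that a learner exposed to language j acquires language i, and φ the average fitness:

```latex
\dot{x}_i \;=\; \sum_{j} x_j \, f_j(x) \, Q_{ji} \;-\; \phi(x)\, x_i,
\qquad
\phi(x) \;=\; \sum_{j} f_j(x)\, x_j
```

In the neutral model (all fj constant), the dynamics settle at the leading eigenvector of Q mentioned on the slide.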

  29. Analyzing iterated learning • The outcome of iterated learning is strongly affected by the inductive biases of the learners • hypotheses with high prior probability ultimately appear with high probability in the population • Clarifies the connection between constraints on language learning and linguistic universals… • …and provides formal justification for the idea that culture reflects the structure of the mind

  30. Outline Part I: Formal analysis of iterated learning Part II: Iterated learning in the lab

  31. Inductive problems • Learning languages from utterances: e.g. "blicket toma", "dax wug", "blicket wug", from the grammar S → X Y, X → {blicket, dax}, Y → {toma, wug} • Learning functions from (x, y) pairs • Learning categories from instances of their members

  32. Revealing inductive biases • Many problems in cognitive science can be formulated as problems of induction • learning languages, concepts, and causal relations • Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995) • What biases guide human inductive inferences? If iterated learning converges to the prior, then it may provide a method for investigating biases

  33. Serial reproduction (Bartlett, 1932)

  34. General strategy • Step 1: use well-studied and simple tasks for which people’s inductive biases are known • function learning • concept learning • Step 2: explore learning problems where effects of inductive biases are controversial • frequency distributions • systems of color terms

  35. Iterated function learning • Each learner sees a set of (x, y) pairs • Makes predictions of y for new x values • Predictions are data for the next learner (Kalish, Griffiths, & Lewandowsky, 2007)
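
A minimal sketch of a chain like this, assuming (purely for illustration) a simulated learner that fits a straight line to the (x, y) pairs it sees; the real experiments used human learners, and the initial data and noise level below are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 20)
y = rng.uniform(0, 1, size=x_train.size)        # arbitrary initial data

for generation in range(9):
    # Stand-in learner: fit a line to the data it was shown.
    slope, intercept = np.polyfit(x_train, y, deg=1)
    # Its (noisy) predictions become the next learner's training data.
    y = slope * x_train + intercept + rng.normal(0, 0.05, size=x_train.size)

# Whatever the initial data looked like, the chain ends up reflecting the
# learner's own bias (here, trivially, a linear function).
print(slope, intercept)
```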

  36. Function learning experiments [Figure: a trial showing the stimulus, the response slider, and feedback] Examine iterated learning with different initial data.

  37. [Figure: chains of learners, from the initial data through iterations 1–9]

  38. Iterated concept learning • Each learner sees examples from a species • Identifies the species of four amoebae • Species correspond to boolean concepts (Griffiths, Christian, & Kalish, 2006)

  39. Types of concepts (Shepard, Hovland, & Jenkins, 1961): stimuli vary along three dimensions (color, size, shape), defining concept Types I–VI

  40. Results of iterated learning [Figure: probability of each concept type across iterations, for human learners and for the Bayesian model]

  41. Frequency distributions • Each learner sees objects receiving two labels (e.g. 5 × "DUP" and 5 × "NEK"), with P("DUP" | θ) = θ • Produces labels for those objects at test • First learner: sees one label {0, 1, 2, 3, 4, 5} out of 10 times (Vouloumanos, 2008) (Reali & Griffiths, submitted)
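
A minimal sketch of an iterated chain for this task, assuming a Beta prior on θ and binomial production; the prior parameters below are made up for illustration and are not the ones estimated by Reali & Griffiths:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10                     # productions per learner
a, b = 0.5, 0.5            # hypothetical Beta(a, b) prior on theta = P("DUP")

k = 5                      # first learner hears the target label 5 of 10 times
for generation in range(5):
    theta = rng.beta(a + k, b + N - k)   # sample theta from the posterior
    k = rng.binomial(N, theta)           # produce labels for the next learner

# With a prior that favors extreme values of theta, chains tend to drift
# toward producing one label all (or none) of the time.
print(k)
```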

  42. Results after one generation [Figure: frequency of the target label in each condition]

  43. Results after five generations [Figure: frequency of the target label]

  44. The Wright-Fisher model • Basic model: x copies of gene A in population of N • With mutation…
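
A minimal sketch of the basic dynamics named on this slide, with a simple symmetric mutation step; the population size and mutation rates are placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100              # population size
x = 50               # copies of gene A
u, v = 0.01, 0.01    # hypothetical mutation rates A -> a and a -> A

for generation in range(1000):
    p = x / N
    p = p * (1 - u) + (1 - p) * v    # mutation shifts the expected frequency of A
    x = rng.binomial(N, p)           # resample N genes for the next generation

print(x)
```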

  45. Iterated learning and Wright-Fisher • Basic model is MAP with uniform prior, mutation model is MAP with Beta prior • Extends to other models of genetic drift… • connection between drift models and inference (with Florencia Reali)

  46. Cultural evolution of color terms with Mike Dowman and Jing Xu

  47. Identifying inductive biases • Formal analysis suggests that iterated learning provides a way to determine inductive biases • Experiments with human learners support this idea • when stimuli for which biases are well understood are used, those biases are revealed by iterated learning • What do inductive biases look like in other cases? • continuous categories • causal structure • word learning • language learning

  48. Conclusions • Iterated learning provides a lens for magnifying the inductive biases of learners • small effects for individuals are big effects for groups • When cognition affects culture, studying groups can give us better insight into individuals

  49. Credits Joint work with… Brian Christian Mike Dowman Mike Kalish Simon Kirby Steve Lewandowsky Florencia Reali Jing Xu Computational Cognitive Science Lab http://cocosci.berkeley.edu/
