This set of slides presents key material from the CS 182 section on model merging and grammar induction, created by Eva Mok and modified by JGM. It covers learning models from data, the challenges of one-shot learning, and the role of grammar in describing language structure. The presentation contrasts the naive model approach with model merging strategies, illustrated through examples like Bailey's VerbLearn system. Assignments and upcoming tasks related to grammar induction are highlighted, giving students a roadmap for application and learning in cognitive science.
March 15, 2006 • CS 182, Sections 101-102 • slides created by Eva Mok (emok@icsi.berkeley.edu), modified by JGM
Announcements • a5 is due Friday night at 11:59pm • a6 is out tomorrow (2nd coding assignment), due the Monday after spring break • Midterm solution will be posted (soon)
Quick Recap • This Week • you just had the midterm • a bit more motor control • some belief nets and feature structures • Coming up • Bailey's model of learning hand action words
Your Task: As far as the brain / thought / language is concerned, what is the single biggest mystery to you at this point?
Remember Recruitment Learning? • One-shot learning • The idea is that, for things like words or grammar, kids learn at least something from a single input • Granted, they might not get it completely right on the first shot • But over time, their knowledge slowly converges to the right answer (i.e. they build a model that fits the data)
Model Merging • Goal: • learn a model given data • The model should: • explain the data well • be "simple" • be able to make generalizations
Naïve way to make a model • create a special case for each piece of data • this of course gets the training data completely right • but it cannot generalize at all when test data comes in • how to fix this: Model Merging • "compact" the special cases into more descriptive rules without losing too much performance
Basic idea of Model Merging • Start with the naïve model: one special case for each piece of data • While performance increases • Create a more general rule that explains some of the data • Discard the corresponding special cases
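A minimal Python sketch of this greedy loop, assuming the model is simply a list of rules and that the caller supplies `candidates`, `merge`, and `cost` functions (these names are illustrative, not from the lecture or from a6):

```python
def model_merge(data, candidates, merge, cost):
    """Greedy model merging (sketch, not the actual assignment code).

    data       -- the observations
    candidates -- model -> iterable of possible merges
    merge      -- (model, candidate) -> a new, more general model
    cost       -- model -> number; lower means a better model
    """
    # Naive starting point: one special case per piece of data.
    model = list(data)
    improved = True
    while improved:
        improved = False
        current = cost(model)
        # Try each candidate merge; keep the first one that lowers the cost.
        # Accepting a merge implicitly discards the special cases it replaced.
        for cand in candidates(model):
            merged = merge(model, cand)
            if cost(merged) < current:
                model = merged
                improved = True
                break
    return model
```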
2 examples of Model Merging • Bailey’s VerbLearn system • model that maps actions to verb labels • performance: complexity of model + ability to explain data (the MAP criterion) • Assignment 6 - Grammar Induction • model that maps sentences to grammar rules • performance: size of grammar + derivation length of sentences (the cost function)
Grammar • Grammar: rules that govern what sentences are legal in a language • e.g. Regular Grammar, Context Free Grammar • Production rules in a grammar have the form LHS -> RHS, e.g. X -> a b Y • Terminal symbols: a, b, c, etc. • Non-terminal symbols: S, A, B, X, etc. • Different classes of grammar restrict where these symbols can go • We’ll see an example on the next slide
Right-Regular Grammar • Right-Regular Grammar is a further restricted class of Regular Grammar • Non-terminal symbols may appear only at the right end of a rule • e.g.: S -> a b c X, X -> d e, X -> f • valid sentences would be "abcde" and "abcf"
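As an illustration only (the encoding below is an assumption made for this sketch, not part of a6), a right-regular-style grammar can be stored as a dict from non-terminal to its right-hand sides, and sentence membership checked by following the single trailing non-terminal:

```python
# Hypothetical encoding: each right-hand side is a tuple of terminals,
# optionally ending in exactly one non-terminal (the dict keys).
GRAMMAR = {
    "S": [("a", "b", "c", "X")],
    "X": [("d", "e"), ("f",)],
}

def generates(grammar, symbol, tokens):
    """True if `symbol` derives exactly `tokens` (right-regular case, sketch)."""
    for rhs in grammar.get(symbol, []):
        *body, last = rhs
        if last in grammar:                       # rule ends in a non-terminal
            n = len(body)
            if tokens[:n] == list(body) and generates(grammar, last, tokens[n:]):
                return True
        elif list(rhs) == tokens:                 # all-terminal rule
            return True
    return False

print(generates(GRAMMAR, "S", list("abcde")))     # True
print(generates(GRAMMAR, "S", list("abcf")))      # True
print(generates(GRAMMAR, "S", list("abcd")))      # False
```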
Grammar Induction • As input data (e.g. “abcde”, “abcf”) comes in, we’d like to build up a grammar that explains the data • We can certainly have one rule for each sentence we see in the data (naive approach, no generalization) • We’d rather “compact” the grammar • In a6, you have two ways of doing this “compaction” • prefix merge • suffix merge
prefix merge:
S -> a b c d e
S -> a b c f
becomes
S -> a b c X
X -> d e
X -> f

suffix merge:
S -> a b c d e
S -> f c d e
becomes
S -> a b X
S -> f X
X -> c d e

How do we find the model?
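Mechanically, the two merges can be sketched in Python as below, assuming rules are (lhs, rhs-tuple) pairs and `fresh_nonterminal()` is a hypothetical generator of unused names; neither is the actual a6 interface:

```python
COUNTER = 0

def fresh_nonterminal():
    """Return a new, unused non-terminal name (illustrative only)."""
    global COUNTER
    COUNTER += 1
    return f"X{COUNTER}"

def prefix_merge(rule1, rule2, n):
    """Merge two rules with the same LHS that share their first n symbols (sketch)."""
    (lhs1, rhs1), (lhs2, rhs2) = rule1, rule2
    assert lhs1 == lhs2 and rhs1[:n] == rhs2[:n]
    x = fresh_nonterminal()
    return [(lhs1, rhs1[:n] + (x,)),     # shared prefix followed by the new non-terminal
            (x, rhs1[n:]),               # remainder of the first rule
            (x, rhs2[n:])]               # remainder of the second rule

def suffix_merge(rule1, rule2, n):
    """Merge two rules with the same LHS that share their last n symbols (sketch)."""
    (lhs1, rhs1), (lhs2, rhs2) = rule1, rule2
    assert lhs1 == lhs2 and rhs1[-n:] == rhs2[-n:]
    x = fresh_nonterminal()
    return [(lhs1, rhs1[:-n] + (x,)),    # distinct prefix of rule 1, then the new non-terminal
            (lhs2, rhs2[:-n] + (x,)),    # distinct prefix of rule 2, then the new non-terminal
            (x, rhs1[-n:])]              # the shared suffix
```

For instance, prefix_merge(("S", tuple("abcde")), ("S", tuple("abcf")), 3) reproduces the prefix example above.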
Contrived Example • Suppose you have these 3 grammar rules:
r1: S -> eat them here or there
r2: S -> eat them anywhere
r3: S -> like them anywhere or here or there
• 5 merging options • prefix merge (r1, r2, 1) • prefix merge (r1, r2, 2) • suffix merge (r1, r3, 1) • suffix merge (r1, r3, 2) • suffix merge (r1, r3, 3)
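One hedged way to enumerate such options: for every pair of rules with the same left-hand side, each shared prefix length gives a prefix-merge candidate and each shared suffix length gives a suffix-merge candidate (the function names below are made up for this sketch):

```python
from itertools import combinations

def shared_prefix_len(a, b):
    """Length of the longest common prefix of two symbol tuples."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def merge_options(rules):
    """Yield ('prefix' | 'suffix', i, j, length) candidates over same-LHS rule pairs (sketch)."""
    for (i, (lhs1, rhs1)), (j, (lhs2, rhs2)) in combinations(enumerate(rules), 2):
        if lhs1 != lhs2:
            continue
        for n in range(1, shared_prefix_len(rhs1, rhs2) + 1):
            yield ("prefix", i, j, n)
        # A shared suffix is just a shared prefix of the reversed right-hand sides.
        for n in range(1, shared_prefix_len(rhs1[::-1], rhs2[::-1]) + 1):
            yield ("suffix", i, j, n)
```

Run on r1-r3 above (with each rule stored as an (lhs, rhs-tuple) pair), this yields exactly the five options listed.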
Computationally • Kids aren’t presented all the data at once • Instead they’ll hear these sentences one by one: • eat them here or there • eat them anywhere • like them anywhere or here or there • As each sentence (i.e. data) comes in, you create one rule for it, e.g. S -> eat them here or there • Then you look for ways to merge as more sentences come in
Example 1: just prefix merge • After the first two sentences are presented, we can already do a prefix merge of length 2:
r1: S -> eat them here or there
r2: S -> eat them anywhere
merge into:
r3: S -> eat them X1
r4: X1 -> here or there
r5: X1 -> anywhere
Example 2: just suffix merge • After the first three sentences are presented, we can do a suffix merge of length 3:
r1: S -> eat them here or there
r2: S -> eat them anywhere
r3: S -> like them anywhere or here or there
r1 and r3 merge into:
r4: S -> eat them X2
r5: S -> like them anywhere or X2
r6: X2 -> here or there
Your Task in a6 • pull in sentences one by one • monitor your sentences • do either a prefix merge or a suffix merge as soon as it’s “good” to do so
How do we know if a model is good? • want a small grammar • but want it to explain the data well • minimize the cost along the way:
c(G) = α · s(G) + d(G, D)
where s(G) is the size of the grammar, d(G, D) is the total derivation length of the sentences, and α is a learning factor to play with
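A small sketch of computing this cost, assuming the grammar is a list of (lhs, rhs-tuple) rules, s(G) counts right-hand-side symbols, and the derivation length of a sentence is the number of rule applications needed to produce it (all helper names are made up for this sketch):

```python
def grammar_size(rules):
    """s(G): total number of symbols on the right-hand sides."""
    return sum(len(rhs) for _, rhs in rules)

def derivation_length(rules, symbol, tokens):
    """Fewest rule applications deriving `tokens` from `symbol`, or None if underivable
    (assumes at most one non-terminal, at the end of each right-hand side)."""
    nonterminals = {lhs for lhs, _ in rules}
    best = None
    for lhs, rhs in rules:
        if lhs != symbol:
            continue
        *body, last = rhs
        if last in nonterminals:
            n = len(body)
            if tuple(tokens[:n]) == tuple(body):
                rest = derivation_length(rules, last, tokens[n:])
                if rest is not None and (best is None or 1 + rest < best):
                    best = 1 + rest
        elif tuple(rhs) == tuple(tokens):
            best = 1 if best is None else min(best, 1)
    return best

def cost(rules, sentences, alpha):
    """c(G) = alpha * s(G) + d(G, D), summing derivation lengths over all sentences."""
    d = sum(derivation_length(rules, "S", s.split()) for s in sentences)
    return alpha * grammar_size(rules) + d

data = ["eat them here or there", "eat them anywhere",
        "like them anywhere or here or there"]
naive = [("S", tuple(s.split())) for s in data]
print(grammar_size(naive), cost(naive, data, 1))   # 15 and 1*15 + 3 = 18
```

Larger α penalizes grammar size more heavily, which is what makes merging attractive.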
Back to Example 2 • Remember your data is: • eat them here or there • eat them anywhere • like them anywhere or here or there • Your original grammar:
r1: S -> eat them here or there
r2: S -> eat them anywhere
r3: S -> like them anywhere or here or there
size of grammar: s(G) = 15
derivation length of sentences: d(G, D) = 1 + 1 + 1 = 3
c(G) = α · s(G) + d(G, D) = α · 15 + 3
Back to Example 2 • Remember your data is: • eat them here or there • eat them anywhere • like them anywhere or here or there • Your new grammar:
r2: S -> eat them anywhere
r4: S -> eat them X2
r5: S -> like them anywhere or X2
r6: X2 -> here or there
size of grammar: s(G) = 14
derivation length of sentences: d(G, D) = 2 + 1 + 2 = 5
c(G) = α · s(G) + d(G, D) = α · 14 + 5
so in fact you SHOULDN’T merge if α ≤ 2
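For completeness, the break-even point comes from comparing the two costs: merging helps only when α · 14 + 5 < α · 15 + 3, i.e. when α > 2. A quick check of that arithmetic (not assignment code):

```python
for alpha in (1, 2, 3):
    old_cost = alpha * 15 + 3   # original grammar: s(G) = 15, d(G, D) = 3
    new_cost = alpha * 14 + 5   # merged grammar:   s(G) = 14, d(G, D) = 5
    verdict = "merge" if new_cost < old_cost else "keep"
    print(alpha, old_cost, new_cost, verdict)
# alpha=1: 18 vs 19 -> keep; alpha=2: 33 vs 33 -> keep; alpha=3: 48 vs 47 -> merge
```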