This set of slides presents key material from the CS 182 section on model merging and grammar induction, created by Eva Mok and modified by JGM. It covers learning models from data, the challenges of one-shot learning, and the role of grammar in describing language structure. The presentation contrasts the naive model approach with model merging strategies, illustrated through examples like Bailey's VerbLearn system. Assignments and upcoming tasks related to grammar induction are highlighted, giving students a roadmap for application and learning in cognitive science.
March 15, 2006 • CS 182, Sections 101-102 • slides created by Eva Mok (emok@icsi.berkeley.edu), modified by JGM
Announcements • a5 is due Friday night at 11:59pm • a6 is out tomorrow (2nd coding assignment), due the Monday after spring break • Midterm solution will be posted (soon)
Quick Recap • This Week • you just had the midterm • a bit more motor control • some belief nets and feature structures • Coming up • Bailey's model of learning hand action words
Your Task: As far as the brain / thought / language is concerned, what is the single biggest mystery to you at this point?
Remember Recruitment Learning? • One-shot learning • The idea is that, for things like words or grammar, kids learn at least something from a single input • Granted, they might not get it completely right on the first shot • But over time, their knowledge slowly converges to the right answer (i.e. they build a model that fits the data)
Model Merging • Goal: • learn a model given data • The model should: • explain the data well • be "simple" • be able to make generalizations
Naïve way to make a model • create a special case for each piece of data • this of course gets the training data completely right • but it cannot generalize at all when test data comes in • how to fix this: Model Merging • "compact" the special cases into more descriptive rules without losing too much performance
Basic idea of Model Merging • Start with the naïve model: one special case for each piece of data • While performance increases • Create a more general rule that explains some of the data • Discard the corresponding special cases
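A minimal Python sketch of this greedy loop, assuming the model is simply a list of rules and that the caller supplies `candidates`, `merge`, and `cost` functions (these names are illustrative, not from the lecture or from a6):

```python
def model_merge(data, candidates, merge, cost):
    """Greedy model merging (sketch, not the actual assignment code).

    data       -- the observations
    candidates -- model -> iterable of possible merges
    merge      -- (model, candidate) -> a new, more general model
    cost       -- model -> number; lower means a better model
    """
    # Naive starting point: one special case per piece of data.
    model = list(data)
    improved = True
    while improved:
        improved = False
        current = cost(model)
        # Try each candidate merge; keep the first one that lowers the cost.
        # Accepting a merge implicitly discards the special cases it replaced.
        for cand in candidates(model):
            merged = merge(model, cand)
            if cost(merged) < current:
                model = merged
                improved = True
                break
    return model
```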
2 examples of Model Merging • Bailey’s VerbLearn system • model that maps actions to verb labels • performance: complexity of model + ability to explain data (the MAP criterion) • Assignment 6 - Grammar Induction • model that maps sentences to grammar rules • performance: size of grammar + derivation length of sentences (the cost function)
Grammar • Grammar: rules that govern what sentences are legal in a language • e.g. Regular Grammar, Context Free Grammar • Production rules in a grammar have the form LHS -> RHS, e.g. X -> a b Y • Terminal symbols: a, b, c, etc. • Non-terminal symbols: S, A, B, X, etc. • Different classes of grammar restrict where these symbols can go • We’ll see an example on the next slide
Right-Regular Grammar • Right-Regular Grammar is a further restricted class of Regular Grammar • Non-terminal symbols may appear only at the right end of a rule • e.g.: S -> a b c X, X -> d e, X -> f • valid sentences would be "abcde" and "abcf"
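As an illustration only (the encoding below is an assumption made for this sketch, not part of a6), a right-regular-style grammar can be stored as a dict from non-terminal to its right-hand sides, and sentence membership checked by following the single trailing non-terminal:

```python
# Hypothetical encoding: each right-hand side is a tuple of terminals,
# optionally ending in exactly one non-terminal (the dict keys).
GRAMMAR = {
    "S": [("a", "b", "c", "X")],
    "X": [("d", "e"), ("f",)],
}

def generates(grammar, symbol, tokens):
    """True if `symbol` derives exactly `tokens` (right-regular case, sketch)."""
    for rhs in grammar.get(symbol, []):
        *body, last = rhs
        if last in grammar:                       # rule ends in a non-terminal
            n = len(body)
            if tokens[:n] == list(body) and generates(grammar, last, tokens[n:]):
                return True
        elif list(rhs) == tokens:                 # all-terminal rule
            return True
    return False

print(generates(GRAMMAR, "S", list("abcde")))     # True
print(generates(GRAMMAR, "S", list("abcf")))      # True
print(generates(GRAMMAR, "S", list("abcd")))      # False
```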
Grammar Induction • As input data (e.g. “abcde”, “abcf”) comes in, we’d like to build up a grammar that explains the data • We can certainly have one rule for each sentence we see in the data (naive approach, no generalization) • We’d rather “compact” the grammar • In a6, you have two ways of doing this “compaction” • prefix merge • suffix merge
prefix merge:
S -> a b c d e
S -> a b c f
becomes
S -> a b c X
X -> d e
X -> f

suffix merge:
S -> a b c d e
S -> f c d e
becomes
S -> a b X
S -> f X
X -> c d e

How do we find the model?
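Mechanically, the two merges can be sketched in Python as below, assuming rules are (lhs, rhs-tuple) pairs and `fresh_nonterminal()` is a hypothetical generator of unused names; neither is the actual a6 interface:

```python
COUNTER = 0

def fresh_nonterminal():
    """Return a new, unused non-terminal name (illustrative only)."""
    global COUNTER
    COUNTER += 1
    return f"X{COUNTER}"

def prefix_merge(rule1, rule2, n):
    """Merge two rules with the same LHS that share their first n symbols (sketch)."""
    (lhs1, rhs1), (lhs2, rhs2) = rule1, rule2
    assert lhs1 == lhs2 and rhs1[:n] == rhs2[:n]
    x = fresh_nonterminal()
    return [(lhs1, rhs1[:n] + (x,)),     # shared prefix followed by the new non-terminal
            (x, rhs1[n:]),               # remainder of the first rule
            (x, rhs2[n:])]               # remainder of the second rule

def suffix_merge(rule1, rule2, n):
    """Merge two rules with the same LHS that share their last n symbols (sketch)."""
    (lhs1, rhs1), (lhs2, rhs2) = rule1, rule2
    assert lhs1 == lhs2 and rhs1[-n:] == rhs2[-n:]
    x = fresh_nonterminal()
    return [(lhs1, rhs1[:-n] + (x,)),    # distinct prefix of rule 1, then the new non-terminal
            (lhs2, rhs2[:-n] + (x,)),    # distinct prefix of rule 2, then the new non-terminal
            (x, rhs1[-n:])]              # the shared suffix
```

For instance, prefix_merge(("S", tuple("abcde")), ("S", tuple("abcf")), 3) reproduces the prefix example above.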
Contrived Example • Suppose you have these 3 grammar rules:
r1: S -> eat them here or there
r2: S -> eat them anywhere
r3: S -> like them anywhere or here or there
• 5 merging options • prefix merge (r1, r2, 1) • prefix merge (r1, r2, 2) • suffix merge (r1, r3, 1) • suffix merge (r1, r3, 2) • suffix merge (r1, r3, 3)
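One hedged way to enumerate such options: for every pair of rules with the same left-hand side, each shared prefix length gives a prefix-merge candidate and each shared suffix length gives a suffix-merge candidate (the function names below are made up for this sketch):

```python
from itertools import combinations

def shared_prefix_len(a, b):
    """Length of the longest common prefix of two symbol tuples."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def merge_options(rules):
    """Yield ('prefix' | 'suffix', i, j, length) candidates over same-LHS rule pairs (sketch)."""
    for (i, (lhs1, rhs1)), (j, (lhs2, rhs2)) in combinations(enumerate(rules), 2):
        if lhs1 != lhs2:
            continue
        for n in range(1, shared_prefix_len(rhs1, rhs2) + 1):
            yield ("prefix", i, j, n)
        # A shared suffix is just a shared prefix of the reversed right-hand sides.
        for n in range(1, shared_prefix_len(rhs1[::-1], rhs2[::-1]) + 1):
            yield ("suffix", i, j, n)
```

Run on r1-r3 above (with each rule stored as an (lhs, rhs-tuple) pair), this yields exactly the five options listed.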
Computationally • Kids aren’t presented all the data at once • Instead they’ll hear these sentences one by one: • eat them here or there • eat them anywhere • like them anywhere or here or there • As each sentence (i.e. data) comes in, you create one rule for it, e.g. S -> eat them here or there • Then you look for ways to merge as more sentences come in
Example 1: just prefix merge • After the first two sentences are presented, we can already do a prefix merge of length 2:
r1: S -> eat them here or there
r2: S -> eat them anywhere
merge into:
r3: S -> eat them X1
r4: X1 -> here or there
r5: X1 -> anywhere
Example 2: just suffix merge • After the first three sentences are presented, we can do a suffix merge of length 3:
r1: S -> eat them here or there
r2: S -> eat them anywhere
r3: S -> like them anywhere or here or there
r1 and r3 merge into:
r4: S -> eat them X2
r5: S -> like them anywhere or X2
r6: X2 -> here or there
Your Task in a6 • pull in sentences one by one • monitor your sentences • do either a prefix merge or a suffix merge as soon as it’s “good” to do so
How do we know if a model is good? • want a small grammar • but want it to explain the data well • minimize the cost along the way:
c(G) = α · s(G) + d(G, D)
where s(G) is the size of the grammar, d(G, D) is the total derivation length of the sentences, and α is a learning factor to play with
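A small sketch of computing this cost, assuming the grammar is a list of (lhs, rhs-tuple) rules, s(G) counts right-hand-side symbols, and the derivation length of a sentence is the number of rule applications needed to produce it (all helper names are made up for this sketch):

```python
def grammar_size(rules):
    """s(G): total number of symbols on the right-hand sides."""
    return sum(len(rhs) for _, rhs in rules)

def derivation_length(rules, symbol, tokens):
    """Fewest rule applications deriving `tokens` from `symbol`, or None if underivable
    (assumes at most one non-terminal, at the end of each right-hand side)."""
    nonterminals = {lhs for lhs, _ in rules}
    best = None
    for lhs, rhs in rules:
        if lhs != symbol:
            continue
        *body, last = rhs
        if last in nonterminals:
            n = len(body)
            if tuple(tokens[:n]) == tuple(body):
                rest = derivation_length(rules, last, tokens[n:])
                if rest is not None and (best is None or 1 + rest < best):
                    best = 1 + rest
        elif tuple(rhs) == tuple(tokens):
            best = 1 if best is None else min(best, 1)
    return best

def cost(rules, sentences, alpha):
    """c(G) = alpha * s(G) + d(G, D), summing derivation lengths over all sentences."""
    d = sum(derivation_length(rules, "S", s.split()) for s in sentences)
    return alpha * grammar_size(rules) + d

data = ["eat them here or there", "eat them anywhere",
        "like them anywhere or here or there"]
naive = [("S", tuple(s.split())) for s in data]
print(grammar_size(naive), cost(naive, data, 1))   # 15 and 1*15 + 3 = 18
```

Larger α penalizes grammar size more heavily, which is what makes merging attractive.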
Back to Example 2 • Remember your data is: • eat them here or there • eat them anywhere • like them anywhere or here or there • Your original grammar:
r1: S -> eat them here or there
r2: S -> eat them anywhere
r3: S -> like them anywhere or here or there
size of grammar: s(G) = 15
derivation length of sentences: d(G, D) = 1 + 1 + 1 = 3
c(G) = α · s(G) + d(G, D) = α · 15 + 3
Back to Example 2 • Remember your data is: • eat them here or there • eat them anywhere • like them anywhere or here or there • Your new grammar:
r2: S -> eat them anywhere
r4: S -> eat them X2
r5: S -> like them anywhere or X2
r6: X2 -> here or there
size of grammar: s(G) = 14
derivation length of sentences: d(G, D) = 2 + 1 + 2 = 5
c(G) = α · s(G) + d(G, D) = α · 14 + 5
so in fact you SHOULDN’T merge if α ≤ 2
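For completeness, the break-even point comes from comparing the two costs: merging helps only when α · 14 + 5 < α · 15 + 3, i.e. when α > 2. A quick check of that arithmetic (not assignment code):

```python
for alpha in (1, 2, 3):
    old_cost = alpha * 15 + 3   # original grammar: s(G) = 15, d(G, D) = 3
    new_cost = alpha * 14 + 5   # merged grammar:   s(G) = 14, d(G, D) = 5
    verdict = "merge" if new_cost < old_cost else "keep"
    print(alpha, old_cost, new_cost, verdict)
# alpha=1: 18 vs 19 -> keep; alpha=2: 33 vs 33 -> keep; alpha=3: 48 vs 47 -> merge
```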