GRS LX 865 Topics in Linguistics

Week 6. Optimality Theory and acquisition.

Optimality Theory • Grammar involves constraints on representations (e.g., SS, LF, PF, or perhaps a single combined representation). • The same constraints exist in all languages. • Languages differ in how important each constraint is relative to the others.

Optimality Theory • In our analysis, one constraint is Parse-T, which says that tense must be realized in a clause. A structure without tense (where TP has been omitted, say) will violate this constraint. • Another constraint is *F (“Don’t have a functional category”). A structure with TP will violate this constraint.

Optimality Theory • Parse-T and *F are in conflict—it is impossible to satisfy both at the same time. • When constraints conflict, the choice made (on a language-particular basis) of which constraint is considered to be “more important” (more highly ranked) determines which constraint is satisfied and which must be violated.

Optimality Theory • So if *F >> Parse-T, TP will be omitted. • And if Parse-T >> *F, TP will be included.
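
A minimal sketch of how a ranking picks a winner, assuming just two candidate structures (a clause with TP and one without) and one violation mark per failed constraint. The candidate names and the `evaluate` helper are illustrative only. Strict domination is modeled as a lexicographic comparison of violation counts, read in ranking order.

```python
# Violation marks for each candidate structure (assumed: one mark per violation).
VIOLATIONS = {
    "with TP": {"*F": 1, "Parse-T": 0},     # has a functional projection
    "without TP": {"*F": 0, "Parse-T": 1},  # tense is not realized
}

def evaluate(ranking):
    """Return the optimal candidate under a ranking (highest-ranked constraint first).

    Strict domination: violation profiles are compared lexicographically, so one
    violation of a higher constraint outweighs any number of lower-ranked ones.
    """
    return min(
        VIOLATIONS,
        key=lambda cand: tuple(VIOLATIONS[cand][c] for c in ranking),
    )

print(evaluate(["*F", "Parse-T"]))   # *F >> Parse-T
print(evaluate(["Parse-T", "*F"]))   # Parse-T >> *F
```

Under *F >> Parse-T the TP-less structure wins; reversing the ranking makes the structure with TP win.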

Optimality Theory: big picture • Universal Grammar is the set of constraints that all languages must obey. • Languages differ only in how those constraints are ranked relative to one another. (So, “parameter” = “ranking”.) • The kid’s job is to re-rank the constraints until the ranking matches the one that generated the input s/he hears.


The idea • Kids are subject to conflicting constraints: • Parse-T: Include a projection for tense. • Parse-Agr: Include a projection for agreement. • *F: Don’t complicate your tree with functional projections. • *F2: Don’t complicate your tree so much as to have two functional projections.

The idea • Sometimes Parse-T beats out *F, and then there’s a TP. Or Parse-Agr beats out *F, and then there’s an AgrP. Or both Parse-T and Parse-Agr beat out *F2, and so there’s both a TP and an AgrP. • But what does sometimes mean?

Floating constraints • The innovation in Legendre et al. (2000) that gets us off the ground is the idea that as kids re-rank constraints, the position of a constraint in the hierarchy can become somewhat fuzzy, so that the ranges of two constraints can overlap. • [Diagram: overlapping ranking ranges for *F and Parse-T]

Floating constraints • Each time the kid evaluates a form, the position of Parse-T is fixed at some point within its range, so Parse-T sometimes outranks *F and is sometimes outranked by it.

Floating constraints • (Under certain assumptions) this predicts that we would see TP in the structure 50% of the time, and structures without TP the other 50% of the time.
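
One way to make “sometimes” concrete, sketched under an assumption of ours: the floating constraint can land in any slot of its range, and every slot is equally likely. The sketch enumerates the slots, evaluates each resulting total ranking by strict domination, and counts winners. The helper names and the slot indexing are illustrative, not taken from Legendre et al.

```python
from collections import Counter
from fractions import Fraction

VIOLATIONS = {
    "with TP": {"*F": 1, "Parse-T": 0},
    "without TP": {"*F": 0, "Parse-T": 1},
}

def evaluate(ranking):
    """Optimal candidate under strict domination (highest-ranked constraint first)."""
    return min(VIOLATIONS, key=lambda c: tuple(VIOLATIONS[c][k] for k in ranking))

def floating_distribution(fixed, floater, slots):
    """Insert the floating constraint at each slot index of the fixed hierarchy,
    treat every slot as equally likely, and return how often each candidate wins."""
    winners = Counter(evaluate(fixed[:s] + [floater] + fixed[s:]) for s in slots)
    return {cand: Fraction(n, len(slots)) for cand, n in winners.items()}

# Parse-T floats over two slots: above *F (index 0) or below it (index 1).
print(floating_distribution(["*F"], "Parse-T", slots=[0, 1]))
```

With one slot on each side of *F, each structure wins half the time. A range that reached further above or below *F would shift the proportions.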

French kid data • Looked at 3 French kids from CHILDES. • Broke development into stages using a modified MLU-type (mean length of utterance) measure, based on how long most of their utterances were (2 words vs. more than 2 words) and on how many of the utterances contained verbs. • Looked at tense and agreement in each of the three stages represented in the data.

French kid data • Kids start out using 3sg agreement and present tense for practically everything (correct or not). • We took this to be a “default”. • (No agreement? Pronounce it as 3sg. No tense? Pronounce it as present. Neither? Pronounce it as an infinitive.)

French kid data • This means if a kid uses 3sg or present tense, we can’t tell if they are really using 3sg (they might be) or if they are not using agreement at all and just pronouncing the default. • So, we looked at non-present tense forms and non-3sg forms only to avoid the question of the defaults.

French kid data • We found that tense and agreement develop differently: in the first stage we looked at, kids used tense fine, but in the next stage their tense got worse as their agreement improved. • Middle stage: looks like competition between T and Agr for a single node.

A detail about counting • We counted non-3sg and non-present verbs. • To see how close kids’ utterances were to adults’ utterances, we need to know how often adults use non-3sg and non-present forms, and then see how close the kids come to that level. • For example, adults use non-present tense around 31% of the time, so when a kid uses non-present tense 31% of the time, we take that to be “100% success”. • In the last stage we looked at, kids were basically right at the “100% success” level for both tense and agreement.
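
The normalization is a simple ratio: the child’s rate divided by the adult rate. A minimal sketch using the 31% adult non-present rate from the slide; the tag strings and the token list are hypothetical coding labels, not real transcript data. By the same arithmetic, a 15% non-present rate comes out at about 48% of the adult level.

```python
def proportion(tags, target):
    """Share of verb tokens coded with `target` (e.g. "non-present")."""
    return sum(1 for tag in tags if tag == target) / len(tags)

def relative_to_adults(kid_rate, adult_rate):
    """A child's rate as a fraction of the adult rate (1.0 = "100% success")."""
    return kid_rate / adult_rate

ADULT_NON_PRESENT = 0.31  # adult rate of non-present tense, from the slide

# Hypothetical coded verb tokens from one child transcript.
kid_tags = ["present", "non-present", "present", "present"]
kid_rate = proportion(kid_tags, "non-present")
print(kid_rate, round(relative_to_adults(kid_rate, ADULT_NON_PRESENT), 2))
print(round(relative_to_adults(0.15, ADULT_NON_PRESENT), 2))
```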

[Figure: Proportion of non-present and non-3sg verbs]

[Figure: Proportion of non-finite root forms]

A model to predict the percentages • Stage 3b (first stage) • No agreement • About 1/3 NRFs (non-finite root forms), 2/3 tensed forms • [Ranking diagram: *F2, *F, Parse-T, Parse-Agr]
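
The Stage 3b percentages can be reproduced by a small evaluator. This is a sketch under our own assumptions, not a reconstruction of the slide’s diagram. The candidates are NRF (no TP, no AgrP), TP only, AgrP only, and TP+AgrP. *F is counted once per functional projection and *F2 once when there are two. The fixed order is *F2 >> *F >> Parse-Agr, and Parse-T floats over three equally likely slots, from above *F2 down to just below *F. Because Parse-Agr is lowest in every ranking, no AgrP candidate ever wins, matching “no agreement”.

```python
from collections import Counter
from fractions import Fraction

# Candidate clause structures and their violation marks (assumed counting):
# *F is violated once per functional projection, *F2 once if there are two.
VIOLATIONS = {
    "NRF":     {"*F2": 0, "*F": 0, "Parse-T": 1, "Parse-Agr": 1},
    "TP":      {"*F2": 0, "*F": 1, "Parse-T": 0, "Parse-Agr": 1},
    "AgrP":    {"*F2": 0, "*F": 1, "Parse-T": 1, "Parse-Agr": 0},
    "TP+AgrP": {"*F2": 1, "*F": 2, "Parse-T": 0, "Parse-Agr": 0},
}

def evaluate(ranking):
    """Winner under strict domination (ranking lists the highest constraint first)."""
    return min(VIOLATIONS, key=lambda c: tuple(VIOLATIONS[c][k] for k in ranking))

def floating_distribution(fixed, floater, slots):
    """Each slot for the floating constraint is assumed to be equally likely."""
    winners = Counter(evaluate(fixed[:s] + [floater] + fixed[s:]) for s in slots)
    return {cand: Fraction(n, len(slots)) for cand, n in winners.items()}

# Hypothetical Stage 3b configuration: Parse-T floats from above *F2 (slot 0)
# to just below *F (slot 2); the other constraints keep a fixed order.
dist = floating_distribution(["*F2", "*F", "Parse-Agr"], "Parse-T", slots=[0, 1, 2])
print(dist)
```

In two of the three rankings Parse-T dominates *F and TP wins. In the third, *F outranks Parse-T and the bare NRF wins, giving the 2/3 to 1/3 split.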

A model to predict the percentages • Stage 4b (second stage) • Non-3sg agreement and non-present tense each about 15% (relative to the adult rates, about 40% agreeing and 50% tensed) • About 20% NRFs • [Ranking diagram: *F2, *F, Parse-T, Parse-Agr]

A model to predict the percentages • Stage 4c (third stage) • Everything appears to have tense and agreement (adult-like levels) • [Ranking diagram: *F2, *F, Parse-T, Parse-Agr]

[Figure: Predicted vs. observed, tense]

[Figure: Predicted vs. observed, agreement]

[Figure: Predicted vs. observed, NRFs]

Various things (homework) • Is the OT model just proposed a structure-building or full competence model? • How does the OT model fit in the overall big picture with the ATOM model?

Various things (homework) • For French, we assumed that NRFs appear when both TP and AgrP are missing. Yet Schütze & Wexler 1996 claimed that root infinitives appear when either TP or AgrP is missing. • Which one is it?

French v. English • English: T+Agr is pronounced like • /s/ if we have features [3, sg, present] • /ed/ if we have the feature [past] • /Ø/ otherwise • French: T+Agr is pronounced like: • danser NRF • a dansé (3sg) past • je danse 1sg (present) • j’ai dansé 1sg past
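
The English rules are simple enough to write as a function. This sketch encodes only what the slide states. Using None for a feature that is missing because TP or AgrP was omitted is our convention. The sketch makes the contrast with French concrete. In English a missing T or Agr surfaces as Ø, which looks like an ordinary non-3sg present form, so an omission can only show up where -s or -ed is expected.

```python
from typing import Optional

def english_t_agr(person: Optional[int], number: Optional[str],
                  tense: Optional[str]) -> str:
    """Spell out English T+Agr per the slide; None marks a missing feature."""
    if tense == "past":
        return "-ed"                      # [past] alone triggers -ed
    if (person, number, tense) == (3, "sg", "present"):
        return "-s"                       # needs all three features
    return "Ø"                            # everything else, including omissions

for features in [(3, "sg", "present"), (1, "sg", "present"),
                 (3, "sg", None), (None, None, "past")]:
    print(features, english_t_agr(*features))
```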


What we’re doing • The driver who my neighbor who I trust suggested took me to the airport. • The driver who my neighbor who my boss trusts suggested took me to the airport. • Overarching hypothesis: Sentence difficulty has to do with holding onto several unsatisfied dependencies. Longer ones are harder to hold. • Question: What measures length? • Hypothesis: New referents.

How do we see if that’s right? • Center-embedded sentences are the most taxing: several dependencies are started at once, and the center-most element is triple-counted. • The driver who my neighbor who I trust … • That’s the most sensitive point; it seems to be near the critical point of processability.

Experimenting • Does it matter whether we have a known referent (I, you) or a new referent (my neighbor)? • To know for sure, we try holding everything constant except the most embedded subject and see if there are differences (which can then be attributed to the only thing that’s different, the properties of the most embedded subject).

Building the items • The driver who my neighbor who I trust suggested took me to the airport. • The driver who my neighbor who John trusts suggested took me to the airport. • The driver who my neighbor who the housekeeper trusts suggested took me to the airport. • The driver who my neighbor who they trust suggested took me to the airport.

Planning the experiment • Each set of four sentences constitutes a token set (a.k.a. item). • Each item has four conditions (1st/2nd-person pronoun, name, definite description, 3rd-person pronoun). • Counterbalancing rules: • Each subject will judge no more than one sentence from each token set. • Each subject will judge all conditions and will see equal numbers of sentences from each condition. • Every sentence in every token set will be judged by some subject.

Trial lists • We have four conditions, so we need: • Four different “scripts” (versions of the lists) • A number of token sets that is a multiple of four. • E.g., items 1-4, each with conds a-d • Subj W: 1a, 2b, 3c, 4d (script 1) • Subj X: 1b, 2c, 3d, 4a (script 2) • Subj Y: 1c, 2d, 3a, 4b (script 3) • Subj Z: 1d, 2a, 3b, 4c (script 4)
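
The rotation in the table is a Latin square. This minimal sketch generates it and checks the three counterbalancing rules. The function names are ours, and items and conditions are counted from 0 internally. With 20 items, each script gets five sentences in each condition.

```python
from collections import Counter

def latin_square_scripts(n_items, n_conditions=4):
    """Rotate conditions across scripts: script s gives item i condition (i + s) % n.

    Returns a list of scripts, each a list of 0-based (item, condition) pairs.
    """
    if n_items % n_conditions:
        raise ValueError("number of items must be a multiple of the number of conditions")
    return [[(i, (i + s) % n_conditions) for i in range(n_items)]
            for s in range(n_conditions)]

def satisfies_counterbalancing(scripts, n_items, n_conditions=4):
    """Check the three counterbalancing rules from the previous slide."""
    for script in scripts:
        items = [i for i, _ in script]
        if len(items) != len(set(items)):          # at most one sentence per token set
            return False
        counts = Counter(c for _, c in script)
        if len(counts) != n_conditions or len(set(counts.values())) != 1:
            return False                           # all conditions, equally often
    judged = {pair for script in scripts for pair in script}
    return len(judged) == n_items * n_conditions   # every sentence seen by someone

def label(item, cond):
    """Slide-style label: item 0, condition 1 -> '1b'."""
    return f"{item + 1}{chr(ord('a') + cond)}"

scripts = latin_square_scripts(4)
for s, script in enumerate(scripts, start=1):
    print(f"script {s}:", ", ".join(label(i, c) for i, c in script))
print(satisfies_counterbalancing(scripts, n_items=4))
```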

Our experiment • We will have 20 items (picked from the ones you submitted) and 20 fillers. • (Note: That’s on the small side for a real experiment) • Next steps: • Create the lists of test sentences for the four different scripts. • Spec out and pseudocode our experiment • Investigate PsyScript • Run the experiment • Deal with the data

Creating the scripts • Our sentences are made of very predictable components: • The X who/that the Y who/that Z VP3 VP2 VP1 • The only thing that changes across conditions is Z, while the rest changes across token sets. • We can use Excel to build these from their pieces, to avoid unnecessary errors.
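
The build step can be sketched in Python by picking the condition-specific pieces and concatenating the rest. This is a hypothetical layout, not the worksheet’s. The slide’s Components sheet has Subj3a-Subj3c columns, whereas here each condition is a (Z, VP3) pair, because the innermost verb agrees with Z (“I trust” vs. “John trusts”). Treating each component as one region (eight in all) is also an assumption. Component names follow the worksheet columns, with VP3 as the innermost verb and VP1 as the main-clause VP.

```python
# Hypothetical components for one token set, named after the worksheet columns.
ITEM = {
    "Subj1": "The driver", "Rel1": "who", "Subj2": "my neighbor", "Rel2": "who",
    "VP2": "suggested", "VP1": "took me to the airport.",
}

# One (Z, VP3) pair per condition, since the innermost verb agrees with Z.
CONDITIONS = [
    ("I", "trust"),                  # 1st/2nd-person pronoun
    ("John", "trusts"),              # name
    ("the housekeeper", "trusts"),   # definite description
    ("they", "trust"),               # 3rd-person pronoun
]

def build_regions(item, conditions, cond):
    """Return the eight reading regions for one item in one condition (0-based)."""
    z, vp3 = conditions[cond]
    return [item["Subj1"], item["Rel1"], item["Subj2"], item["Rel2"],
            z, vp3, item["VP2"], item["VP1"]]

for cond in range(len(CONDITIONS)):
    print(" ".join(build_regions(ITEM, CONDITIONS, cond)))
```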

Worksheets • Components: columns Subj1, Rel1, Subj2, Rel2, Subj3a, Subj3b, Subj3c, VP3, VP2, VP1, Answer, Question • Fillers: Question, Answer, Regions… • The way I’ve set it up, everything needs to be exactly 8 regions long (even the fillers).

Worksheets • Constructed: Computes the item (token set) and condition from the row number and comes up with a code like I5V2 (fifth token set, version 2). Builds the sentence region by region based on the condition number. • Tables: Keeps track of what will be on each script. Scripts are divided into “blocks”, and each block has one sentence from each condition plus four fillers, randomized. The sort column is 2*block plus a random number (to keep the blocks in order but randomize the order within each block).
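
Here is a sketch of the two formulas for the Constructed and Tables sheets, written in Python rather than Excel. The row layout, with the four versions of an item on consecutive rows, is an assumption. It was chosen so that row 17 (counting from 0) yields the slide’s example code I5V2, and the real worksheet may number rows differently. The sort key works because Excel’s RAND() returns a number from 0 up to (but not including) 1. So 2*block + RAND() keeps each block inside its own interval, and a multiplier of 1 would already suffice.

```python
import random

N_CONDITIONS = 4

def code_for_row(row: int) -> str:
    """Map a 0-based row number to an item/version code such as 'I5V2'.

    Assumes (hypothetically) that rows 0-3 hold item 1, rows 4-7 item 2, and so on.
    """
    item, version = divmod(row, N_CONDITIONS)
    return f"I{item + 1}V{version + 1}"

def sort_key(block: int, rng: random.Random) -> float:
    """2*block plus a random number in [0, 1), like 2*block + RAND() in Excel.

    Keys for block b fall in [2b, 2b+1), so blocks never interleave, but the
    order of rows inside a block is random.
    """
    return 2 * block + rng.random()

def order_script(rows, rng):
    """rows: (block, label) pairs. Returns the labels sorted by their sort keys."""
    keyed = [(sort_key(block, rng), label) for block, label in rows]
    return [label for _, label in sorted(keyed)]

# Two blocks, each with four experimental sentences and four fillers.
rows = [(b, f"block{b}-{kind}{k}") for b in range(2)
        for kind in ("exp", "fill") for k in range(4)]
print(code_for_row(17))
print(order_script(rows, random.Random(0)))
```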

Worksheets • Script: The master script sheet. This generates a script based on the column letters you put into cells I1 and J1. (The columns refer to the Tables sheet, where the item and condition numbers are found.) Use B and C for script 1, D and E for script 2, F and G for script 3, H and I for script 4. • Script a, … script d: The actual scripts. Select the part of the Script sheet that has data (A1:O41) and copy. Go to the script a sheet, choose Paste Special… and then Value (so we copy only results, not formulas). Delete columns B-D (item, cond, row), select rows 2-41, hit the sort button, then delete column A (sort) and row 1 (labels). Save as tab-delimited text.
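
The manual export can also be scripted. This is a minimal sketch, assuming each row follows the A1:O41 layout without the label row: sort key, item, cond, row, code, question, answer, then eight region columns. `export_script` is a hypothetical helper. It joins cells with plain tabs, so it assumes no cell contains a tab or a line break.

```python
def export_script(rows):
    """Reproduce the manual export steps and return tab-delimited text.

    Sorts by column A, drops column A (sort) and columns B-D (item, cond, row),
    and writes one line per sentence with no header row.
    """
    lines = []
    for row in sorted(rows, key=lambda r: r[0]):     # sort by column A
        kept = row[4:]                               # code, question, answer, 8 regions
        lines.append("\t".join(str(cell) for cell in kept))
    return "\n".join(lines) + "\n"

regions = [f"r{i}" for i in range(1, 9)]
rows = [
    [1.7, 2, 1, 5, "I2V1", "Q2?", 1] + regions,
    [0.3, 1, 2, 2, "I1V2", "Q1?", 0] + regions,
]
print(export_script(rows), end="")
```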

The scripts are ready • So, we have the data that we’re going to use. • The next thing is to figure out how we’re going to test these. • The goal is to test reading time on each region of the sentence by presenting the sentence region by region.

Thinking through the experiment • What do we want to have happen? • Display some instructions • Do some practice trials • Display “practice is over” message • Do some real trials • Display “thanks!” • The trials: • Show fully obscured sentence, wait for a key • Reveal next word, wait for a key, until done • Ask question, wait for response • Give sound feedback about correctness
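
PsyScript implements the trials in AppleScript. As a language-neutral illustration of the trial logic only, here is a Python sketch of one moving-window trial, stepping through regions (the slide says “word”). It assumes a non-cumulative window, in which regions already read are masked again, and dash masking that preserves spaces. The display, key-wait, and clock functions are hypothetical hooks passed in by the caller, so the sketch runs without any experiment software. The comprehension question and sound feedback are left out.

```python
import time
from typing import Callable, List, Tuple

def mask(text: str) -> str:
    """Replace every non-space character with a dash, so word lengths stay visible."""
    return "".join(" " if ch == " " else "-" for ch in text)

def window_displays(regions: List[str]) -> List[str]:
    """All displays for one trial: first everything masked, then each region
    revealed in turn while all the others stay masked."""
    masked = [mask(r) for r in regions]
    displays = [" ".join(masked)]
    for i, region in enumerate(regions):
        displays.append(" ".join(masked[:i] + [region] + masked[i + 1:]))
    return displays

def run_trial(regions: List[str],
              show: Callable[[str], None],
              wait_for_key: Callable[[], None],
              clock: Callable[[], float] = time.monotonic) -> List[Tuple[int, float]]:
    """Step through one trial and return (region number, reading time) pairs."""
    displays = window_displays(regions)
    show(displays[0])            # fully obscured sentence
    wait_for_key()               # participant starts reading
    times = []
    for number, display in enumerate(displays[1:], start=1):
        show(display)
        start = clock()
        wait_for_key()           # key press = done with this region
        times.append((number, clock() - start))
    return times

# Example with a fake clock instead of a real keyboard and screen.
ticks = iter([0.0, 0.4, 1.0, 1.25])
print(run_trial(["The cat", "sat"], print, lambda: None, clock=lambda: next(ticks)))
```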

PsyScript • To do this, we’ll use PsyScript, an environment for creating psychology experiments on the Mac. • (It’s basically the only freely available software of this type that has promise for working in the future – if PsyScope had not become commercial as E-Prime, we’d be learning that instead).

AppleScript • The underlying machinery behind PsyScript is something called AppleScript. • This has been part of the Mac OS for about the past 10 years, and it has been gaining power and popularity recently. • AppleScript is a means by which you can tell other programs what to do. • For example, tell Internet Explorer to go to a particular web page, tell Word to create a new document and type the date, … • Until you have an actual need for this, it doesn’t seem very exciting…

AppleScript • AppleScript is a sophisticated high-level programming language designed to be human readable (and kind of human writable). It’s supposed to look a lot like English. • PsyScript itself is an application that can be bossed around by AppleScript, and has the features that are useful in psycholinguistic experiments, such as timing, drawing, input, data recording functions.

Getting started • To write (and use) AppleScript, we use Script Editor. • Easiest way to do this: find the end experiment script and double-click on it. It contains:

    tell application "PsyScript"
        end experiment
    end tell

Note about PsyScript • PsyScript runs faster from the Script Editor. • If you run PsyScript from the Script Editor, you have to tell it manually where your script is. • To do this, find the line that says tell fileHelper to setContainer and change the path in parentheses. To get the path, Command-click the name of the script in the title bar of the Script Editor window and read the menu from bottom to top, separating the items with : and leaving out the name of the script itself. E.g.,

    setContainer("Station 5:Desktop Folder:PsyScript:")
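
The path rule is mechanical enough to state as code. This is a small sketch. `set_container_path` is a hypothetical helper that only builds the string you would type into setContainer(...); it does not talk to PsyScript. It assumes the Command-click menu lists the script name first and the volume last, and it adds a trailing : to match the example above.

```python
def set_container_path(menu_items):
    """Build the setContainer argument from the Command-click menu.

    menu_items is the menu read top to bottom (script name first, volume last).
    The path reads it bottom to top, joined by ':', without the script name.
    """
    folders = list(reversed(menu_items[1:]))
    return ":".join(folders) + ":"

print(set_container_path(["movingwindow", "PsyScript", "Desktop Folder", "Station 5"]))
```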

Movingwindow • I wrote a script called movingwindow to do what we’re going to do today. • The stimuli and instructions files are in a folder called “resources” in the same folder as the script. The names of these files are set at the top of the script; in mine, they are: • Mwstimuli.txt: sentence list as exported from Excel (tab-delimited text, exported from e.g. script a) • Mwpractice.txt: sentence list for the practice items • Mwinstruc.txt: initial instructions • Mwready.txt: post-practice instructions • Mwthanks.txt: end-of-experiment debriefing • Results are stored in the “results” folder.

Sentence lists • To generate the sentence lists in the right format for movingwindow, go to one of the script a-d sheets, do Save As… in Excel, and choose tab-delimited text. • Columns should be code, question, answer, and then the sentence (split across eight region columns). • The results come out in a tab-delimited file that you can load back into Excel: • Columns are: code, region number, time for region, correct answer 1/0, text of region
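
Once a results file is available, a natural first step is the mean reading time per region in each condition. This is a sketch under several assumptions: the file has the five columns listed above and no header row, experimental trials carry I<item>V<version> codes like those built in the Constructed sheet, and fillers do not. Time units are whatever movingwindow records. The sample rows are made up for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

ITEM_CODE = re.compile(r"I(\d+)V(\d+)$")  # experimental codes such as I5V2

def mean_times(tsv_text: str, correct_only: bool = True):
    """Mean reading time per (condition, region) from a results file.

    Assumed columns: code, region number, time, correct (1/0), region text.
    Lines whose code is not of the form I<item>V<version> (e.g. fillers or a
    header line) are skipped, as are short or blank lines.
    """
    times = defaultdict(list)
    for line in tsv_text.splitlines():
        row = line.split("\t")
        if len(row) < 5:
            continue
        code, region, rt, correct = row[:4]
        match = ITEM_CODE.match(code)
        if not match or (correct_only and correct != "1"):
            continue
        times[(int(match.group(2)), int(region))].append(float(rt))
    return {key: mean(vals) for key, vals in sorted(times.items())}

sample = ("I1V1\t1\t400\t1\tThe driver\n"
          "I2V1\t1\t500\t1\tThe pilot\n"
          "I3V2\t1\t450\t0\tThe cook\n"
          "F1\t1\t300\t1\tA filler sentence\n")
print(mean_times(sample))
```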

