
Subphonemic detail is used in spoken word recognition: Temporal Integration at Two Time Scales





Presentation Transcript


  1. Subphonemic detail is used in spoken word recognition: Temporal Integration at Two Time Scales Bob McMurray

  2. Grateful Thanks to: Advisors Dick Aslin Mike Tanenhaus Collaborators Meghan Clayards David Gow Saviors in the Lab Julie Markant Dana Subik Committee Joyce McDonough David Knill Christopher Brown People who put up with me Kate Pirog Kathy Corser Bette Andrea Lathrop Jennifer Gillis McCormick

  3. Meaningful stimuli are almost always temporal. Scene Perception: build stable representation across multiple eye-movements, attention shifts. Music: series of notes. Temporal properties (order and rhythm) are fundamental.

  4. Language as Temporal Integration Temporal Integration fundamental to language, as it appears in the world. • Word: Ordered series of articulations. • Sentence: Sequence of words. • A Language: Series of utterances. • Phonology, syntax extracted from this series of utterances.

  5. How are abstract representations formed? • Stimuli do not change arbitrarily. • At any point in time, subtle perceptual cues tell the system something about the change itself. • These cues enable an active integration process: • Anticipating future events. • Retaining partial representations of the present. • Resolving prior ambiguity.

  6. But: early evidence suggested that this perceptual information is not maintained. • Word recognition is an ideal arena: • Substantial perceptual information available. • Multiple timescales for integration.

  7. Overview 1) Continuous perceptual variation affects word recognition. 2) A new framework for word recognition. 3) Integrating speech cues in online recognition. 4) Long-term temporal integration: development. 5) The use of continuous detail during development. 6) Conclusions

  8. Speech and Word Recognition • Speech Perception: categorization of acoustic input (e.g. /la/, /ip/, /a/) into sublexical units (e.g. /b/, /l/, /p/). • Word Recognition: identification of the target word in the lexicon from active sublexical units.

  9. Word Recognition as temporal ambiguity resolution • Information arrives sequentially. • At early points in time, the signal is temporarily ambiguous: ba… is consistent with bakery, barrier, barricade, bait, baby, basic. • Later-arriving information disambiguates the word.

  10. Current models of spoken word recognition • Immediacy: Hypotheses formed from the earliest moments of input. • Activation Based: Lexical candidates (words) receive activation to the degree they match the input. • Parallel Processing: Multiple items are active in parallel. • Competition: Items compete with each other for recognition.
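The four properties on this slide (immediacy, activation, parallelism, competition) can be sketched in a toy model. This is not any specific published model (e.g. TRACE or Shortlist); the lexicon, the prefix-match score, and the Luce-choice normalization are all invented for illustration.

```python
# Toy activation/competition sketch: candidates gain activation in
# proportion to how well they match the input heard so far, and compete
# by normalization. All words and scores are illustrative.
def match(word, heard):
    """Fraction of the heard prefix that matches the candidate word."""
    same = sum(1 for a, b in zip(word, heard) if a == b)
    return same / len(heard) if heard else 0.0

def activations(lexicon, heard):
    """Luce-choice style competition: raw match normalized over candidates."""
    raw = {w: match(w, heard) for w in lexicon}
    total = sum(raw.values()) or 1.0
    return {w: a / total for w, a in raw.items()}

lexicon = ["butter", "putter", "bump", "beach", "dog"]
early = activations(lexicon, "bu")    # "butter" and "bump" tied: parallel hypotheses
late = activations(lexicon, "butte")  # later input favors "butter" over "putter"
```

Even this toy version shows the key behavior: hypotheses are formed from the earliest input ("bu" already activates butter and bump in parallel), and later material shifts the competition rather than restarting it.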

  11. Input: b... u… tt… e… r [Plot: activation over time for butter, putter, bump, beach, and dog as the input unfolds.]

  12. These processes have been well defined for a phonemic representation of the input (e.g. the phoneme string /k A g n I S n/). But there may be considerably less ambiguity in the signal if we consider subphonemic information. Example: subphonemic effects of motor processes.

  13. Coarticulation Any action reflects future actions as it unfolds. Example: coarticulation. Movements of the articulators (lips, tongue…) during speech reflect current, future, and past events. This yields subtle subphonemic variation in speech that reflects temporal organization. Sensitivity to these perceptual details might yield earlier disambiguation.

  14. These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded. Example: Categorical Perception

  15. Categorical Perception [Plot: identification (% /pa/) and discrimination (%) as a function of VOT, from /b/ to /p/.] • Sharp identification of tokens on a continuum. • Discrimination poor within a phonetic category. Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).

  16. Evidence against the strong form of Categorical Perception comes from a variety of psychophysical-type tasks: • Discrimination Tasks • Pisoni and Tash (1974) • Pisoni & Lazarus (1974) • Carney, Widin & Viemeister (1977) • Training • Samuel (1977) • Pisoni, Aslin, Perey & Hennessy (1982) • Goodness Ratings • Miller (1997) • Massaro & Cohen (1983)

  17. Does within-category acoustic detail systematically affect higher-level language? Is there a gradient effect of subphonemic detail on lexical activation?

  18. McMurray, Aslin & Tanenhaus (2002) A gradient relationship would yield systematic effects of subphonemic information on lexical activation. If this gradiency is useful for temporal integration, it must be preserved over time. Need a design sensitive to both acoustic detail and the detailed temporal dynamics of lexical activation.

  19. Acoustic Detail Use a speech continuum: more steps yield a better picture of the acoustic mapping. KlattWorks: generate synthetic continua from natural speech. • 9-step VOT continua (0-40 ms). • 6 pairs of words: • beach/peach bale/pale bear/pear • bump/pump bomb/palm butter/putter • 6 fillers: • lamp leg lock ladder lip leaf • shark shell shoe ship sheep shirt

  20. Temporal Dynamics How do we tap on-line recognition? With an on-line task: eye movements. Subjects hear spoken language and manipulate objects in a visual world. The visual world includes a set of objects with interesting linguistic properties: a beach, a peach, and some unrelated items. Eye movements to each object are monitored throughout the task. Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995

  21. Why use eye movements and the visual world paradigm? • Relatively natural task. • Eye movements are generated very fast (within 200 ms of the first bit of information). • Eye movements are time-locked to speech. • Subjects aren't aware of eye movements. • Fixation probability maps onto lexical activation.

  22. Task A moment to view the items

  23. Task Bear Repeat 1080 times

  24. Identification Results [Plot: proportion /p/ responses as a function of VOT (0-40 ms).] High agreement across subjects and items for the category boundary. By subject: 17.25 +/- 1.33 ms. By item: 17.24 +/- 1.24 ms.
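The boundary estimates on this slide can be thought of as the 50% crossover point of each identification curve. A minimal sketch of that computation, using linear interpolation between continuum steps; the response proportions below are invented for illustration, not the study's data.

```python
# Hypothetical identification data: proportion of /p/ responses at each
# VOT step (0-40 ms in 5 ms increments). Values are illustrative only.
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
prop_p = [0.02, 0.03, 0.08, 0.35, 0.80, 0.95, 0.98, 0.99, 1.00]

def category_boundary(x, y, criterion=0.5):
    """Locate the VOT where identification crosses the criterion,
    by linear interpolation between the two bracketing steps."""
    for i in range(len(y) - 1):
        if y[i] < criterion <= y[i + 1]:
            frac = (criterion - y[i]) / (y[i + 1] - y[i])
            return x[i] + frac * (x[i + 1] - x[i])
    raise ValueError("criterion never crossed")

boundary = category_boundary(vots, prop_p)
```

In practice a boundary like the slide's 17.25 ms per-subject mean would come from fitting a smooth (e.g. logistic) curve per subject or item; the interpolation above is the simplest version of the same idea.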

  25. Task [Diagram: sequence of trials 1-5 over time; 200 ms.] Target = bear. Competitor = pear. Unrelated = lamp, ship.

  26. Task [Plot: fixation proportion over time (0-2000 ms) for VOT = 0 (response /b/) and VOT = 40 (response /p/).] More looks to competitor than unrelated items.

  27. Task Given that the subject heard bear and clicked on "bear"… how often was the subject looking at the "pear"? [Schematic: categorical results would show identical competitor fixation curves for all within-category VOTs; a gradient effect would show competitor fixations varying with VOT.]
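The conditional analysis described here (count looks to the competitor only on trials where the response matched the target) can be sketched as follows. The trial record format and the toy data are hypothetical, chosen only to show the conditioning step.

```python
# Sketch of the conditional competitor-fixation analysis. Each trial is
# (vot_ms, response_correct, per-time-bin competitor-fixation indicators).
from collections import defaultdict

trials = [
    (0,  True,  [0, 1, 1, 0]),
    (0,  True,  [0, 0, 1, 0]),
    (15, True,  [0, 1, 1, 1]),
    (15, False, [1, 1, 1, 1]),  # response didn't match target: excluded
]

def competitor_curves(trials):
    """Average looks to the competitor per time bin, per VOT step,
    conditioned on the subject having chosen the target."""
    by_vot = defaultdict(list)
    for vot, correct, fixations in trials:
        if correct:
            by_vot[vot].append(fixations)
    return {vot: [sum(b) / len(rows) for b in zip(*rows)]
            for vot, rows in by_vot.items()}

curves = competitor_curves(trials)
```

Conditioning on the response is what makes the gradient prediction interesting: even when subjects all click "bear", their competitor fixations can still track the underlying VOT.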

  28. Results [Plot: competitor fixations over time since word onset (0-2000 ms), one curve per VOT step (0-40 ms in 5 ms increments), separately for /b/ and /p/ responses.] Long-lasting gradient effect: seen throughout the timecourse of processing.

  29. [Plot: looks to competitor (competitor fixations) as a function of VOT (0-40 ms), marked relative to the category boundary, for /b/ and /p/ responses.] Area under the curve: clear effects of VOT (B: p=.017*, P: p<.001***). Linear trend (B: p=.023*, P: p=.002***).
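The area-under-the-curve statistic summarized on this slide collapses each trial's fixation timecourse into a single number per condition. A minimal sketch using the trapezoidal rule; the time bins and fixation proportions are made-up illustrative values, not the experiment's data.

```python
# Trapezoidal area under a competitor-fixation timecourse.
def auc(times_ms, proportions):
    """Trapezoidal integral of fixation proportion over time (ms)."""
    area = 0.0
    for i in range(len(times_ms) - 1):
        dt = times_ms[i + 1] - times_ms[i]
        area += dt * (proportions[i] + proportions[i + 1]) / 2.0
    return area

# Illustrative timecourse: competitor looks rise and then fall.
times = [0, 400, 800, 1200, 1600]
props = [0.00, 0.06, 0.08, 0.05, 0.03]
area = auc(times, props)
```

One such area per VOT step, per subject, is what a linear-trend test like the one reported here would then be run over.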

  30. Unambiguous Stimuli Only [Plot: looks to competitor as a function of VOT (0-40 ms), relative to the category boundary, for /b/ and /p/ responses.] Clear effects of VOT (B: p=.014*, P: p=.001***). Linear trend (B: p=.009**, P: p=.007**).

  31. Summary Subphonemic acoustic differences in VOT have a gradient effect on lexical activation. • Gradient effect of VOT on looks to the competitor. • Effect holds even for unambiguous stimuli. • Seems to be long-lasting. Consistent with a growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).

  32. The Proposed Framework Sensitivity & Use 1) Word recognition is systematically sensitive to subphonemic acoustic detail. 2) Acoustic detail is represented as gradations in activation across the lexicon. 3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration. 4) This has fundamental consequences for development: learning phonological organization.

  33. Lexical Sensitivity Word recognition is systematically sensitive to subphonemic acoustic detail. • McMurray, Tanenhaus and Aslin (2002) • Other phonetic contrasts (exp. 1) • Non minimal-pairs (exp. 2) • During development (exps. 3 & 4)

  34. Lexical Basis 2) Acoustic detail is represented as gradations in activation across the lexicon. The lexicon forms a high-dimensional basis vector for acoustic/phonetic space. No unneeded dimensions (features) are coded; only possible alternatives are represented.

  35. 2) Acoustic detail is represented as gradations in activation across the lexicon. Input: b... u… m… p… [Plot: activation over time for bump, pump, dump, bun, bumper, and bomb as the input unfolds.]

  36. Temporal Integration This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration. • Short term cue integration (exp 1): • Cues to phonetic distinctions are spread out over time. • Lexical activation retains probabilistic representation of input as information accumulates. • Longer term ambiguity resolution (exp 2): • Early, ambiguous material retained until more information arrives.
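The short-term claim above, that lexical activation retains a probabilistic representation of the input as cues accumulate, can be sketched as a simple Bayesian update over the voicing categories. The Gaussian cue distributions and every parameter value below are invented for illustration; they are not values from the talk.

```python
import math

def gaussian(x, mu, sigma):
    """Likelihood of cue value x under a Gaussian category model."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def update(p_b, like_b, like_p):
    """Update P(/b/) after one cue, given each category's likelihood."""
    num = p_b * like_b
    return num / (num + (1 - p_b) * like_p)

# Cue 1 (arrives first): an ambiguous VOT of 15 ms.
# Assumed category means: /b/ = 5 ms, /p/ = 30 ms, sd = 8 ms.
p_b_after_vot = update(0.5, gaussian(15, 5, 8), gaussian(15, 30, 8))

# Cue 2 (arrives later): a long vowel of 220 ms, which favors /b/.
# Assumed means: /b/ words = 220 ms, /p/ words = 180 ms, sd = 40 ms.
p_b_after_vowel = update(p_b_after_vot, gaussian(220, 220, 40), gaussian(220, 180, 40))
```

On this toy run the ambiguous VOT leaves P(/b/) partial rather than committing to a category, and the later vowel-length cue pushes it further toward /b/: exactly the behavior the lexical-integration framework needs, and what a discard-the-detail sublexical model cannot do.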

  37. Development 4) Consequences for development: learning phonological organization. • Learning a language: • Integrating input across many utterances to build long-term representation. • Sensitivity to subphonemic detail (exp 3 & 4). • Allows statistical learning of categories (exp 5).

  38. Experiment 1 • Do lexical representations serve as a locus for short-term temporal integration of acoustic cues? • Can we see sensitivity to subphonemic detail in additional phonetic contexts?

  39. Phonetic Context Asynchronous cues to voicing: VOT and Vowel Length. Both covary with speaking rate: rate normalization.

  40. Phonetic Context [Diagram: VOT occurs at word onset; vowel length information arrives later.] Asynchronous cues to voicing: VOT and Vowel Length. Both covary with speaking rate: rate normalization.

  41. Manner of Articulation Formant Transition Slope (FTSlope): a temporal cue like VOT; it covaries with vowel length. Example: belt/welt.

  42. Alternative Models Model 1: Sublexical integration. [Diagram: VOT and Vowel Length are integrated at a sublexical representation (phonemes) before contacting the Lexicon.] VOT precedes Vowel Length. Online processing: how are these cues integrated?

  43. Model 2: Lexical Integration (proposed framework). [Diagram: each cue contacts the Lexicon directly as it arrives; a partial representation is retained until a more complete representation forms.] VOT precedes Vowel Length. Online processing: how are these cues integrated?

  44. Eye movements reveal lexical activation… Will the temporal pattern of fixations to lexical competitors reveal when acoustic information contacts the lexicon?

  45. Stimuli • 9-step VOT continua (0-40 ms): beach/peach beak/peak bees/peas • 9-step formant transition slope: bench/wench belt/welt bell/well • Each crossed with 2 vowel lengths. • Fillers (no effect of vowel length; extend gradiency to new continua): • 9-step F3 onset (place): dune/goon dew/goo deuce/goose • 9-step F3 onset (laterality): lake/rake lei/rai lace/race

  46. Task Same task as McMurray et al. (2002). 40 subjects, 1080 trials.

  47. Analysis • Validate methods with identification (mouse click) data. • Extend gradient effects of subphonemic detail to • Multiple dimensions • New phonetic contrasts • Disambiguate integration models by examining when effects are seen.

  48. Results: Stimulus Validation 1) Identification: Expected Results (from literature)
