Analysis II: Bayesian Analysis of Individual Data

Response bias, type and token frequency, and prosodic context in segment identification Noah Silbert & Kenneth de Jong Linguistics & Cognitive Science • Frequency Effects in Language • Frequency effects are observed in low-level linguistic domains: • Non-words with more frequent phonotactic patterns are judged more word-like (Frisch, Large, & Pisoni, JML, 2000). • Lexical decision and naming are faster for high frequency than for low frequency words (Forster & Chambers, JVLVB, 1973). • Target words are inhibited more by high frequency neighbors than by low frequency neighbors (Luce & Pisoni, E&H, 1998). • Reduction occurs earlier and to a greater extent in high frequency words and phrases (Bybee, SSLA, 2002). • Modeling Frequency Effects • Exemplar models readily account for frequency effects. • Exemplar models of perception typically make use of similarity and response bias parameters. • Perceptual similarity in linguistic behavior has been investigated many times, but response bias has been (relatively) neglected. • We hypothesize that response bias will be correlated with segment frequency, word frequency, or both. We test this by analyzing rank order correlations between bias and frequency. • Frequency Data • Segment (type) frequency (s) - Hoosier Mental Lexicon • ‘raw’ frequency = the number of occurrences of a given segment in the entire HML (lx) or among monosyllables (mo) • onset (on) [coda (co)] frequency = the number of occurrences of a given segment before [after] the first [last] vowel across words in the entire HML (lx) or among monosyllables (mo) • Word (token) frequency (w) - Brown Corpus • - summed log frequency for each occurrence of a segment for each of the segment frequency measures • Rank Order Correlations Between Frequency Measures • Word and segment measures are not often strongly correlated. • We focus on raw segment and prosodically conditioned monosyllabic segment and word frequency measures. • Analysis II: Bayesian Analysis of Individual Data • Bayesian analysis allows for principled 'null' results. • 1) Transform the correlation coefficients to limit their range to [0,1]. • 2) Analyze the transformed correlation coefficients as beta r.v.s • 3) Apply Bayes' rule and estimate the posterior distribution on A, B • 4) Plot p(r|A,B) for maximally probable A & B parameters • Conclusions: Individual Data • Onset b and raw s frequency are not correlated for either L1 group. • English L1: onset b is positively correlated with onset w • Dutch L1: onset b is weakly correlated with raw s, onset s and w • Coda b is weakly correlated with all three frequency measures, but the correlation is slightly stronger for coda s for English L1. • Discussion • The specification of alternative prior distributions on A & B did not change the results substantially. • Bayesian parameter estimation provides for the possibility of a principled 'null' result where standard inferential statistics does not. • The Cutler, et al., data was gathered with other questions in mind; replication with more data is called for. • Neither differences in accuracy nor differences in the distribution of frequencies account for the apparent difference between prosodic contexts in the individual subject data. • Acknowledgements • This work was supported by NSF grant #BCS-04406540. ASA 154 • Bias Data • Group Data • Miller & Nicely (JASA, 1955) • data pooled across SNR within filtering conditions: • broadband (B), high-pass (H), low-pass (L) • nonsense syllables, white noise • /p t k f θ s ʃ b d g v ð z ʒ m n/ • Luce (Dissertation, 1986) • monosyllabic, 3-phoneme words • data pooled across SNR within each of two prosodic contexts: • onset (ON) and coda (CO) • ON: /p t k f θ s ʃ ʧ h b d g v ð z ʤ n m l r w j/ • CO: /p t k f θ s ʃ ʧ b d g v ð z ʒ ʤ n m l r ŋ/ • de Jong, Silbert, & Park (under review) • nonsense words, no noise, Korean L1 listeners • data pooled within each of four prosodic contexts: • onset (ON), prestress (PRE), poststress (PO), coda (CO) • /p t f θ s b d v ð z l r w j/ • Individual Subject Data • Cutler, Weber, Smits, & Cooper (JASA, 2004) • English and Dutch L1 listeners • English nonsense words, six-talker babble • data pooled across SNR for each subject, within each of two prosodic contexts: • onset (ON) and coda (CO) • ON: /p t k f θ s ʃ ʧ h b d g v ð z ʤ m n l r w j/ • CO: /p t k f θ s ʃ ʧ b d g v ð z ʒ ʤ m n l r ŋ/ • Analysis I: Bias Parameters • Similarity Choice Model (Luce, HBMP, 1963): • Results I: Group Data • Conclusions: Group Data • Bias correlates more strongly with • raw s frequency than with prosodically conditioned s or w frequency • prosodically conditioned s than w frequency. • However, the possible sources of these differences are numerous and conflated.

Analysis II: Bayesian Analysis of Individual Data

Analysis II: Bayesian Analysis of Individual Data

Presentation Transcript

Statistics and Data Analysis

Econometric Analysis of Panel Data

Plotting and Analyzing Networks in Stata

Job Analysis

XRD analysis

Discrete Multivariate Analysis

Spatial Data Analysis

fMRI Data Analysis

Chapter 8

Business Analysis ITEC-630 Fall 2010

Econometric Analysis of Panel Data

Econometric Analysis of Panel Data

RTI Data Analysis

Microarray Data Analysis Using BASE

Motion Analysis Summer Course

Time Series Analysis

What is the problem? Broad Data and Infrastructure Analysis

DATA ANALYSIS

5. Analysis

Grade 6 Supporting Idea 6: Data Analysis

Data Mining: Concepts and Techniques Cluster Analysis Li Xiong