
Analyses on IFA corpus

Louis C.W. Pols, Institute of Phonetic Sciences (IFA), Amsterdam Center for Language and Communication (ACLC). Project meeting INTAS 915, May 23-25, 2003, Jyvaskyla.

Presentation Transcript


  1. Analyses on IFA corpus. Louis C.W. Pols, Institute of Phonetic Sciences (IFA), Amsterdam Center for Language and Communication (ACLC). Project meeting INTAS 915, May 23-25, 2003, Jyvaskyla

  2. overview
     • structure of IFA corpus
     • see three reports by R. v. Son and papers in the open literature
     • why corrected means?
     • how corrected means?
     • some results
     • conclusions

  3. structure of IFA corpus
     • 4 male & 4 female speakers; 5 hrs. of speech; 8 styles, among others informal story telling (I), retelling (R), reading a story (T), reading sentences (S)
     • ~50 K words (AIFC, 44 kHz, 16 bit)
     • label files with annotation tiers
     • phonemic segmentation and labeling (automatically generated, hand corrected; ~200k boundaries; 0.84 word labels/min; 3.3 boundaries/min)
     • description levels: phoneme, demi-syllable, syllable, word, sentence, paragraph
     • tiers: POS, lemma, lexical freq., etc.

  4. access of IFA corpus
     • use of CGN protocols
     • non-speech data in database structure
     • relational DB, SQL query language
     • basic structure = table: items (indiv. phoneme occurrences) x attributes (phoneme parent word, duration, position, speaker, etc.)
     • WWW front end to simplify access (automatically generating SQL queries; direct links to relevant files)
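
The items x attributes table and an SQL query against it might look roughly like the sketch below. The table name `phoneme_items`, its columns, and all rows are invented for illustration; the actual IFA schema is not given in the slides. It uses Python's built-in `sqlite3` module rather than the corpus's own database.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table with one row per phoneme occurrence (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE phoneme_items (
               id INTEGER PRIMARY KEY,
               phoneme TEXT, word TEXT, duration_ms REAL,
               position INTEGER, speaker TEXT, style TEXT)"""
    )
    rows = [  # invented example rows, not corpus data
        ("a", "kat", 82.0, 2, "F1", "I"),
        ("a", "kat", 95.0, 2, "F1", "S"),
        ("e", "bed", 70.0, 2, "M1", "I"),
        ("e", "bed", 88.0, 2, "M1", "S"),
        ("a", "man", 90.0, 2, "M1", "S"),
    ]
    conn.executemany(
        "INSERT INTO phoneme_items "
        "(phoneme, word, duration_ms, position, speaker, style) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn

def mean_duration_by_style(conn, phoneme):
    """Return (style, count, mean duration) for one phoneme, grouped by speaking style."""
    cur = conn.execute(
        "SELECT style, COUNT(*), AVG(duration_ms) FROM phoneme_items "
        "WHERE phoneme = ? GROUP BY style ORDER BY style",
        (phoneme,),
    )
    return cur.fetchall()

conn = build_demo_db()
print(mean_duration_by_style(conn, "a"))
```

A WWW front end of the kind described would presumably assemble a query like the one in `mean_duration_by_style` from form fields, so that users need not write SQL themselves.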

  5. why corrected means?
     • non-ideal design (no fixed numbers of observations for all relevant factors; this precludes the use of e.g. ANOVA)
     • confounding (occurrences of factor values are correlated, thus many combinations of values are rare)
     • interaction (one factor being modulated by other factors): additive, multiplicative, or ordinal interaction
     • factors of interest vs. nuisance factors
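
A toy illustration of the confounding problem, with invented numbers rather than corpus data: lexical stress is the factor of interest and speaking style the nuisance factor. Stressed vowels are over-represented in the slower style (S) and unstressed vowels in the faster one (I), so the raw means overstate the stress effect.

```python
from statistics import mean

# Invented observations: (lexical stress, speaking style, vowel duration in ms).
# The cell counts are deliberately unbalanced (8 vs. 2).
observations = (
    [("+", "S", 100.0)] * 8 + [("+", "I", 70.0)] * 2
    + [("-", "S", 90.0)] * 2 + [("-", "I", 60.0)] * 8
)

def raw_mean(stress):
    """Plain average over all observations with the given stress level."""
    return mean(d for s, _, d in observations if s == stress)

def within_style_difference(style):
    """Stressed minus unstressed mean duration inside one speaking style."""
    stressed = mean(d for s, st, d in observations if s == "+" and st == style)
    unstressed = mean(d for s, st, d in observations if s == "-" and st == style)
    return stressed - unstressed

raw_difference = raw_mean("+") - raw_mean("-")
print(raw_difference)
print(within_style_difference("S"), within_style_difference("I"))
```

In this toy set the raw difference is 28 ms, while within each style the difference is only 10 ms; the extra 18 ms comes entirely from the unequal cell counts.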

  6. how corrected means?
     • incidence matrix from basic data
     • rows = combinations of levels on factors of interest; columns = combinations of levels on nuisance factors
     • quasi-minimal pairs method
     • mean difference per row pair: by comparing (non-empty) pairs of columns
     • matrix of differences (fitted with additive model)
     • variable sample sizes: use weighting factors
     • corrected means
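
The steps above can be sketched as follows. This is a minimal reading of the slide, not the authors' implementation: the weight n1*n2/(n1+n2) per cell pair, the Gauss-Seidel fit of the additive model, and the final shift that ties the result to the grand mean are all assumptions made for illustration. The function name `corrected_means` and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def corrected_means(cells, rows, cols, sweeps=200):
    """cells maps (row, col) -> list of observations; missing keys count as empty cells.

    Assumes every row can be linked to every other row through shared columns.
    """
    n = len(rows)
    weight = [[0.0] * n for _ in range(n)]  # total weight per row pair
    diff = [[0.0] * n for _ in range(n)]    # weighted mean of (row i - row j)

    for i, j in combinations(range(n), 2):
        num = den = 0.0
        for c in cols:
            a = cells.get((rows[i], c), [])
            b = cells.get((rows[j], c), [])
            if a and b:  # quasi-minimal pair: both cells are non-empty
                w = len(a) * len(b) / (len(a) + len(b))  # assumed weighting factor
                num += w * (mean(a) - mean(b))
                den += w
        if den > 0:
            weight[i][j] = weight[j][i] = den
            diff[i][j] = num / den
            diff[j][i] = -num / den

    if any(sum(w_row) == 0 for w_row in weight):
        raise ValueError("a row shares no non-empty column with any other row")

    # Additive model: find offsets m with m[i] - m[j] close to diff[i][j]
    # in the weighted least-squares sense (Gauss-Seidel sweeps).
    offsets = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            offsets[i] = sum(
                weight[i][j] * (offsets[j] + diff[i][j]) for j in range(n)
            ) / sum(weight[i])

    # Assumed anchoring: the count-weighted average of the corrected means
    # equals the grand mean of all observations.
    counts = [sum(len(cells.get((r, c), [])) for c in cols) for r in rows]
    all_obs = [x for r in rows for c in cols for x in cells.get((r, c), [])]
    shift = mean(all_obs) - sum(k * o for k, o in zip(counts, offsets)) / sum(counts)
    return {r: o + shift for r, o in zip(rows, offsets)}

# Invented data: rows = lexical stress, columns = speaking style.
cells = {
    ("+", "S"): [100.0] * 8, ("+", "I"): [70.0] * 2,
    ("-", "S"): [90.0] * 2, ("-", "I"): [60.0] * 8,
}
print(corrected_means(cells, ["+", "-"], ["S", "I"]))
```

For this invented data the raw row means are 94 and 66 ms, while the sketch returns corrected means of about 85 and 75 ms, i.e. the 10 ms difference seen inside each style.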

  7. example: vowel duration (ms), speaking style (I, R, S, T) vs. lexical stress (+, -)
     [table of common means and corrected means not preserved in the transcript; 38061 total counts, 13323 row differences]

  8. row difference counts + significance (* = 0.001 significance)
     [table not preserved in the transcript]

  9. conclusions
     • simple averaging of unbalanced data is dangerous
     • free conversational speech data are always unbalanced
     • the corrected means method is then a good alternative
     • it can be interpreted as a least RMS-error approximation of ‘balanced’ means with an unbalanced data set
