Analyses on IFA corpus

Louis C.W. Pols
Institute of Phonetic Sciences (IFA)
Amsterdam Center for Language and Communication (ACLC)
Project meeting INTAS 915, May 23-25, 2003, Jyvaskyla

overview
• structure of IFA corpus
• see the three reports by R. van Son and papers in the open literature
• why corrected means?
• how corrected means?
• some results
• conclusions

structure of IFA corpus
• 4 male and 4 female speakers; 5 hours of speech; 8 styles, among others informal storytelling (I), retelling (R), reading a story (T), and reading sentences (S)
• ~50 K words (AIFC, 44 kHz, 16 bit)
• label files with annotation tiers
• phonemic segmentation and labeling (automatically generated, hand corrected; ~200k boundaries; 0.84 word labels/min; 3.3 boundaries/min)
• description levels: phoneme, demi-syllable, syllable, word, sentence, paragraph
• tiers: POS, lemma, lexical frequency, etc.

access of IFA corpus
• use of CGN protocols
• non-speech data in database structure
• relational DB, SQL query language
• basic structure: a table of items (individual phoneme occurrences) x attributes (phoneme parent word, duration, position, speaker, etc.)
• WWW front end to simplify access (automatically generates SQL queries; direct links to relevant files)
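The items x attributes layout can be sketched with an in-memory SQLite table. This is only an illustration: the table name `phoneme`, its columns, the sample rows, and the helper `mean_duration_by` are all hypothetical and are not the actual IFA corpus schema or front end.

```python
import sqlite3

# Hypothetical schema: one row per phoneme occurrence, one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE phoneme (
           id INTEGER PRIMARY KEY,
           phoneme TEXT, word TEXT, speaker TEXT, style TEXT,
           stress INTEGER, duration_ms REAL)"""
)

# Made-up sample rows, for illustration only.
rows = [
    ("a", "kat", "F1", "I", 1, 92.0),
    ("a", "kat", "F1", "S", 1, 118.0),
    ("@", "de", "M1", "I", 0, 41.0),
    ("@", "de", "M1", "S", 0, 55.0),
]
conn.executemany(
    "INSERT INTO phoneme (phoneme, word, speaker, style, stress, duration_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def mean_duration_by(conn, attribute):
    """Generate and run a GROUP BY query, as a query-building front end might."""
    allowed = {"phoneme", "word", "speaker", "style", "stress"}
    if attribute not in allowed:  # column names cannot be bound as parameters
        raise ValueError(f"unknown attribute: {attribute}")
    query = (
        f"SELECT {attribute}, COUNT(*), AVG(duration_ms) "
        f"FROM phoneme GROUP BY {attribute} ORDER BY {attribute}"
    )
    return conn.execute(query).fetchall()


result = mean_duration_by(conn, "style")
```

Each returned tuple holds an attribute value, the number of phoneme occurrences, and their mean duration. Checking the attribute against a whitelist keeps the generated SQL safe even though the column name is interpolated into the query string.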

why corrected means?
• non-ideal design (no fixed numbers of observations of all relevant factors; this precludes the use of e.g. ANOVA)
• confounding (occurrence of factor values is correlated, thus many combinations of values are rare)
• interaction (one factor being modulated by other factors): additive, multiplicative, or ordinal interaction
• factors of interest vs. nuisance factors
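The effect of confounding on simple averages can be illustrated with a small made-up data set. The durations, effect sizes, and cell counts below are invented for the illustration and are not taken from the IFA corpus; speaking style is treated as the factor of interest and lexical stress as the nuisance factor.

```python
from statistics import mean


# Invented additive ground truth: style S adds 10 ms, lexical stress adds 30 ms.
def duration(style, stressed):
    return 60.0 + (10.0 if style == "S" else 0.0) + (30.0 if stressed else 0.0)


# Unbalanced design: style and stress are correlated (confounded).
counts = {("I", False): 90, ("I", True): 10, ("S", False): 10, ("S", True): 90}
data = [
    (style, stressed, duration(style, stressed))
    for (style, stressed), n in counts.items()
    for _ in range(n)
]


def raw_mean(style):
    """Simple average over all observations of one style."""
    return mean(d for s, _, d in data if s == style)


def balanced_mean(style):
    """Average of the per-cell means, each stress level counted equally."""
    return mean(
        mean(d for s, st, d in data if s == style and st == stressed)
        for stressed in (False, True)
    )


raw_diff = raw_mean("S") - raw_mean("I")            # 34 ms: inflated by stress
balanced_diff = balanced_mean("S") - balanced_mean("I")  # 10 ms: the style effect
```

The raw difference mixes the 10 ms style effect with the 30 ms stress effect because stressed vowels are over-represented in style S. The balanced version works here only because every cell is filled, which unbalanced speech data rarely guarantee.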

how corrected means?
• incidence matrix from basic data
• rows = combinations of levels on factors of interest
• columns = combinations of levels on nuisance factors
• quasi-minimal pairs method
• mean difference per row pair: by comparing (non-empty) pairs of columns
• matrix of differences (fitted with additive model)
• variable sample sizes: use weighting factors
• corrected means
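The steps above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the procedure from the reports: the function name `corrected_means`, the per-column weight n_i * n_j / (n_i + n_j), and the choice to anchor the solution so that the count-weighted average of the corrected means equals the grand mean are all assumptions made for the illustration.

```python
import numpy as np


def corrected_means(sums, counts):
    """Sketch of corrected means from a rows x columns incidence matrix.

    sums, counts: cell sums and cell counts; rows are levels of the factors
    of interest, columns are combinations of nuisance-factor levels.
    """
    sums = np.asarray(sums, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n_rows = counts.shape[0]
    cell_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

    equations, targets, weights = [], [], []
    for i in range(n_rows):
        for j in range(i + 1, n_rows):
            # Quasi-minimal pairs: only columns where both rows have data.
            shared = (counts[i] > 0) & (counts[j] > 0)
            if not shared.any():
                continue
            ni, nj = counts[i, shared], counts[j, shared]
            w = ni * nj / (ni + nj)  # assumed weighting for variable sample sizes
            diff = np.average(cell_means[i, shared] - cell_means[j, shared], weights=w)
            equation = np.zeros(n_rows)
            equation[i], equation[j] = 1.0, -1.0  # additive model: m_i - m_j = diff
            equations.append(equation)
            targets.append(diff)
            weights.append(w.sum())

    # Differences fix the means only up to a constant; anchor to the grand mean
    # (an assumption of this sketch).
    row_totals = counts.sum(axis=1)
    equations.append(row_totals / row_totals.sum())
    targets.append(sums.sum() / counts.sum())
    weights.append(1.0)

    # Weighted least-squares fit of the additive model.
    root_w = np.sqrt(np.array(weights))
    a = np.array(equations) * root_w[:, None]
    b = np.array(targets) * root_w
    solution, *_ = np.linalg.lstsq(a, b, rcond=None)
    return solution


# Invented example: rows = styles (I, S), columns = stress (-, +).
counts = [[90, 10], [10, 90]]
sums = [[90 * 60.0, 10 * 90.0], [10 * 70.0, 90 * 100.0]]
means = corrected_means(sums, counts)  # approximately [75, 85]
```

In this invented example the simple row means would be 63 and 97 ms, whereas the fitted means differ by the 10 ms that separates the rows within every column. If a row shares no non-empty column with any other row, its mean is not determined by the differences and `lstsq` simply returns the minimum-norm solution.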

example: vowel duration (ms)
speaking style (I, R, S, T) vs. lexical stress (+, -)
[tables: common means and corrected means; 38061 total counts; 13323 row differences]

[table: row difference counts and significance; * marks 0.001 significance]

conclusions
• simple averaging of unbalanced data is dangerous
• free conversational speech data are always unbalanced
• the corrected means method then is a good alternative
• can be interpreted as a least RMS-error approximation of ‘balanced’ means with an unbalanced data set
