Amplitude-based source parameters for measuring voice quality. Christer Gobl and Ailbhe Ní Chasaide, Centre for Language Communication Studies, University of Dublin, Trinity College, Ireland.
Background • The methodologies we have used in our analytic studies have mainly involved interactive strategies for inverse filtering and source parameterisation. • Parameterisation has been in terms of LF-model matching (Fant et al, 1985), where the parameters are estimated from the modelled waveform. Most of these are defined by timing events in the differentiated glottal flow signal. • This approach offers potentially accurate source estimates, but is difficult and time-consuming. It further requires special phase-linear recordings. • Our methodologies are therefore mainly suited to ‘micro-studies’ of limited amounts of data.
Background • In a new project on Irish prosody (funded by IRCHSS), we hope to explore the prosodic determinants of voice quality, investigating the interaction between ƒ0 and other source features, linguistic intonational categories (e.g., pitch accent, focus, boundary tones), and paralinguistic signalling of affect. • For this we need a source analysis method more suited to large corpora, and less dependent on recording conditions…
Alternative amplitude-based parameterisation? • We are encouraged by the work of Mokhtari and Campbell (2003), who have developed a fully automatic system that analyses voice quality in terms of a single source parameter, for a large corpus of spontaneous speech. • This system uses the amplitude-based AQ parameter, or alternatively the ƒ0-normalised NAQ (Alku et al). • AQ and NAQ have been shown to be more robust than the typical time-based measures (Alku et al, 2002). • AQ and NAQ have also been shown to correlate with the tense (pressed) to lax (breathy) voice quality dimension.
Our present questions are: Given that AQ and NAQ are correlated with the tense/lax dimension • Q1: How are these measures affected by other dimensions of voice quality variation, and by large ƒ0 variations (often found in spontaneous speech)? • Q2: Can the amplitude-based approach be extended to provide a more detailed specification of source variation, in a way that will capture other dimensions of voice quality?
So, in the present study we: • Examine, across utterances with large ƒ0 differences, the correlation between AQ and NAQ and our perceived judgement of vocal tenseness/laxness • Propose an extended set of amplitude-based source parameters • Compare results for these amplitude-based parameters with their corresponding time-based ones
Our ‘standard’ parameter set • The parameters we typically derive from the modelled LF-waveform are the following: EE – excitation strength; RA – dynamic leakage; RG – normalised glottal frequency; RK – glottal skew; OQ – open quotient • The last four are derived from relative timepoints of the glottal waveform
Amplitude-based measures EE – excitation strength • EE is defined by the negative slope of the glottal flow, at the timepoint of maximum waveform discontinuity (te), i.e., the negative amplitude of the differentiated glottal flow • This amplitude is typically the maximum negative amplitude of the differentiated glottal pulse
Amplitude-based measures Td – declination time • Fant (1979) defines the glottal declination time as the relationship between the two amplitudes UP and EE: Td = UP / EE • UP is the peak glottal flow, i.e. the amplitude of the glottal pulse (disregarding any ‘DC’ amplitude arising from a possible constant glottal leakage)
Amplitude-based measures Td – declination time • Td can be interpreted by fitting a tangent to the glottal flow pulse at the timepoint of the excitation, te. • Td is the time interval from the point where this tangent intersects a horizontal line at the peak flow of the glottal pulse to the point where it intersects the zero-flow line.
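The tangent interpretation above can be checked numerically. The following is a minimal sketch, not the authors' analysis procedure: it assumes a single cycle of an already inverse-filtered glottal flow signal, and the function names and the toy pulse (cosine-shaped rise and fall, with hypothetical sample counts) are illustrative only.

```python
import math

def declination_time(flow, fs):
    """Return (UP, EE, Td) for one glottal-flow cycle sampled at fs Hz."""
    # First difference scaled by fs approximates the differentiated glottal flow.
    dflow = [(b - a) * fs for a, b in zip(flow, flow[1:])]
    up = max(flow) - min(flow)   # pulse amplitude, disregarding any DC offset
    ee = -min(dflow)             # magnitude of the maximum negative derivative amplitude
    # A tangent with slope -EE falls from UP to zero flow in UP / EE seconds.
    return up, ee, up / ee

def toy_pulse(n_open=80, n_close=40, n_total=160, up=1.0):
    """Hypothetical test pulse: raised-cosine rise, quarter-cosine fall, closed phase."""
    flow = []
    for n in range(n_total):
        if n < n_open:
            flow.append(0.5 * up * (1.0 - math.cos(math.pi * n / n_open)))
        elif n < n_open + n_close:
            flow.append(up * math.cos(math.pi * (n - n_open) / (2.0 * n_close)))
        else:
            flow.append(0.0)
    return flow

up, ee, td = declination_time(toy_pulse(), fs=20000)
print(f"UP = {up:.3f}, EE = {ee:.1f}, Td = {td * 1000:.3f} ms")
```

For this toy pulse the closing phase lasts 2 ms, and the steepest slope occurs at the instant of closure, so Td comes out shorter than the closing phase itself.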
Amplitude-based measures Rd – declination time Td normalised to the glottal period • There is also an ƒ0-normalised version of declination time, Rd. This parameter should in principle capture the inherent covariation of the glottal pulse shape and ƒ0 (Fant et al, 1994). • Rd includes a scaling factor so that the numerical value of Rd equals Td in milliseconds when ƒ0 = 110 Hz.
Amplitude-based measures AQ – Amplitude quotient • The AQ parameter was presented by Alku and Vilkman (1996): AQ = fac / dpeak • fac is the amplitude of the glottal flow pulse, i.e. equivalent to UP. • dpeak is the maximum negative amplitude of the differentiated glottal flow: i.e. typically the same as the amplitude of the excitation, as defined by EE • Thus AQ is essentially the same as Td
Amplitude-based measures NAQ – Normalised amplitude quotient • The NAQ parameter presented in Alku et al (2002) is AQ normalised to the fundamental period • NAQ would therefore be similar to Rd, except for the scaling factor
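The relationship between these quotients can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: Rd is written as Td in milliseconds times ƒ0/110, a form chosen to match the description above (Rd equals Td in ms at ƒ0 = 110 Hz), and the input values are hypothetical.

```python
def amplitude_quotients(up, ee, f0):
    """Return (AQ in seconds, NAQ, Rd) from UP, EE and f0 in Hz.

    up is the flow pulse amplitude and ee the excitation amplitude
    (flow units per second), so up / ee has the dimension of time.
    """
    aq = up / ee                        # same ratio as the declination time Td
    naq = aq * f0                       # AQ divided by the period T0 = 1 / f0
    rd = (aq * 1000.0) * f0 / 110.0     # assumed form: equals Td in ms when f0 = 110 Hz
    return aq, naq, rd

# Hypothetical values: the same pulse shape measure (AQ) at two pitches.
for f0 in (110.0, 220.0):
    aq, naq, rd = amplitude_quotients(up=1.0, ee=1000.0, f0=f0)
    print(f"f0 = {f0:.0f} Hz: AQ = {aq * 1000:.2f} ms, NAQ = {naq:.3f}, Rd = {rd:.2f}")
```

Under this form, Rd and NAQ differ only by the constant factor 0.11, and for a fixed AQ both grow in proportion to ƒ0.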
Estimating EE in the present study EE – excitation amplitude • For the data analysed in this study, we have consistently taken the maximum negative amplitude of the differentiated glottal flow pulse as the excitation amplitude • Our measurements are thus consistent with the definitions of AQ and NAQ.
Utterances analysed • Utterances analysed were taken from the VOQUAL’03 Speech database 2 (JST/CREST Expressive Speech Processing project, ATR, Japan) • Japanese spontaneous speech produced by a female speaker • The utterances were selected on the basis of having a distinctly tense or lax voice quality
Utterances analysed • Tokens (L = lax, T = tense): File 8860: L1; File 180: L2, L3, T1, T2, T3; File 286: T4; File 6146: T5 • A detailed source analysis was carried out on 10 consecutive glottal cycles, extracted around the maximum amplitude of each of the eight vowels. • Note that we have not attempted to rank-order the tokens in terms of degree of tenseness/laxness, so the numbering is not of importance
Results I: NAQ, normalised amplitude quotient • It is clear that NAQ does not separate out the tense versus lax auditory quality of these tokens. • Note that L1 has a lower NAQ value than several of the tense tokens, and that the very high-pitched T5 has a higher NAQ than both L1 and L3.
Results I: NAQ vs. AQ? • The non-normalised AQ gives better separation of the tense/lax tokens. • The high-pitched T4 and T5 have the lowest AQ values, corresponding to our auditory judgement. • It seems that NAQ, in eliminating the ƒ0 factor, is underestimating the tenseness of the high-pitch tokens. • But might AQ be overestimating tenseness…? • Something in between?
Extended set of amplitude-based parameters • Here we propose an extended set of amplitude-based parameters derived from three amplitude measures EE, UP and EI. • These parameters are meant to correspond to the typical time-based ones. • EI is the maximum positive amplitude of the differentiated glottal flow. • EI is easy to measure, but may not be as robust as EE and UP: it may be more affected by poor inverse filtering and source-filter interaction ripple…
Extended set of amplitude-based parameters • The derivations are based on Fant’s 3-parameter glottal model (Fant, 1979) • The opening phase is modelled as a raised inverted cosine with a frequency FG = 1/(2Tp)
Extended set of amplitude-based parameters Approximations Duration of the opening phase: Duration of the closing phase:
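As a sketch of how an opening-phase duration can be obtained from amplitudes alone, the following works through what the stated cosine model implies. It is a derivation from that model only, and may differ in form from the approximation shown on the original slide; the symbol U_g(t) for the glottal flow is introduced here for illustration. The closing phase is not attempted, since its model is not described in the text.

```latex
% Opening phase: raised inverted cosine reaching the peak flow UP at t = T_p
U_g(t) = \frac{UP}{2}\bigl(1 - \cos(2\pi F_G t)\bigr),
\qquad 0 \le t \le T_p, \qquad F_G = \frac{1}{2T_p}

% Differentiated flow during the opening phase
U_g'(t) = \pi F_G \, UP \, \sin(2\pi F_G t)

% EI is the maximum positive amplitude of the differentiated flow
EI = \max_t U_g'(t) = \pi F_G \, UP

% Hence the glottal frequency and the opening-phase duration follow from EI and UP
F_G = \frac{EI}{\pi \, UP},
\qquad
T_p = \frac{1}{2F_G} = \frac{\pi \, UP}{2\, EI}
```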
Extended set of amplitude-based parameters Time-based and amplitude-based versions of: • Normalised glottal frequency, RG • Glottal skew, RK • Open quotient, OQ
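For the first of these parameters, the time-based and amplitude-based versions can be compared on a synthetic pulse. This is a minimal sketch under assumptions: RG is taken as FG/ƒ0 with FG = 1/(2Tp), the amplitude-based estimate uses FG = EI/(π·UP) as implied by the raised-cosine opening phase, and the function names and test pulse are hypothetical. The amplitude-based forms of RK and OQ are not sketched, because they depend on a closing-phase approximation not given in the text.

```python
import math

def rg_time(tp, f0):
    """Time-based RG: glottal frequency FG = 1 / (2 * Tp), divided by f0."""
    return 1.0 / (2.0 * tp * f0)

def rg_amplitude(up, ei, f0):
    """Amplitude-based estimate: FG = EI / (pi * UP) under the cosine opening model."""
    return ei / (math.pi * up * f0)

def measure_up_ei(flow, fs):
    """Measure UP and EI (maximum positive derivative amplitude) from flow samples."""
    dflow = [(b - a) * fs for a, b in zip(flow, flow[1:])]
    return max(flow) - min(flow), max(dflow)

# Hypothetical opening phase: raised inverted cosine, Tp = 4 ms, f0 = 125 Hz.
fs, tp, f0 = 20000, 0.004, 125.0
n_open = round(tp * fs)
flow = [0.5 * (1.0 - math.cos(math.pi * n / n_open)) for n in range(n_open + 1)]

up, ei = measure_up_ei(flow, fs)
print(f"RG (time) = {rg_time(tp, f0):.3f}, RG (amplitude) = {rg_amplitude(up, ei, f0):.3f}")
```

On an ideal cosine pulse the two values agree closely; on real inverse-filtered speech they would diverge to the extent that the opening phase departs from the model or EI is affected by ripple.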
Results II: time vs. amplitude measures • There is generally a reasonably good correlation between the amplitude-based parameters and the corresponding time-based measures. • RG and RGa are particularly strongly correlated in these data. • Does this suggest that the formula for the opening phase is more accurate than the one for the closing phase…? [Plots: correlation coefficients r = 0.88, r = 0.54, r = 0.76]
Results II: time vs. amplitude measures • Maybe…but… • These recordings were not done under ideal conditions for accurate estimates of timing events: the time-based measures may not be fully reliable due to phase distortion.
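A comparison of this kind can be computed as follows. This sketch assumes that the reported r values are Pearson product-moment correlations, which the slides do not state explicitly; the function name is illustrative, and the inputs would be paired per-cycle values of a time-based parameter and its amplitude-based counterpart.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A call such as pearson_r(rg_values, rga_values), with one entry per analysed glottal cycle, would give a single coefficient per parameter pair; the variable names here are placeholders.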
Conclusions I • In this study we have presented voice source data for a small number of utterances with tense or lax voice quality, with ƒ0 values covering a wide range. • Data were compared for the AQ and NAQ amplitude-based source parameters, non-normalised or normalised to the fundamental period respectively. • The results suggest that the non-normalised AQ parameter is more effective in differentiating between the tense and lax tokens, when ƒ0 spans a wide range of values.
Discussion • How do our results compare with those of Alku et al (2002), who analysed breathy, neutral and pressed phonation types, and found a monotonic decrease in NAQ with degree of tension? • No ƒ0 data were presented, but the variation in ƒ0 across the phonation types was probably relatively small (for the individual subjects), given the nature of the recordings. • It may be the case that NAQ captures the relative degree of voice tenseness/laxness, when data are compared across different speakers (e.g., male-female) with intrinsically different ƒ0 ranges (and consequently different AQ ranges)…
Discussion • …as long as the (intra-speaker) ƒ0 variation is within a relatively small range. • For very large intra-speaker ƒ0 variations, the correlation between tenseness/laxness and NAQ might no longer hold. • It seems that the relationship between glottal pulse-shape and perceived voice quality depends on ƒ0 in a way that is rather more complex…
Conclusions II • We have derived an extended set of amplitude-based parameters, corresponding to typical time-based parameters, using three glottal amplitudes: EE, UP and EI. • For most of the data analysed, there was a reasonably good correspondence between the amplitude-based parameters and their time-based counterparts. • However, further analysis is required (using phase-linear recordings) in order to test their effectiveness in predicting the glottal waveshape.