Index of content:
Volume 125, Issue 4, April 2009
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
A statistical, formant-pattern model for segregating vowel type and vocal-tract length in developmental formant data
125 (2009); http://dx.doi.org/10.1121/1.3079772
This paper investigates the theoretical basis for estimating vocal-tract length (VTL) from the formant frequencies of vowel sounds. A statistical inference model was developed to characterize the relationship between vowel type and VTL, on the one hand, and formant frequency and vocal cavity size, on the other. The model was applied to two well-known developmental studies of formant frequency. The results show that VTL is the major source of variability after vowel type and that the contribution due to other factors like developmental changes in oral-pharyngeal ratio is small relative to the residual measurement noise. The results suggest that speakers adjust the shape of the vocal tract as they grow to maintain a specific pattern of formant frequencies for individual vowels. This formant-pattern hypothesis motivates development of a statistical-inference model for estimating VTL from formant-frequency data. The technique is illustrated using a third developmental study of formant frequencies. The VTLs of the speakers are estimated and used to provide a more accurate description of the complicated relationship between VTL and glottal pulse rate as children mature into adults.
Likelihood-ratio forensic voice comparison using parametric representations of the formant trajectories of diphthongs
125 (2009); http://dx.doi.org/10.1121/1.3081384
Non-contemporaneous speech samples from 27 male speakers of Australian English were compared in a forensic likelihood-ratio framework. Parametric curves (polynomials and discrete cosine transforms) were fitted to the formant trajectories of the diphthongs /aɪ/, /eɪ/, /oʊ/, /aʊ/, and /ɔɪ/. The estimated coefficient values from the parametric curves were used as input to a generative multivariate-kernel-density formula for calculating likelihood ratios expressing the probability of obtaining the observed difference between two speech samples under the hypothesis that the samples were produced by the same speaker versus under the hypothesis that they were produced by different speakers. Cross-validated likelihood-ratio results from systems based on different parametric curves were calibrated and evaluated using the log-likelihood-ratio cost function. The cross-validated likelihood ratios from the best-performing system for each vowel phoneme were fused using logistic regression. The resulting fused system had a very low error rate, thus meeting one of the requirements for admissibility in court.
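
The log-likelihood-ratio cost named in the abstract above can be illustrated with a minimal sketch. It uses the standard definition of this cost from the speaker-recognition literature: the average of two penalties, one over same-speaker comparisons and one over different-speaker comparisons. The function name `cllr` and the example values are illustrative assumptions, not taken from the paper, and the sketch takes plain likelihood ratios (not logarithms) as input.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for two lists of likelihood ratios.

    Hypothetical helper for illustration; inputs are plain likelihood
    ratios (not log values) and must be positive.
    """
    if not same_speaker_lrs or not different_speaker_lrs:
        raise ValueError("both lists must contain at least one likelihood ratio")

    # Same-speaker comparisons are penalised when the likelihood ratio is small.
    same_penalty = sum(
        math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs
    ) / len(same_speaker_lrs)

    # Different-speaker comparisons are penalised when the likelihood ratio is large.
    different_penalty = sum(
        math.log2(1.0 + lr) for lr in different_speaker_lrs
    ) / len(different_speaker_lrs)

    return 0.5 * (same_penalty + different_penalty)


# A system that always reports a likelihood ratio of 1 carries no information.
uninformative = cllr([1.0], [1.0])  # 1.0

# Made-up values: strong support in the correct direction gives a cost near 0.
well_behaved = cllr([1000.0], [0.001])
```

Under this definition, lower values indicate better performance, and a value of 1 corresponds to a system whose likelihood ratios are uninformative. Likelihood ratios that strongly support the wrong hypothesis push the cost above 1.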