Volume 122, Issue 4, October 2007
Index of content:
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
122(2007); http://dx.doi.org/10.1121/1.2772228View Description Hide Description
This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled since only frequencies at integer multiples of F0 are significant. This impacts negatively on the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features are estimated and their performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCC) coefficients are used as a reference and a database of five (Portuguese) natural vowelsounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that perceptual spectral cluster (PSC) features perform better than plain linear prediction features, but perform slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter.
122(2007); http://dx.doi.org/10.1121/1.2773955View Description Hide Description
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by for durations. The responses to the feedback perturbations had mean latencies of approximately and magnitudes of approximately . Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both and amplitude of speech production.