Volume 125, Issue 1, January 2009
Index of content:
- SPEECH PRODUCTION 
Talkers alter vowel production in response to real-time formant perturbation even when instructed not to compensate
125(2009); http://dx.doi.org/10.1121/1.3035829
Talkers show sensitivity to a range of perturbations of auditory feedback (e.g., manipulation of vocal amplitude, fundamental frequency, and formant frequency). Here, 50 subjects spoke a monosyllable (“head”), and the formants in their speech were shifted in real time using a custom signal processing system that provided feedback over headphones. First and second formants were altered so that the auditory feedback matched subjects’ production of “had.” Three different instructions were tested: (1) control, in which subjects were naïve about the feedback manipulation; (2) ignore headphones, in which subjects were told that their voice might sound different and to ignore what they heard in the headphones; and (3) avoid compensation, in which subjects were informed in detail about the manipulation and were told not to compensate. Despite explicit instruction to ignore the feedback changes, subjects produced a robust compensation in all conditions. There were no differences between groups in the magnitudes of the first or second formant changes. In general, subjects altered their vowel formant values in a direction opposite to the perturbation, as if to cancel its effects. These results suggest that compensation in the face of formant perturbation is relatively automatic, and that the response is not easily modified by conscious strategy.
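One simple way to put "a direction opposite to the perturbation" into numbers is to project the change in produced (F1, F2) onto the negated perturbation vector and divide by the perturbation's length. The sketch below is an illustrative metric only, not the analysis reported in the paper; the function name and all formant values are hypothetical (a "head" to "had" shift is assumed to raise F1 and lower F2).

```python
def compensation_fraction(baseline, produced, perturbation):
    """Share of a formant perturbation that the talker's production opposes.

    baseline, produced: mean (F1, F2) in Hz before and during the feedback shift.
    perturbation: (dF1, dF2) in Hz added to the auditory feedback.
    Returns 1.0 for a change exactly equal and opposite to the perturbation,
    0.0 for no change along the perturbation axis, and a negative value
    for a change that follows the perturbation.
    """
    d1 = produced[0] - baseline[0]
    d2 = produced[1] - baseline[1]
    norm_sq = perturbation[0] ** 2 + perturbation[1] ** 2
    if norm_sq == 0:
        raise ValueError("perturbation must be non-zero")
    # Projection of the production change onto -perturbation, normalised by |perturbation|^2.
    return -(d1 * perturbation[0] + d2 * perturbation[1]) / norm_sq


# Hypothetical values: feedback for "head" shifted toward "had" (F1 up, F2 down).
baseline = (580.0, 1800.0)
shift = (200.0, -150.0)
produced = (530.0, 1830.0)  # talker lowered F1 and raised F2
print(compensation_fraction(baseline, produced, shift))  # about 0.23
```

A projection ignores any change orthogonal to the shift; a study could report F1 and F2 changes separately instead, as the abstract does.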
125(2009); http://dx.doi.org/10.1121/1.3021436
The behavior of glottal flow can, to a large extent, be characterized by the development and separation of the boundary layer. The point of flow separation is known to vary during the phonatory cycle because the channel configuration changes. To take the movable separation point into account, the boundary-layer equation is solved numerically, and the characteristic boundary-layer quantities are determined together with the point of separation. Development of the boundary layer generally reduces the effective size of the channel and therefore increases the core flow velocity, which, in turn, provides the boundary condition for the boundary-layer equation. This points to a strong interaction between the viscous (boundary-layer) and inviscid (core-flow) parts of the glottal flow. To capture this viscous-inviscid interaction, an expression for the core flow is derived for a two-dimensional flow field and solved jointly with the boundary-layer equation. Numerical results are presented to examine the effects of the Reynolds number and glottal configuration, with special emphasis on comparing flow models developed for one- and two-dimensional flow fields. The numerical results are also compared quantitatively with data obtained from flow measurement experiments.
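The abstract does not state which numerical scheme is used. As a rough illustration of how a separation point can be located from a core-flow velocity, the sketch below swaps in Thwaites' integral method for laminar boundary layers (a textbook approximation, not necessarily the paper's solver). Assumptions: a symmetric two-dimensional channel, core velocity from one-dimensional continuity U = Q / gap, a boundary layer starting with zero thickness at the inlet, and separation where Thwaites' parameter falls to about -0.09. The viscous-inviscid coupling is not implemented; the returned displacement thickness is the quantity such a coupling would feed back into the core-flow equation. Function names and the geometry are hypothetical.

```python
import math

NU_AIR = 1.5e-5     # kinematic viscosity of air, m^2/s (approximate, room temperature)
LAMBDA_SEP = -0.09  # Thwaites' laminar separation criterion


def shape_factor(lam):
    """H = delta*/theta from Thwaites' lambda (Cebeci-Bradshaw curve fits)."""
    if lam >= 0.0:
        return 2.61 - 3.75 * lam + 5.24 * lam ** 2
    return 2.088 + 0.0731 / (lam + 0.14)


def thwaites_separation(x, gap, Q, nu=NU_AIR):
    """One-way integral boundary-layer estimate along a 2D channel wall.

    x: wall positions (m); gap: channel width at each x (m);
    Q: volume flow rate per unit depth (m^2/s).
    Returns (x_sep or None, theta, delta_star) computed up to separation.
    """
    U = [Q / g for g in gap]  # core velocity from continuity, no displacement correction
    theta, dstar = [0.0], [0.0]
    integral = 0.0
    for i in range(1, len(x)):
        dx = x[i] - x[i - 1]
        integral += 0.5 * (U[i] ** 5 + U[i - 1] ** 5) * dx  # trapezoid rule for int U^5 dx
        th2 = 0.45 * nu * integral / U[i] ** 6             # Thwaites' momentum thickness squared
        lam = th2 / nu * (U[i] - U[i - 1]) / dx            # pressure-gradient parameter
        if lam <= LAMBDA_SEP:
            return x[i], theta, dstar                      # laminar separation predicted here
        theta.append(math.sqrt(th2))
        dstar.append(theta[-1] * shape_factor(lam))
    return None, theta, dstar


# Hypothetical convergent-divergent glottis: 3 mm long, 0.7 mm inlet/outlet, 0.4 mm throat.
N, L = 301, 3e-3
x = [L * i / (N - 1) for i in range(N)]
gap = [0.4e-3 + 0.3e-3 * ((xi - 1.5e-3) / 1.5e-3) ** 2 for xi in x]
x_sep, theta, dstar = thwaites_separation(x, gap, Q=0.015)
print("separation at x =", x_sep)
```

In this uncoupled form the location of separation depends only on the gap shape, not on Q; a viscous-inviscid scheme would recompute U from the gap minus twice the displacement thickness and iterate, which is where the Reynolds-number dependence discussed in the abstract would enter.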
125(2009); http://dx.doi.org/10.1121/1.3037222
This paper reports the development of a quantitative target approximation (qTA) model for generating F0 contours of speech. The qTA model simulates the production of tone and intonation as a process of syllable-synchronized sequential target approximation [Xu, Y. (2005). “Speech melody as articulatorily implemented communicative functions,” Speech Commun. 46, 220–251]. It adopts a set of biomechanical and linguistic assumptions about the mechanisms of speech production. The communicative functions modeled directly are lexical tone in Mandarin, lexical stress in English, and focus in both languages. The qTA model is evaluated by extracting function-specific model parameters from natural speech via supervised learning (automatic analysis by synthesis) and comparing the contours generated with the extracted parameters to those of natural utterances through numerical evaluation and perceptual testing. The contours generated by the qTA model with the learned parameters were very close to the natural contours in terms of root mean square error, rate of human identification of tone and focus, and judgments of naturalness by human listeners. The results demonstrate that the qTA model is both an effective tool for research on tone and intonation and a potentially effective system for automatic synthesis of tone and intonation.
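To make "sequential target approximation" concrete, the sketch below assumes the commonly cited qTA formulation: each syllable has a linear pitch target x(t) = m·t + b (t from syllable onset), and F0 approaches it as a third-order critically damped system, f(t) = x(t) + (c1 + c2·t + c3·t²)·e^(-λt). The coefficients are fixed by the F0 value, velocity, and acceleration carried over from the end of the previous syllable. Function names, units (semitones, seconds), and parameter values are hypothetical, and the parameter-learning step (analysis by synthesis) is not shown.

```python
import math


def qta_syllable(m, b, lam, dur, f0, d1, d2, n=50):
    """Sample one syllable of a qTA-style F0 contour.

    m, b: target slope (st/s) and height (st); lam: approach rate (1/s); dur: duration (s).
    f0, d1, d2: F0, velocity, and acceleration at syllable onset (end state of previous syllable).
    Returns (sample times, F0 samples, (f, f', f'') at t = dur).
    """
    # Coefficients chosen so that f(0) = f0, f'(0) = d1, f''(0) = d2.
    c1 = f0 - b
    c2 = d1 + c1 * lam - m
    c3 = (d2 + 2 * c2 * lam - c1 * lam ** 2) / 2

    def state(t):
        g = c1 + c2 * t + c3 * t * t  # polynomial part of the transient
        g1 = c2 + 2 * c3 * t          # its first derivative
        g2 = 2 * c3                   # its second derivative
        e = math.exp(-lam * t)
        f = m * t + b + g * e
        fp = m + (g1 - lam * g) * e
        fpp = (g2 - 2 * lam * g1 + lam ** 2 * g) * e
        return f, fp, fpp

    ts = [dur * i / (n - 1) for i in range(n)]
    return ts, [state(t)[0] for t in ts], state(dur)


def qta_contour(targets, f0_start):
    """Chain syllables given as (m, b, lam, dur); returns (times, F0).

    Boundary instants appear twice (end of one syllable, start of the next).
    """
    state = (f0_start, 0.0, 0.0)  # assume the utterance starts with zero velocity and acceleration
    t0, times, f0 = 0.0, [], []
    for m, b, lam, dur in targets:
        ts, fs, state = qta_syllable(m, b, lam, dur, *state)
        times.extend(t0 + t for t in ts)
        f0.extend(fs)
        t0 += dur
    return times, f0


# Hypothetical targets: a level target followed by a falling one (values illustrative only).
times, f0 = qta_contour([(0.0, 4.0, 40.0, 0.2), (-30.0, 4.0, 40.0, 0.2)], f0_start=0.0)
```

Carrying the full (f, f', f'') state across syllable boundaries keeps the contour continuous in value, velocity, and acceleration; a larger λ makes the target be reached sooner within the syllable.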
125(2009); http://dx.doi.org/10.1121/1.3021306
When a vowel follows an obstruent, the fundamental frequency (F0) in the first few tens of milliseconds of the vowel is known to be influenced by the voicing characteristics of the consonant. This influence was re-examined in the study reported here. Stops, fricatives, and the nasal /m/ were paired with the vowels /i, ɑ/ to form CVm syllables. Target syllables were embedded in carrier sentences, and intonation was varied to produce each syllable in a high, low, or neutral pitch environment. In a high-pitch environment, F0 following voiceless obstruents is significantly increased relative to the /m/ baseline, but following voiced obstruents it closely traces the baseline. In a low-pitch environment, F0 is very slightly increased following all obstruents, voiced and voiceless. It is suggested that in certain pitch environments a conflict can occur between gestures corresponding to the segmental feature [stiff vocal folds] and intonational elements. The result is different acoustic manifestations of [stiff] in different pitch environments. The spreading of the vocal folds that occurs during voiceless stops in certain contexts in English is an enhancing gesture, which aids the resolution of this gestural conflict by allowing the defining segmental gesture to be weakened without losing perceptual salience.
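The comparison "relative to the /m/ baseline" can be expressed as a per-sample F0 difference between a vowel after an obstruent and the same vowel after /m/, time-aligned at vowel onset. The sketch below assumes both contours are sampled at the same times and reports the difference in semitones; the function names, sampling interval, and F0 values are hypothetical and are not data from the study.

```python
import math


def semitones(f_hz, ref_hz):
    """Interval from ref_hz to f_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def onset_f0_effect(f0_after_obstruent, f0_after_m):
    """Per-sample F0 difference (semitones) between a vowel following an obstruent
    and the same vowel following /m/, both sampled at the same times from vowel onset."""
    if len(f0_after_obstruent) != len(f0_after_m):
        raise ValueError("contours must share the same time points")
    return [semitones(f, b) for f, b in zip(f0_after_obstruent, f0_after_m)]


def mean_effect(diffs, n_samples):
    """Average the difference over the first n_samples (e.g., the first few tens of ms)."""
    window = diffs[:n_samples]
    if not window:
        raise ValueError("n_samples must select at least one sample")
    return sum(window) / len(window)


# Hypothetical contours sampled every 10 ms from vowel onset (Hz).
after_p = [142.0, 136.0, 131.0, 128.0, 127.0]
after_m = [126.0, 126.5, 127.0, 127.0, 127.0]
diff = onset_f0_effect(after_p, after_m)
print([round(d, 2) for d in diff])
print(round(mean_effect(diff, 3), 2))
```

Working in semitones rather than Hz makes differences comparable across high- and low-pitch environments, since the same Hz offset is a smaller relative change at a higher F0.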