Index of content:
Volume 113, Issue 2, February 2003
- SPEECH PRODUCTION 
113(2003); http://dx.doi.org/10.1121/1.1534100
The tissue mechanics governing vocal-fold closure and collision during phonation are modeled in order to evaluate the role of elastic forces in glottal closure and in the development of stresses that may be a risk factor for pathology development. The model is a nonlinear dynamic contact problem that incorporates a three-dimensional, linear elastic, finite-element representation of a single vocal fold, a rigid midline surface, and quasistatic air pressure boundary conditions. Qualitative behavior of the model agrees with observations of glottal closure during normal voice production. The predicted relationship between subglottal pressure and peak collision force agrees with published experimental measurements. Accurate predictions of tissue dynamics during collision suggest that elastic forces play an important role during glottal closure and are an important determinant of aerodynamic variables that are associated with voice quality. Model predictions of contact force between the vocal folds are directly proportional to compressive stress, vertical shear stress, and von Mises stress in the tissue. These results guide the interpretation of experimental measurements by relating them to a quantity that is important in tissue damage.
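The abstract does not define the von Mises stress it refers to. For orientation, the standard textbook definition in terms of the Cauchy stress components is given below; the symbols $\sigma_{ij}$ and $\sigma_{\mathrm{vM}}$ are notation assumed here and are not taken from the paper.

```latex
\sigma_{\mathrm{vM}} = \sqrt{\tfrac{1}{2}\left[(\sigma_{11}-\sigma_{22})^{2} + (\sigma_{22}-\sigma_{33})^{2} + (\sigma_{33}-\sigma_{11})^{2}\right] + 3\left(\sigma_{12}^{2} + \sigma_{23}^{2} + \sigma_{31}^{2}\right)}
```

Because this is a scalar summary of the deviatoric (shape-changing) part of the stress state, it is commonly used as an indicator of how close a material is to yielding or damage.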
Effects of disfluencies, predictability, and utterance position on word form variation in English conversation
113(2003); http://dx.doi.org/10.1121/1.1534836
Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., ði, ðæt, ænd, ʌv) or a more reduced or lenited pronunciation (e.g., ðə, ðīt, n, ə). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample from conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that make high-frequency monosyllabic function words more likely to be longer or have a fuller form: (1) when neighboring disfluencies (such as filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; and (3) when the word is either utterance initial or utterance final. Looking at the phenomenon in a different way, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to the factors.
113(2003); http://dx.doi.org/10.1121/1.1536169
Five commonly used methods for determining the onset of voicing of syllable-initial stop consonants were compared. The speech and glottal activity of 16 native speakers of Cantonese with normal voice quality were investigated during the production of consonant-vowel (CV) syllables in Cantonese. Syllables consisted of the initial consonants /p/, /t/, and /k/ followed by the vowel /a/. All syllables had a high level tone, and were all real words in Cantonese. Measurements of voicing onset were made based on the onset of periodicity in the acoustic waveform, and on spectrographic measures of the onset of a voicing bar and the onsets of the first formant (F1), second formant (F2), and third formant (F3). These measurements were then compared against the onset of glottal opening as determined by electroglottography. Both accuracy and variability of each measure were calculated. Results suggest that the presence of aspiration in a syllable decreased the accuracy and increased the variability of spectrogram-based measurements, but did not strongly affect measurements made from the acoustic waveform. Overall, the acoustic waveform provided the most accurate estimate of voicing onset; measurements made from the amplitude waveform were also the least variable of the five measures. These results can be explained as a consequence of differences in spectral tilt of the voicing source in breathy versus modal phonation.
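The abstract does not say how accuracy and variability were computed. One plausible reading, sketched below purely as an assumption, treats accuracy as the mean signed difference between a measure and the electroglottographic reference, and variability as the standard deviation of that difference. The function name `onset_error_stats` and all numbers are hypothetical illustrations, not data from the study.

```python
from statistics import mean, stdev


def onset_error_stats(measured_ms, reference_ms):
    """Return (mean signed error, sample standard deviation of error) in ms.

    Assumed definitions, not confirmed by the abstract:
    accuracy    ~ mean of (measured - reference)
    variability ~ sample standard deviation of (measured - reference)
    """
    if len(measured_ms) != len(reference_ms):
        raise ValueError("measured and reference lists must have equal length")
    if len(measured_ms) < 2:
        raise ValueError("at least two tokens are needed for a standard deviation")
    errors = [m - r for m, r in zip(measured_ms, reference_ms)]
    return mean(errors), stdev(errors)


# Hypothetical voicing-onset times in ms for four tokens (illustration only).
egg_reference = [50.0, 62.0, 71.0, 58.0]
waveform_onset = [51.0, 63.0, 72.0, 59.0]   # constant 1 ms offset
formant_onset = [52.0, 68.0, 75.0, 66.0]    # larger, more scattered offsets

waveform_bias, waveform_spread = onset_error_stats(waveform_onset, egg_reference)
formant_bias, formant_spread = onset_error_stats(formant_onset, egg_reference)
```

Under these assumed definitions, a measure with a smaller absolute mean error is the more accurate one, and a measure with a smaller standard deviation is the less variable one.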
Acoustic and perceptual evaluation of Mandarin tone productions before and after perceptual training
113(2003); http://dx.doi.org/10.1121/1.1531176
Training American listeners to perceive Mandarin tones has been shown to be effective, with trainees’ identification improving by 21%. Improvement also generalized to new stimuli and new talkers, and was retained when tested six months after training [Y. Wang et al., J. Acoust. Soc. Am. 106, 3649–3658 (1999)]. The present study investigates whether the tone contrasts gained perceptually transferred to production. Before their perception pretest and after their post-test, the trainees were recorded producing a list of Mandarin words. Their productions were first judged by native Mandarin listeners in an identification task. Identification of trainees’ post-test tone productions improved by 18% relative to their pretest productions, indicating significant tone production improvement after perceptual training. Acoustic analyses of the pre- and post-training productions further reveal the nature of the improvement, showing that post-training tone contours approximate native norms to a greater degree than pretraining tone contours. Furthermore, pitch height and pitch contour are not mastered in parallel, with the former being more resistant to improvement than the latter. These results are discussed in terms of the relationship between non-native tone perception and production as well as learning at the suprasegmental level.