Volume 132, Issue 1, July 2012
Index of content:
- SPEECH PRODUCTION 
132(2012); http://dx.doi.org/10.1121/1.4728170
This paper analyzes the interaction between the vocal folds and vocal tract at phonation onset due to the acoustical coupling between the two systems. Data collected from a mechanical replica of the vocal folds show that changes in vocal tract length induce fluctuations in the oscillation threshold values of both subglottal pressure and frequency. Frequency jumps and maxima of the threshold pressure occur when the oscillation frequency is slightly above a vocal tract resonance. Both the downstream and upstream vocal tracts may produce these effects. A simple mathematical model is then proposed, based on a lumped description of tissue mechanics, quasi-steady flow, and one-dimensional acoustics. The model shows that the frequency jumps are produced by saddle-node bifurcations between limit cycles, forming the classical pattern of a cusp catastrophe. The transition from a low-frequency oscillation to a high-frequency one may follow two different paths. With large acoustical coupling (narrow vocal tract) or high subglottal pressure, the bifurcations are crossed, which causes a frequency jump with a hysteresis loop. With reduced acoustical coupling (wide vocal tract) or lower subglottal pressure, a path around the bifurcations may be followed, with a smooth frequency variation.
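The two paths described in this abstract can be illustrated with the textbook cusp normal form. The sketch below is not the lumped model of the paper: it assumes a generic one-dimensional system dx/dt = -(x^3 + a*x + b), where x stands in for a state such as the oscillation frequency and a and b are hypothetical control parameters (a < 0 playing the role of strong coupling, a > 0 of weak coupling). It also tracks equilibria rather than limit cycles, so it is only an analogy for the bifurcation geometry.

```python
def relax(x, a, b, dt=0.05, steps=2000):
    """Integrate dx/dt = -(x**3 + a*x + b) until x settles on a stable equilibrium."""
    for _ in range(steps):
        x -= dt * (x ** 3 + a * x + b)
    return x


def sweep(a, b_values, x_start):
    """Slowly vary b, letting the state follow the branch it is currently on."""
    xs = []
    x = x_start
    for b in b_values:
        x = relax(x, a, b)
        xs.append(x)
    return xs


def largest_jump(b_values, xs):
    """Return (b at which the largest step in x occurs, size of that step)."""
    sizes = [abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    i = max(range(len(sizes)), key=sizes.__getitem__)
    return b_values[i + 1], sizes[i]


# Hypothetical control sweep: b from -3 to 3 and back, in steps of 0.05.
b_up = [-3.0 + 0.05 * i for i in range(121)]
b_down = list(reversed(b_up))

# a = -3 (assumed "strong coupling"): the folds at 4*a**3 + 27*b**2 = 0,
# i.e. b = -2 and b = +2, are crossed, so the state jumps.
strong_up = sweep(-3.0, b_up, 2.0)
strong_down = sweep(-3.0, b_down, strong_up[-1])

# a = +1 (assumed "weak coupling"): the path goes around the cusp, no jump.
weak_up = sweep(1.0, b_up, 2.0)

jump_up_b, jump_up_size = largest_jump(b_up, strong_up)
jump_down_b, jump_down_size = largest_jump(b_down, strong_down)
_, weak_size = largest_jump(b_up, weak_up)

print("upward sweep jumps near b =", round(jump_up_b, 2))
print("downward sweep jumps near b =", round(jump_down_b, 2))
print("largest step on the smooth path =", round(weak_size, 3))
```

For a = -3 the upward and downward sweeps jump at different values of b, which is the hysteresis loop; for a = +1 the state changes smoothly throughout.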
132(2012); http://dx.doi.org/10.1121/1.4726048
The goal of this study is to investigate coarticulatory resistance and aggressiveness for the jaw in Catalan consonants and vowels and, more specifically, for the alveolopalatal nasal /ɲ/ and for dark /l/ for which there is little or no data on jaw position and coarticulation. Jaw movement data for symmetrical vowel-consonant-vowel sequences with the consonants /p, n, l, s, ʃ, ɲ, k/ and the vowels /i, a, u/ were recorded by three Catalan speakers with a midsagittal magnetometer. Data reveal that jaw height is greater for /s, ʃ/ than for /p, ɲ/, which is greater than for /n, l, k/ during the consonant, and for /i, u/ than for /a/ during the vowel. Differences in coarticulatory variability among consonants and vowels are inversely related to differences in jaw height, i.e., fricatives and high vowels are most resistant, and /n, l, k/ and the low vowel are least resistant. Moreover, coarticulation resistant phonetic segments exert more prominent effects and, thus, are more aggressive than segments specified for a lower degree of coarticulatory resistance. Data are discussed in the light of the degree of articulatory constraint model of coarticulation.
132(2012); http://dx.doi.org/10.1121/1.4725762
Post-low bouncing is a phenomenon whereby after reaching a very low pitch in a low lexical tone, F0 bounces up and then gradually drops back in the following syllables. This paper reports the results of an acoustic analysis of the phenomenon in two Mandarin Chinese corpora and presents a simple mechanical model that can effectively simulate this bouncing effect. The acoustic analysis shows that most of the F0 dynamic features profiling the bouncing effect strongly correlate with the amount of F0 lowering in the preceding low-tone syllable, and that the additional F0 raising commences at the onset of the first post-low syllable. Using the quantitative Target Approximation model, this bouncing effect was simulated by adding an acceleration adjustment to the initial F0 state of the first post-low syllable. A highly linear relation between F0 lowering and estimated acceleration adjustment was found. This relation was then used to effectively simulate the bouncing effect in both the neutral tone and the full tones. The results of the analysis and simulation are consistent with the hypothesis that the bouncing effect is due to a temporary perturbation of the balance between antagonistic forces in the laryngeal control in producing a very low pitch.
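The simulation idea in this abstract can be sketched as follows. The code assumes that F0 within a syllable behaves as a third-order critically damped response toward a linear pitch target, which is how the quantitative Target Approximation model is usually written; the coefficients then follow from the initial F0, velocity, and acceleration. All numerical values, the names `LOWERING_ST` and `GAIN`, and the linear rule linking them are hypothetical placeholders, not values from the paper.

```python
import math


def target_approximation(t, slope, height, rate, f0_0, vel_0, acc_0):
    """F0 at time t (s) while approaching the linear target slope*t + height.

    The transient (c1 + c2*t + c3*t**2) * exp(-rate*t) is fixed by requiring
    F0, its velocity, and its acceleration at t = 0 to equal f0_0, vel_0, acc_0.
    """
    c1 = f0_0 - height
    c2 = vel_0 + c1 * rate - slope
    c3 = (acc_0 + 2.0 * c2 * rate - c1 * rate ** 2) / 2.0
    return slope * t + height + (c1 + c2 * t + c3 * t * t) * math.exp(-rate * t)


# Hypothetical setup: the preceding low tone ended 5 semitones below the
# static target of the first post-low syllable.
LOWERING_ST = 5.0   # amount of F0 lowering, in semitones (assumed)
GAIN = 2000.0       # assumed slope of a linear lowering-to-acceleration rule
RATE = 30.0         # assumed rate of target approximation, in 1/s

times = [0.005 * i for i in range(41)]  # 0 to 0.2 s

# Without adjustment: F0 simply rises toward the target.
baseline = [target_approximation(t, 0.0, 0.0, RATE, -LOWERING_ST, 0.0, 0.0)
            for t in times]

# With adjustment: extra initial acceleration proportional to the lowering.
bounced = [target_approximation(t, 0.0, 0.0, RATE, -LOWERING_ST, 0.0,
                                GAIN * LOWERING_ST)
           for t in times]

for t, f_base, f_bounce in zip(times[::8], baseline[::8], bounced[::8]):
    print(f"t={t:.3f} s  baseline={f_base:+.2f} st  adjusted={f_bounce:+.2f} st")
```

Both curves start from the same F0, and the adjusted one lies above the baseline at every later sample, because the adjustment adds the term (acceleration / 2) * t^2 * exp(-rate * t) to the transient.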
132(2012); http://dx.doi.org/10.1121/1.4725963
Speech and singing directivity in the horizontal plane was examined using simultaneous multi-channel full-bandwidth recordings, in particular to investigate the directivity of high-frequency energy. This method allowed not only for accurate analysis of running speech using the long-term average spectrum, but also for examination of the directivity of separate transient phonemes. Several vocal production factors that could affect directivity were examined. Directivity differences were not found between modes of production (speech vs singing), and only slight differences were found between genders and production levels (soft vs normal vs loud), more pronounced in the higher frequencies. Large directivity differences were found between specific voiceless fricatives, with /s, ʃ/ more directional than /f, θ/ in the 4, 8, and 16 kHz octave bands.
132(2012); http://dx.doi.org/10.1121/1.4726017
This study investigates long-term features and utterance contours of fundamental frequency (f0) derived from the German Alcohol Language Corpus. The corpus comprises read, spontaneous, and command-and-control speech uttered by 148 speakers of both genders and various age groups when sober and intoxicated. f0 median, f0 range, and f0 contours are analyzed for intoxication and interactions with gender and age. Contours are compared both directly (root mean squared error, statistical correlation, or the Euclidean distance in the spectral space of the contour) and by parameterization of the contour using the discrete cosine transform and the first and second moments of the lower contour spectrum. Results partly confirm earlier findings, i.e., f0 average and range are mostly raised with intoxication, and also suggest that the majority of speakers do not follow a general trend, but show idiosyncratic alterations to f0. f0 contours differ significantly with intoxication, but a more detailed analysis could not assign these changes to specific general form changes like decline or curvature. The results suggest that it is not possible to predict intoxication from f0 in a single model across different speakers. Instead, a speaker-dependent model that accounts for individual speaker behavior is proposed.
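The direct contour comparisons named in this abstract can be sketched in a few lines. The example assumes two f0 contours already sampled at the same time points, uses an unnormalized DCT-II as the "spectral space", and runs on two made-up contours; the function names, the number of coefficients, and the data are all hypothetical, and none of this reproduces the study's actual processing.

```python
import math


def rmse(a, b):
    """Root mean squared error between two equally long contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation coefficient (both contours must be non-constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def dct2(contour, n_coef):
    """First n_coef coefficients of the unnormalized DCT-II of a contour."""
    n = len(contour)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(contour))
            for k in range(n_coef)]


def spectral_distance(a, b, n_coef=8):
    """Euclidean distance between the leading DCT-II coefficients."""
    return math.sqrt(sum((p - q) ** 2
                         for p, q in zip(dct2(a, n_coef), dct2(b, n_coef))))


# Made-up contours in Hz: a declining contour and the same shape raised by 10 Hz.
sober = [130.0 - 0.5 * i for i in range(40)]
raised = [f + 10.0 for f in sober]

print("RMSE:", rmse(sober, raised))
print("correlation:", pearson(sober, raised))
print("spectral distance:", spectral_distance(sober, raised))
```

A pure level shift like this one leaves the correlation at 1 and changes only the zeroth DCT coefficient, which shows why level (median) and shape (contour form) are worth analyzing separately.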