Volume 126, Issue 4, October 2009
Index of content:
- SPEECH PRODUCTION 
Simultaneous measures of electropalatography and intraoral pressure in selected voiceless lingual consonants and consonant sequences of German
126(2009); http://dx.doi.org/10.1121/1.3180694
This work assessed relationships among intraoral pressure (IOP), electropalatographic (EPG) measures, and consonant sequence duration, in the following obstruents, clusters, and affricates of German: /t/, /ʃ/, /ʃt/, and . The data showed significant correlations between IOP and percentage of articulatory contact (PC) for all speakers, whereas duration and place of articulation (measured by the EPG center of gravity) contributed less to IOP changes. Speakers differed in the strength of this relationship, possibly reflecting differences in vocal tract morphology or degree of laryngeal abduction. Single-point EPG and IOP measures in fricatives showed consistent correspondences across consonantal contexts, but the relationships for the stops were more complex and reflected positional effects. Temporal compression was observed for both members of the cluster, but only the fricative portion of the affricate. Conversely, coarticulation was observed for both the stop and fricative portion of the affricate, but only for the stop portion of the cluster, possibly reflecting biomechanical constraints. No clear differences were observed in coarticulatory resistance for stops and fricatives. These data contribute to a limited literature on articulatory-aerodynamic relationships in voiceless consonants and consonant sequences, and will provide a baseline for considering longer combinations of obstruents.
126(2009); http://dx.doi.org/10.1121/1.3183592
The purpose of this study was to identify, using computational models, the vocal fold parameters that are most influential in determining the vibratory characteristics of the vocal folds. The sensitivities of vocal fold modal frequencies to variations in model parameters were used to determine the most influential parameters. A detailed finite element model of the human vocal fold was created. The model was defined by eight geometric and six material parameters. The model included transitional boundary regions to idealize the complex physiological structure of real human subjects. Parameters were simultaneously varied over ranges representative of actual human vocal folds. Three separate statistical analysis techniques were used to identify the most and least sensitive model parameters with respect to modal frequency. The results from all three methods consistently suggest that a set of five parameters is most influential in determining the vibratory characteristics of the vocal folds.
126(2009); http://dx.doi.org/10.1121/1.3184581
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method applies principal components analysis to the articulatory and acoustic variables and maps between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829–836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings, indicating that this method is better suited to the forward mappings than to the inverse mappings, at least for the data chosen for the current study. Some preliminary results on the sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented.
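The locally weighted linear regression step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `loess_predict`, the neighborhood fraction `frac`, and the choice of a tricube kernel are assumptions made here for illustration, and the data in the demo are synthetic rather than drawn from the X-ray Microbeam database. The inputs stand in for articulatory principal-component scores and the output for a single acoustic variable such as one formant frequency.

```python
import numpy as np


def loess_predict(X, y, x0, frac=0.5):
    """Locally weighted linear regression estimate at one query point.

    X    : (n, d) array of predictors (e.g. articulatory PC scores).
    y    : (n,) array of responses (e.g. one formant frequency).
    x0   : (d,) query point.
    frac : fraction of the data used as the local neighborhood (assumed value).

    Returns the prediction at x0 and the local coefficients
    [constant, slope_1, ..., slope_d].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, d = X.shape

    # Neighborhood size: at least d + 1 points so the local fit is determined.
    k = min(n, max(int(np.ceil(frac * n)), d + 1))

    # Select the k nearest neighbors of x0 by Euclidean distance.
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]
    h = dist[idx][-1]
    if h == 0.0:
        h = 1.0  # all neighbors coincide with x0; avoid division by zero

    # Tricube weights: 1 at the query point, falling to 0 at the farthest neighbor.
    u = dist[idx] / h
    w = (1.0 - u ** 3) ** 3

    # Weighted least squares, solved by scaling rows with sqrt(weight).
    A = np.hstack([np.ones((k, 1)), X[idx]])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)

    return coef[0] + x0 @ coef[1:], coef


if __name__ == "__main__":
    # Synthetic demo: a smooth nonlinear function of one predictor.
    x = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
    y = np.sin(x[:, 0])
    pred, coef = loess_predict(x, y, np.array([np.pi / 2.0]), frac=0.1)
    print("local prediction:", pred, "local slope:", coef[1])
```

Because each query point gets its own constant and slopes, the returned `coef` is the kind of quantity whose distribution the abstract refers to as "local slopes and constants"; a global regression would return a single coefficient vector for all query points.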
A biomechanical model of cardinal vowel production: Muscle activations and the impact of gravity on tongue positioning
126(2009); http://dx.doi.org/10.1121/1.3204306
A three-dimensional (3D) biomechanical model of the tongue and the oral cavity, controlled by a functional model of muscle force generation (λ-model of the equilibrium point hypothesis) and coupled with an acoustic model, was exploited to study the activation of the tongue and mouth floor muscles during the production of French cardinal vowels. The selection of the motor commands to control the tongue and the mouth floor muscles was based on literature data, such as electromyographic, electropalatographic, and cineradiographic data. The tongue shapes were also compared to data obtained from the speaker used to build the model. 3D modeling offered the opportunity to investigate the role of the transversalis, in particular its involvement in the production of high front vowels. With this model, that involvement was found to be indirect, arising via reflex mechanisms due to the activation of surrounding muscles, rather than voluntary. For vowel /i/, local motor command variations for the main tongue muscles revealed a non-negligible modification of the alveolar groove, in contradiction to the saturation effect hypothesis, due to the role of the anterior genioglossus. Finally, the impact of subject position (supine or upright) on the production of French cardinal vowels was explored and found to be negligible.
126(2009); http://dx.doi.org/10.1121/1.3205400
Simulating talker-to-listener distance (TLD) in virtual audio environments requires mimicking natural changes in vocal effort. Studies have identified several acoustic parameters manipulated by talkers when varying vocal effort. However, no systematic study has investigated vocal effort variations due to TLD, under natural conditions, and their perceptual consequences. This work examined the feasibility of varying the vocal effort cues for TLD in synthesized speech and real speech by (a) recording and analyzing single word tokens spoken at , (b) creating synthetic and modified speech tokens that vary in one or more acoustic parameters associated with vocal effort, and (c) conducting perceptual tests on the reference, synthetic, and modified tokens to identify salient cues for TLD perception. Measured changes in fundamental frequency, intensity, and formant frequencies of the reference tokens across TLD were similar to other reports in the literature. Perceptual experiments that asked listeners to estimate TLD showed that TLD estimation is most accurate with real speech; however, large standard deviations in the responses suggest that reliable judgments can be made only for gross changes in TLD.