Volume 107, Issue 6, June 2000
Index of content:
- SPEECH PRODUCTION 
107(2000); http://dx.doi.org/10.1121/1.429412
This investigation is the second in a series to examine a potential source of reduced intelligibility in dysarthric speech, namely the mismatch between listeners’ perceptual strategies and the acoustic information available in the dysarthric speech signal. Lexical boundary error (LBE) analysis was conducted on listener transcripts from phrases produced by speakers with hypokinetic dysarthria, ataxic dysarthria, and normal controls. By design, the hypokinetic and ataxic dysarthric tapes elicited similar intelligibility (words-correct) scores. However, they elicited different numbers and patterns of lexical boundary errors. The nature of the error pattern differences can be traced to the listeners’ use of available syllabic strength information to segment the acoustic stream. Specifically, although both dysarthric speech samples elicited numerous lexical boundary errors, those for the hypokinetic speech generally conformed to predictions offered from studies of degraded normal speech. Those for the ataxic speech did not conform strongly to such predictions. It appears that the prosodic deficits of the ataxic speech (tendency toward syllabic isochrony, excessive loudness variation, and reduced vowel working space consequent to reductions in vowel strength) posed more of a problem for listeners than did the prosodic deficits of the hypokinetic speech (rapid rate, monotony, reduced vowel working space).
107(2000); http://dx.doi.org/10.1121/1.429413
Vowel formants play an important role in speech theories and applications; however, the same formant values measured for the steady-state part of a vowel can correspond to different vowel categories. Experimental evidence indicates that dynamic information can also contribute to vowel characterization. Hence, dynamically modeling formant transitions may lead to quantitatively testable predictions in vowel categorization. Because the articulatory strategy used to manage different speaking rates and contrastive stress may depend on speaker and situation, the parameter values of a dynamic formant model may vary with speaking rate and stress. In most experiments speaking rate is rarely controlled, only two or three rates are tested, and most corpora contain just a few repetitions of each item. As a consequence, the dependence of dynamic models on those factors is difficult to gauge. This article presents a study of 2300 [iai] or [iɛi] stimuli produced by two speakers at nine or ten speaking rates in a carrier sentence for two contrastive stress patterns. The corpus was perceptually evaluated by naive listeners. Formant frequencies were measured during the steady-state parts of the stimuli, and the formant transitions were dynamically and kinematically modeled. The results indicate that (1) the corpus was characterized by a contextual assimilation instead of a centralization effect; (2) dynamic or kinematic modeling was equivalent as far as the analysis of the model parameters was concerned; (3) the dependence of the model parameter estimates on speaking rate and stress suggests that the formant transitions were sharper for high speaking rate, but no consistent trend was found for contrastive stress; (4) the formant frequencies measured in the steady-state parts of the vowels were sufficient to explain the perceptual results while the dynamic parameters of the models were not.
107(2000); http://dx.doi.org/10.1121/1.429414
The acoustic effects of the adjustment in vocal effort that is required when the distance between speaker and addressee is varied over a large range (0.3–187.5 m) were investigated in phonated and, at shorter distances, also in whispered speech. Several characteristics were studied in the same sentence produced by men, women, and 7-year-old boys and girls: duration of vowels and consonants, pausing and occurrence of creaky voice, mean and range of certain formant frequencies in [a] and sound-pressure level (SPL) of voiced segments and [s], and spectral emphasis. In addition to levels and emphasis, vowel duration, and were substantially affected. “Vocal effort” was defined as the communication distance estimated by a group of listeners for each utterance. Most of the observed effects correlated better with this measure than with the actual distance, since some additional factors affected the speakers’ choice. Differences between speaker groups emerged in segment durations, pausing behavior, and in the extent to which the SPL of [s] was affected. The whispered versions are compared with the phonated versions produced by the same speakers at the same distance. Several effects of whispering are found to be similar to those of increasing vocal effort.