Volume 118, Issue 4, October 2005
Contents:
- SPEECH PRODUCTION 
118(2005); http://dx.doi.org/10.1121/1.2033572
This study investigates cross-speaker differences in the factors that predict voicing thresholds during abduction–adduction gestures in six normal women. Measures of baseline airflow, pulse amplitude, subglottal pressure, and fundamental frequency were made at voicing offset and onset during intervocalic /h/, produced in varying vowel environments and at different loudness levels, and subjected to relational analyses to determine which factors were most strongly related to the timing of voicing cessation or initiation. The data indicate that (a) all speakers showed differences between voicing offsets and onsets, but the degree of this effect varied across speakers; (b) loudness and vowel environment have speaker-specific effects on the likelihood of devoicing during /h/; and (c) baseline flow measures significantly predicted times of voicing offset and onset in all participants, but other variables contributing to voice timing differed across speakers. Overall, the results suggest that individual speakers have unique methods of achieving phonatory goals during running speech. These data contribute to the literature on individual differences in laryngeal function, and serve as a means of evaluating how well laryngeal models can reproduce the range of voicing behavior used by speakers during running speech tasks.
118(2005); http://dx.doi.org/10.1121/1.2005907
Nonlinear dynamic methods and perturbation methods are compared in terms of the effects of signal length, sampling rate, and noise. Results of theoretical and experimental studies quantitatively show that measurements representing frequency and amplitude perturbations are not applicable to chaotic signals because of difficulties in pitch tracking and sensitivity to differences in initial state. Perturbation analyses are reliable only when applied to nearly periodic voice samples of sufficiently long signal length, obtained at high sampling rates and low noise levels. In contrast, nonlinear dynamic methods, such as correlation dimension, allow the quantification of chaotic time series. Additionally, the correlation dimension method provides a more stable analysis of nearly periodic voice samples at shorter signal lengths, lower sampling rates, and higher noise levels. The correlation dimension method avoids some of the methodological issues associated with perturbation methods, may improve the prospects for real-time analysis, and may reduce costs in experimental designs for objectively assessing voice disorders.
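As an editorial illustration of the correlation dimension measure discussed above, the classic Grassberger–Procaccia procedure embeds the signal with time delays, computes the correlation sum C(r), and fits the slope of log C(r) against log r. The sketch below is a minimal assumption-laden version (the embedding dimension, delay, and radius range are arbitrary illustrative choices, not the paper's settings):

```python
import numpy as np

def correlation_dimension(x, dim=3, tau=1):
    """Grassberger-Procaccia estimate: delay-embed the series, compute
    the correlation sum C(r), and fit the slope of log C(r) vs log r."""
    # Time-delay embedding of the scalar series x
    n = len(x) - (dim - 1) * tau
    emb = np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])
    # Pairwise distances between embedded points (upper triangle only)
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    d = dist[np.triu_indices(n, k=1)]
    d = d[d > 0]
    # Radii between the 5th and 50th percentile of distances, avoiding
    # saturation at large r and sparse pair counts at small r
    radii = np.logspace(np.log10(np.percentile(d, 5)),
                        np.log10(np.percentile(d, 50)), 15)
    # C(r): fraction of point pairs closer than r
    C = np.array([np.mean(d < r) for r in radii])
    # Slope of the log-log fit estimates the attractor dimension
    return np.polyfit(np.log(radii), np.log(C), 1)[0]
```

A nearly periodic signal traces a closed curve in the embedding space and yields an estimate near 1, whereas uncorrelated noise fills the embedding space and yields a much higher estimate.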
118(2005); http://dx.doi.org/10.1121/1.2011150
This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and of the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, of its individual segments, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To account for these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.
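The reduction hypothesis tested above amounts to a regression of affix duration on (log) carrier-word frequency, with a negative slope indicating frequency-driven shortening. A minimal sketch with invented illustrative numbers (none of these values come from the study):

```python
import numpy as np

# Hypothetical affix durations (ms) for carrier words of increasing
# corpus frequency; the reduction hypothesis predicts a negative slope
# of duration against log frequency.
word_freq = np.array([10, 50, 200, 1000, 5000, 20000])
duration_ms = np.array([142.0, 138.0, 131.0, 120.0, 115.0, 104.0])

# Least-squares line: duration = slope * log10(freq) + intercept
slope, intercept = np.polyfit(np.log10(word_freq), duration_ms, 1)
# slope < 0 would be consistent with frequency-driven reduction
```

In the actual study, covariates such as speaker sex and age, segmental context, and speech rate would enter a multiple-regression model alongside frequency.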
Acoustic and spectral characteristics of young children’s fricative productions: A developmental perspective
118(2005); http://dx.doi.org/10.1121/1.2010407
Scientists have made great strides toward understanding the mechanisms of speech production and perception. However, the complex relationships between the acoustic structures of speech and the resulting psychological percepts have yet to be fully and adequately explained, especially in speech produced by younger children. Thus, this study examined the acoustic structure of voiceless fricatives (∕f, θ, s, ʃ∕) produced by adults and by typically developing children across a range of ages in terms of multiple acoustic parameters (durations, normalized amplitude, spectral slope, and spectral moments). It was found that spectral slope and spectral variance (commonly excluded from previous studies of child speech) were important parameters in the differentiation and classification of the voiceless fricatives, with spectral variance being the only measure to separate all four places of articulation. It was further shown that the sibilant contrast between ∕s∕ and ∕ʃ∕ was less distinct in children than in adults, characterized by a dramatic change in several spectral parameters at approximately five years of age. Discriminant analysis revealed evidence that classification models based on adult data were sensitive to these spectral differences in the five-year-old age group.
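The spectral-moment measures named above (centroid, variance, skewness, kurtosis) are commonly computed by treating the magnitude spectrum of a windowed analysis frame as a probability distribution over frequency. A minimal sketch under that standard definition (window type, frame length, and sampling rate are illustrative choices, not the study's analysis settings):

```python
import numpy as np

def spectral_moments(frame, sr):
    """First four spectral moments of one analysis frame: mean
    (centroid, Hz), variance (Hz^2), skewness, and excess kurtosis,
    from the magnitude spectrum treated as a distribution."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    p = spec / spec.sum()                                # normalize to sum to 1
    m1 = np.sum(freqs * p)                               # centroid
    m2 = np.sum((freqs - m1) ** 2 * p)                   # variance
    m3 = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5       # skewness
    m4 = np.sum((freqs - m1) ** 4 * p) / m2 ** 2 - 3.0   # excess kurtosis
    return m1, m2, m3, m4
```

A narrowband signal concentrates p near one frequency, giving a small variance; fricative noise spreads energy across frequency and raises the variance, the measure the study found most discriminative across places of articulation.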
118(2005); http://dx.doi.org/10.1121/1.2010288
Acoustic cues related to the voice source, including harmonic structure and spectral tilt, were examined for relevance to prosodic boundary detection. The measurements considered here comprise five categories: duration, pitch, harmonic structure, spectral tilt, and amplitude. Distributions of the measurements and statistical analysis show that the measurements may be used to differentiate between prosodic categories. Detection experiments on the Boston University Radio Speech Corpus show equal error detection rates around 70% for accent and boundary detection, using only the acoustic measurements described, without any lexical or syntactic information. Further investigation of the detection results shows that duration and amplitude measurements, and, to a lesser degree, pitch measurements, are useful for detecting accents, while all voice source measurements except pitch measurements are useful for boundary detection.
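For readers unfamiliar with the equal-error criterion used in the detection experiments above: it is the operating point at which the miss rate equals the false-alarm rate. A minimal sketch of computing it from detector scores (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Sweep thresholds over detector scores and return the error rate
    where misses (events scored below threshold) and false alarms
    (non-events scored at or above it) are closest to equal."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, None
    for th in np.sort(np.unique(scores)):
        miss = np.mean(scores[labels] < th)    # missed detections
        fa = np.mean(scores[~labels] >= th)    # false alarms
        if abs(miss - fa) < best_gap:
            best_gap, eer = abs(miss - fa), (miss + fa) / 2
    return eer
```

A perfectly separating detector has an equal error rate of 0; chance-level scores sit near 0.5.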