Volume 107, Issue 2, February 2000
Index of content:
- SPEECH PRODUCTION 
Spectral characterization of jitter, shimmer, and additive noise in synthetically generated voice signals
107 (2000); http://dx.doi.org/10.1121/1.428272
Alteration of the harmonic structure in voice source spectra, taken over at least two periods of the waveform, may occur due to the presence of fundamental frequency perturbation, amplitude perturbation, additive noise, or changes within the glottal source signal itself. In order to make accurate inferences regarding glottal-flow dynamics or perceptual evaluations based on spectral measurements taken from the acoustic speech waveform, investigation of the spectral features of each aperiodic component is required. Based on a heuristic development involving a consideration of the partial sum of the Fourier series taken for two periods of a jittered, shimmered, and (additive, random) noise-contaminated signal, the corresponding spectral characteristics are hypothesized. Subsequent to this, the Fourier series coefficients are calculated for the two periods in order to test the hypotheses. Definite spectral differences are found for each aperiodic component; based on these findings differential quantitative spectral measurements are suggested. Further supportive evidence is obtained through use of Fourier transform and periodogram-averaged calculations. The analysis is carried out on synthetically generated glottal-pulse waveforms and on radiated speech waveforms. A discussion of the results is given in terms of voice aperiodicity in general and in terms of their implication for future studies involving human voice signals.
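To illustrate why a two-period analysis window is sensitive to amplitude perturbation, the following Python sketch builds two periods of a synthetic harmonic waveform and scales the second period by a shimmer factor. It then compares the energy in odd-numbered DFT bins, which fall between the harmonics, with the energy in even-numbered bins, which are the harmonics themselves. The waveform shape, the `shimmer` parameter, and the `subharmonic_energy_ratio` measure are assumptions made for this sketch. They are not the synthesis procedure or the spectral measures used in the paper, and jitter and additive noise are not modelled.

```python
import cmath
import math


def two_period_signal(period_len=64, shimmer=0.0, harmonics=5):
    """Return two consecutive periods of a harmonic-rich wave.

    The second period is scaled by (1 + shimmer). The wave shape is an
    illustrative assumption, not a glottal-pulse model from the paper.
    """
    one_period = [
        sum(math.sin(2 * math.pi * h * n / period_len) / h
            for h in range(1, harmonics + 1))
        for n in range(period_len)
    ]
    return one_period + [(1.0 + shimmer) * v for v in one_period]


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0 .. len(x)//2 - 1, normalised by len(x)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) / n
        for k in range(n // 2)
    ]


def subharmonic_energy_ratio(x):
    """Energy in odd bins divided by energy in even bins (DC excluded).

    With exactly two periods in the window, harmonic m sits at bin 2m,
    so odd bins lie halfway between harmonics.
    """
    mags = dft_magnitudes(x)
    odd = sum(m * m for m in mags[1::2])
    even = sum(m * m for m in mags[2::2])
    return odd / even


clean_ratio = subharmonic_energy_ratio(two_period_signal(shimmer=0.0))
shimmer_ratio = subharmonic_energy_ratio(two_period_signal(shimmer=0.1))
```

For two identical periods the odd bins vanish up to rounding error, so any energy found there indicates a period-to-period difference.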
Quantitative assessment of second language learners’ fluency by means of automatic speech recognition technology
107 (2000); http://dx.doi.org/10.1121/1.428279
To determine whether expert fluency ratings of read speech can be predicted on the basis of automatically calculated temporal measures of speech quality, an experiment was conducted with read speech of 20 native and 60 non-native speakers of Dutch. The speech material was scored for fluency by nine experts and was then analyzed by means of an automatic speech recognizer in terms of quantitative measures such as speech rate, articulation rate, number and length of pauses, number of dysfluencies, mean length of runs, and phonation/time ratio. The results show that expert ratings of fluency in read speech are reliable (Cronbach’s α varies between 0.90 and 0.96) and that these ratings can be predicted on the basis of quantitative measures: for six automatic measures the magnitude of the correlations with the fluency scores varies between 0.81 and 0.93. Rate of speech appears to be the best predictor: correlations vary between 0.90 and 0.93. Two other important determinants of reading fluency are the rate at which speakers articulate the sounds and the number of pauses they make. Apparently, rate of speech is such a good predictor of perceived fluency because it incorporates these two aspects.
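The temporal measures named above can be sketched in Python from a time-aligned segmentation of a recording. The abstract does not define the measures, so the definitions here are assumptions: rate of speech is taken as phones per second of total time, articulation rate as phones per second of speech time excluding pauses, phonation/time ratio as speech time over total time, and a run as the phones produced between two pauses. The input format and the function name `temporal_measures` are hypothetical. The sketch also assumes that only pauses long enough to count have been marked in the input.

```python
def temporal_measures(segments):
    """Compute simple fluency measures from (kind, duration_s, n_phones) tuples.

    kind is "speech" or "pause"; n_phones is ignored for pauses.
    All definitions are illustrative assumptions, not taken from the paper.
    """
    total_time = sum(d for _, d, _ in segments)
    speech_time = sum(d for kind, d, _ in segments if kind == "speech")
    if total_time <= 0 or speech_time <= 0:
        raise ValueError("need a positive amount of speech")

    n_phones = sum(p for kind, _, p in segments if kind == "speech")
    pauses = [d for kind, d, _ in segments if kind == "pause"]

    # A run is the stretch of phones between two consecutive pauses.
    runs, current = [], 0
    for kind, _, phones in segments:
        if kind == "speech":
            current += phones
        elif current > 0:
            runs.append(current)
            current = 0
    if current > 0:
        runs.append(current)

    return {
        "rate_of_speech": n_phones / total_time,
        "articulation_rate": n_phones / speech_time,
        "phonation_time_ratio": speech_time / total_time,
        "number_of_pauses": len(pauses),
        "mean_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
        "mean_length_of_runs": sum(runs) / len(runs),
    }


example = temporal_measures([
    ("speech", 2.0, 24),
    ("pause", 0.5, 0),
    ("speech", 1.5, 16),
])
```

Under these definitions, rate of speech falls when a speaker either articulates more slowly or pauses more, which is consistent with the abstract's remark that it incorporates both aspects.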