Volume 130, Issue 5, November 2011
Index of content:
- SPEECH PRODUCTION 
130(2011); http://dx.doi.org/10.1121/1.3644913
The influence of vocal fold geometry and stiffness on phonation onset was experimentally investigated using a body-cover physical model of the vocal folds. Results showed that a lower phonation threshold pressure and phonation onset frequency can be achieved by reducing body-layer or cover-layer stiffness, reducing medial surface thickness, or increasing cover-layer depth. Increasing body-layer stiffness also restricted vocal fold motion to the cover layer and reduced prephonatory glottal opening. Excitation of anterior–posterior modes was also observed, particularly for large values of the body-cover stiffness ratio. The results of this study were also discussed in relation to previous theoretical and experimental studies.
Segmentation of expiratory and inspiratory sounds in baby cry audio recordings using hidden Markov models
130(2011); http://dx.doi.org/10.1121/1.3641377
The paper describes an application of machine learning techniques to identify expiratory and inspiratory phases in audio recordings of human baby cries. Crying episodes were recorded from 14 infants, spanning four vocalization contexts in their first 12 months of age; recordings from three individuals were annotated manually to identify expiratory and inspiratory sounds and used as training examples to automatically segment the recordings of the other 11 individuals. The proposed algorithm uses a hidden Markov model architecture, in which state likelihoods are estimated either with Gaussian mixture models or by converting the classification decisions of a support vector machine. The algorithm yields up to 95% classification precision (86% average), and it generalizes over different babies, different ages, and vocalization contexts. The technique offers an opportunity to quantify expiration duration, crying rate, and other time-related characteristics of baby crying for screening, diagnosis, and research purposes over large populations of infants.
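As a rough illustration of how a hidden Markov model can turn per-frame state likelihoods into a segmentation, the sketch below runs Viterbi decoding over two states. It is not the authors' implementation: the state labels, the transition probabilities, and the per-frame likelihood values are all hypothetical, and in the paper's setting the likelihoods would come from Gaussian mixture models or a support vector machine rather than being typed in by hand.

```python
import math

# Hypothetical state labels; the paper's actual state inventory is not given here.
STATES = ("expiration", "inspiration")


def viterbi(frame_loglik, log_trans, log_init):
    """Return the most likely state index sequence for a sequence of frames.

    frame_loglik[t][s] is log p(observation at frame t | state s).
    log_trans[p][s] is log p(state s at t | state p at t-1).
    log_init[s] is log p(state s at the first frame).
    """
    n_states = len(log_init)
    score = [log_init[s] + frame_loglik[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(frame_loglik)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at frame t.
            best = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best] + log_trans[best][s] + frame_loglik[t][s])
            ptr.append(best)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Assumed values for illustration only.
frame_lik = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
frame_loglik = [[math.log(p) for p in row] for row in frame_lik]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi(frame_loglik, log_trans, log_init)
labels = [STATES[s] for s in path]
```

With these assumed numbers, picking the most likely state frame by frame would label the third frame as inspiration, whereas the decoded path stays in expiration for the first four frames and switches once. Smoothing of this kind is what makes segment durations measurable from frame-level decisions.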
130(2011); http://dx.doi.org/10.1121/1.3643826
Past studies have shown that when formants are perturbed in real time, speakers spontaneously compensate for the perturbation by changing their formant frequencies in the opposite direction to the perturbation. Further, the pattern of these results suggests that the processing of auditory feedback error operates at a purely acoustic level. This hypothesis was tested by comparing the response of three language groups to real-time formant perturbations: (1) native English speakers producing an English vowel /ε/, (2) native Japanese speakers producing a Japanese vowel (), and (3) native Japanese speakers learning English, producing /ε/. All three groups showed similar production patterns when F1 was decreased; however, when F1 was increased, the Japanese groups did not compensate as much as the native English speakers. Because of this asymmetry, the hypothesis that compensatory production for formant perturbation operates at a purely acoustic level was rejected. Rather, some level of phonological processing influences the feedback processing behavior.