Volume 121, Issue 6, June 2007
Index of content:
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
121(2007); http://dx.doi.org/10.1121/1.2722220
In this study, vocal tract area functions for one American English speaker, recorded using magnetic resonance imaging, were used to simulate and analyze the acoustics of vowel nasalization. Computer vocal tract models and susceptance plots were used to study the three most important sources of acoustic variability involved in the production of nasalized vowels: velar coupling area, asymmetry of nasal passages, and the sinus cavities. Analysis of the susceptance plots of the pharyngeal and oral cavities, and of the nasal cavity, helped in understanding the movement of poles and zeros with varying coupling areas. Simulations using two nasal passages clearly showed the introduction of extra pole-zero pairs due to the asymmetry between the passages. Simulations with the inclusion of maxillary and sphenoidal sinuses showed that each sinus can potentially introduce one pole-zero pair in the spectrum. Further, the right maxillary sinus introduced a pole-zero pair at the lowest frequency. The effective frequencies of these sinus-related poles and zeros in the sum of the oral and nasal cavity outputs change with a change in the configuration of the oral cavity, which may happen due to a change in the coupling area or in the vowel being articulated.
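The susceptance reasoning in this abstract can be illustrated with a much simpler model than the one used in the study. The sketch below assumes three lossless uniform tubes meeting at the velar coupling point (pharynx closed at the glottis, oral and nasal tubes ideally open at their far ends) and locates the poles of the coupled system as the frequencies where the three susceptances sum to zero. The tube lengths, areas, speed of sound, and all function names are illustrative assumptions, not values or code from the paper; sinuses, losses, and radiation loads are ignored, and the zeros of the combined oral and nasal output are not computed.

```python
import math

C = 35000.0  # speed of sound in cm/s (assumed round value)


def closed_tube(f, length, area):
    """Susceptance (in units of 1/(rho*c)) of a lossless tube closed at the far end."""
    return area * math.tan(2.0 * math.pi * f * length / C)


def open_tube(f, length, area):
    """Susceptance of a lossless tube with an ideal (zero-impedance) open far end."""
    x = 2.0 * math.pi * f * length / C
    return -area * math.cos(x) / math.sin(x)


def total_susceptance(f, pharynx, oral, nasal):
    """Sum of the branch susceptances seen from the coupling point.

    Each branch is a hypothetical (length_cm, area_cm2) pair.
    """
    return closed_tube(f, *pharynx) + open_tube(f, *oral) + open_tube(f, *nasal)


def coupled_poles(pharynx, oral, nasal, f_max=3000.0, step=1.0):
    """Frequencies (Hz) below f_max where the total susceptance crosses zero.

    A coarse grid finds sign changes and bisection refines them. Sign flips
    through a singularity of tan/cot are rejected by checking that the
    refined value is actually close to zero. Two crossings inside one grid
    step would be missed, so very small coupling areas need a finer step.
    """
    poles = []
    f_lo = step
    b_lo = total_susceptance(f_lo, pharynx, oral, nasal)
    while f_lo + step <= f_max:
        f_hi = f_lo + step
        b_hi = total_susceptance(f_hi, pharynx, oral, nasal)
        if b_lo * b_hi < 0.0:
            lo, hi, b_at_lo = f_lo, f_hi, b_lo
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                b_mid = total_susceptance(mid, pharynx, oral, nasal)
                if b_at_lo * b_mid <= 0.0:
                    hi = mid
                else:
                    lo, b_at_lo = mid, b_mid
            root = 0.5 * (lo + hi)
            if abs(total_susceptance(root, pharynx, oral, nasal)) < 1e-3:
                poles.append(root)
        f_lo, b_lo = f_hi, b_hi
    return poles


# Illustrative dimensions only: 8.5 cm pharynx and oral tubes, 12 cm nasal tube.
oral_only = coupled_poles((8.5, 3.0), (8.5, 3.0), (12.0, 0.0))
nasalized = coupled_poles((8.5, 3.0), (8.5, 3.0), (12.0, 1.0))
```

With the nasal area set to zero, the pharynx and oral tubes form a single 17 cm tube and the poles fall at odd multiples of c/(4L). Opening the nasal branch adds poles and raises the lowest one, which is the qualitative pole movement the susceptance plots are used to track.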
121(2007); http://dx.doi.org/10.1121/1.2730743
An alternative and complete derivation is presented of the vocal tract length sensitivity function, an equation for finding the change in formant frequency due to a perturbation of the vocal tract length [Fant, Quarterly Progress and Status Rep. No. 4, Speech Transmission Laboratory, Kungliga Tekniska Högskolan, Stockholm, 1975, pp. 1–14]. It is based on the adiabatic invariance of the vocal tract as an acoustic resonator and on the radiation pressure on the wall and at the exit of the vocal tract. An algorithm for tuning the vocal tract shape to match the formant frequencies to target values, such as those of a recorded speech signal, which was proposed in Story [J. Acoust. Soc. Am. 119, 715–718 (2006)], is extended so that the vocal tract length can also be changed. Numerical simulation of this extended algorithm shows that it can successfully convert between the vocal tract shapes of a male and a female for each of five Japanese vowels.
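The effect of a length perturbation on formant frequencies can be checked by hand in the simplest special case: a lossless uniform tube, closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c/(4L). The sketch below is not the derivation or the tuning algorithm from the paper; it only shows that, for this idealized tube, a small relative change in length produces a nearly equal and opposite relative change in every formant. The speed of sound and the function names are assumptions made for illustration.

```python
C = 35000.0  # speed of sound in cm/s (assumed round value)


def uniform_tube_formants(length_cm, count=3):
    """Resonances (Hz) of a lossless uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * C / (4.0 * length_cm) for n in range(1, count + 1)]


def relative_formant_shift(length_cm, delta_cm, count=3):
    """Relative change of each formant when the tube is lengthened by delta_cm."""
    before = uniform_tube_formants(length_cm, count)
    after = uniform_tube_formants(length_cm + delta_cm, count)
    return [(a - b) / b for a, b in zip(after, before)]


# A 17.5 cm tube lengthened by 1 percent.
formants = uniform_tube_formants(17.5)
shifts = relative_formant_shift(17.5, 0.175)
```

For the uniform tube every formant moves by the same fraction, approximately -dL/L. A sensitivity function for a realistic, non-uniform vocal tract is needed precisely because that uniformity does not hold in general.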
121(2007); http://dx.doi.org/10.1121/1.2714913
In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics, nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.