Index of content:
Volume 110, Issue 2, August 2001
- SPEECH PERCEPTION 
Consonant identification under maskers with sinusoidal modulation: Masking release or modulation interference?
110 (2001); http://dx.doi.org/10.1121/1.1384909
The present study investigated the effect of envelope modulations in a background masker on consonant recognition by normal hearing listeners. It is well known that listeners understand speech better under a temporally modulated masker than under a steady masker at the same level, due to masking release. The possibility of an opposite phenomenon, modulation interference, whereby speech recognition could be degraded by a modulated masker due to interference with auditory processing of the speech envelope, was hypothesized and tested under various speech and masker conditions. It was of interest whether modulation interference for speech perception, if it were observed, could be predicted by modulation masking, as found in psychoacoustic studies using nonspeech stimuli. Results revealed that masking release measurably occurred under a variety of conditions, especially when the speech signal maintained a high degree of redundancy across several frequency bands. Modulation interference was also clearly observed under several circumstances when the speech signal did not contain a high redundancy. However, the effect of modulation interference did not follow the expected pattern from psychoacoustic modulation masking results. In conclusion, (1) both factors, modulation interference and masking release, should be accounted for whenever a background masker contains temporal fluctuations, and (2) caution needs to be taken when psychoacoustic theory on modulation masking is applied to speech recognition.
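To make the masker comparison concrete, here is a minimal sketch of a steady noise masker and a sinusoidally amplitude-modulated one equated in RMS level. The envelope form `1 + m*sin(2*pi*fm*t)`, the 8-Hz modulation rate, the 16-kHz sample rate, and the function names are all illustrative assumptions; the abstract does not give the study's actual stimulus parameters.

```python
import math
import random

def sam_noise(duration_s, sample_rate, mod_rate_hz, mod_depth, seed=0):
    """Gaussian noise multiplied by the envelope 1 + m*sin(2*pi*fm*t).

    mod_depth = 0 gives a steady masker; mod_depth = 1 gives full modulation.
    All parameter values used below are hypothetical, not taken from the study.
    """
    rng = random.Random(seed)
    n = int(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        envelope = 1.0 + mod_depth * math.sin(2.0 * math.pi * mod_rate_hz * t)
        samples.append(envelope * rng.gauss(0.0, 1.0))
    return samples

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def match_rms(x, target_rms):
    """Rescale x so that its RMS equals target_rms."""
    gain = target_rms / rms(x)
    return [gain * v for v in x]

# Steady masker and a fully modulated masker presented at the same RMS level.
steady = sam_noise(1.0, 16000, 8.0, 0.0)
modulated = match_rms(sam_noise(1.0, 16000, 8.0, 1.0, seed=1), rms(steady))
```

With full modulation the envelope periodically dips to zero, which is where a listener could "glimpse" the speech; the same fluctuation is what might interfere with processing of the speech envelope.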
110 (2001); http://dx.doi.org/10.1121/1.1384908
The formant hypothesis of vowel perception, where the lowest two or three formant frequencies are essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy suggesting that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from the ones for the unsuppressed control. In the second experiment, F2-suppressed stimuli, whose amplitude ratios of high- to low-frequency components were systematically changed, were used. The results indicate that the ratio changes can affect perceived vowel quality, especially its place of articulation. In the third experiment, the full-formant stimuli, whose amplitude ratios were changed from the original and whose F2’s were kept constant, were used. The results suggest that the amplitude ratio is equal to or more effective than F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.
Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants
110 (2001); http://dx.doi.org/10.1121/1.1381538
Speech recognition was measured as a function of spectral resolution (number of spectral channels) and speech-to-noise ratio in normal-hearing (NH) and cochlear-implant (CI) listeners. Vowel, consonant, word, and sentence recognition were measured in five normal-hearing listeners, ten listeners with the Nucleus-22 cochlear implant, and nine listeners with the Advanced Bionics Clarion cochlear implant. Recognition was measured as a function of the number of spectral channels (noise bands or electrodes) at signal-to-noise ratios of +15, +10, +5, 0 dB, and in quiet. Performance with three different speech processing strategies (SPEAK, CIS, and SAS) was similar across all conditions, and improved as the number of electrodes increased (up to seven or eight) for all conditions. For all noise levels, vowel and consonant recognition with the SPEAK speech processor did not improve with more than seven electrodes, while for normal-hearing listeners, performance continued to increase up to at least 20 channels. Speech recognition on more difficult speech materials (word and sentence recognition) showed a marginally significant increase in Nucleus-22 listeners from seven to ten electrodes. The average implant score on all processing strategies was poorer than scores of NH listeners with similar processing. However, the best CI scores were similar to the normal-hearing scores for that condition (up to seven channels). CI listeners with the highest performance level increased in performance as the number of electrodes increased up to seven, while CI listeners with low levels of speech recognition did not increase in performance as the number of electrodes was increased beyond four. These results quantify the effect of number of spectral channels on speech recognition in noise and demonstrate that most CI subjects are not able to fully utilize the spectral information provided by the number of electrodes used in their implant.
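As an illustration of what presenting speech at a given signal-to-noise ratio involves, the sketch below scales a noise sequence so that the RMS-based ratio `20*log10(rms(speech)/rms(noise))` equals a target value in dB. The RMS-based definition, the function names, and the stand-in signals (a 440-Hz tone in place of speech, Gaussian noise as the masker) are assumptions made for the example; the abstract does not describe how levels were set in the study.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale noise to the target SNR (dB, RMS-based) and add it to speech.

    Returns the mixture and the scaled noise.
    """
    gain = rms(speech) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    scaled = [gain * v for v in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled

def measured_snr_db(speech, noise):
    """RMS-based signal-to-noise ratio in dB."""
    return 20.0 * math.log10(rms(speech) / rms(noise))

# Hypothetical stand-in signals: a tone instead of speech, Gaussian noise as masker.
sample_rate = 16000
speech = [math.sin(2.0 * math.pi * 440.0 * i / sample_rate) for i in range(sample_rate)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# The four noise conditions listed in the abstract.
mixtures = {snr: mix_at_snr(speech, noise, snr) for snr in (15, 10, 5, 0)}
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions; either choice yields the same ratio.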
Effects of low-pass filtering on the intelligibility of speech in quiet for people with and without dead regions at high frequencies
110 (2001); http://dx.doi.org/10.1121/1.1381534
A dead region is a region of the cochlea where there are no functioning inner hair cells (IHCs) and/or neurons; it can be characterized in terms of the characteristic frequencies of the IHCs bordering that region. We examined the effect of high-frequency amplification on speech perception for subjects with high-frequency hearing loss with and without dead regions. The limits of any dead regions were defined by measuring psychophysical tuning curves and were confirmed using the TEN test described in Moore et al. [Br. J. Audiol. 34, 205–224 (2000)]. The speech stimuli were vowel–consonant–vowel (VCV) nonsense syllables, using one of three vowels (/i/, /a/, and /u/) and 21 different consonants. In a baseline condition, subjects were tested using broadband stimuli with a nominal input level of 65 dB SPL. Prior to presentation via Sennheiser HD580 earphones, the stimuli were subjected to the frequency-gain characteristic prescribed by the “Cambridge” formula, which is intended to give speech at 65 dB SPL the same overall loudness as for a normal listener, and to make the average loudness of the speech the same for each critical band over the frequency range important for speech intelligibility (in a listener without a dead region). The stimuli for all other conditions were initially subjected to this same frequency-gain characteristic. Then, the speech was low-pass filtered with various cutoff frequencies. For subjects without dead regions, performance generally improved progressively with increasing cutoff frequency. This indicates that they benefited from high-frequency information. For subjects with dead regions, two patterns of performance were observed. For most subjects, performance improved with increasing cutoff frequency until the cutoff frequency was somewhat above the estimated edge frequency of the dead region, but hardly changed with further increases. For a few subjects, performance initially improved with increasing cutoff frequency and then worsened with further increases, although the worsening was significant only for one subject. The results have important implications for the fitting of hearing aids.
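The low-pass filtering step can be sketched with a generic windowed-sinc FIR filter. This is only an illustration of removing energy above a cutoff frequency: the filter type, the 101-tap length, the Hamming window, the 2-kHz cutoff, and the 16-kHz sample rate are assumptions, not the filters or cutoffs used in the study.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate, num_taps=101):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(x, taps):
    """Direct-form FIR convolution, treating samples before the start as zero."""
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, h in enumerate(taps):
            if i - j >= 0:
                acc += h * x[i - j]
        out.append(acc)
    return out

def rms(x):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def tone_gain(freq_hz, taps, sample_rate=16000, num_samples=1600):
    """Output/input RMS ratio for a pure tone, skipping the filter's start-up transient."""
    tone = [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(num_samples)]
    filtered = fir_filter(tone, taps)
    return rms(filtered[200:]) / rms(tone[200:])

# Hypothetical 2-kHz cutoff at a 16-kHz sample rate.
taps = lowpass_taps(2000.0, 16000)
low_gain = tone_gain(500.0, taps)     # well below the cutoff: passed almost unchanged
high_gain = tone_gain(6000.0, taps)   # well above the cutoff: strongly attenuated
```

Raising `cutoff_hz` in steps corresponds to the progressively wider bandwidths described in the abstract, with the broadband baseline as the unfiltered case.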