Table of contents:
Volume 109, Issue 3, March 2001
- SPEECH PERCEPTION 
J. Acoust. Soc. Am. 109 (2001); http://dx.doi.org/10.1121/1.1340643
Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%–98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and −12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.
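The core manipulation in this study, passing speech through a narrow bandpass filter centered at 1500, 2100, or 3000 Hz, is easy to illustrate. The sketch below assumes a 1/3-octave passband and a fourth-order Butterworth design; the paper's actual filter shapes, including the sloping transition bands discussed above, are not specified here, so treat these parameters as placeholders.

```python
# Minimal sketch of narrow bandpass filtering of speech.
# The 1/3-octave bandwidth and 4th-order Butterworth design are
# illustrative assumptions, not the study's exact filter specification.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def narrowband(speech, fs, fc, bandwidth_octaves=1 / 3, order=4):
    """Band-pass filter `speech` around center frequency `fc` (Hz)."""
    lo = fc * 2 ** (-bandwidth_octaves / 2)  # lower band edge
    hi = fc * 2 ** (bandwidth_octaves / 2)   # upper band edge
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, speech)

fs = 16000
speech = np.random.randn(fs)                 # stand-in for a recorded sentence
bands = {fc: narrowband(speech, fs, fc) for fc in (1500, 2100, 3000)}
```

A steeper design (higher order, or cascaded sections) would shrink the transition bands that the abstract identifies as a likely source of cues near threshold.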
Recognition of spectrally asynchronous speech by normal-hearing listeners and Nucleus-22 cochlear implant users
J. Acoust. Soc. Am. 109 (2001); http://dx.doi.org/10.1121/1.1344158
This experiment examined the effects of spectral resolution and fine spectral structure on recognition of spectrally asynchronous sentences by normal-hearing and cochlear implant listeners. Sentence recognition was measured in six normal-hearing subjects listening to either full-spectrum or noise-band processors and five Nucleus-22 cochlear implant listeners fitted with 4-channel continuous interleaved sampling (CIS) processors. For the full-spectrum processor, the speech signals were divided into either 4 or 16 channels. For the noise-band processor, after band-pass filtering into 4 or 16 channels, the envelope of each channel was extracted and used to modulate noise of the same bandwidth as the analysis band, thus eliminating the fine spectral structure available in the full-spectrum processor. For the 4-channel CIS processor, the amplitude envelopes extracted from four bands were transformed to electric currents by a power function and the resulting electric currents were used to modulate pulse trains delivered to four electrode pairs. For all processors, the output of each channel was time-shifted relative to the other channels, varying the channel delay across channels from 0 to 240 ms (in 40-ms steps). Within each delay condition, all channels were desynchronized such that the cross-channel delays between adjacent channels were maximized, thereby avoiding local pockets of channel synchrony. Results showed no significant difference between the 4- and 16-channel full-spectrum processors for normal-hearing listeners. Recognition scores dropped significantly only when the maximum delay reached 200 ms for the 4-channel processor and 240 ms for the 16-channel processor. When fine spectral structures were removed in the noise-band processor, sentence recognition dropped significantly when the maximum delay was 160 ms for the 16-channel noise-band processor and 40 ms for the 4-channel noise-band processor. There was no significant difference between implant listeners using the 4-channel CIS processor and normal-hearing listeners using the 4-channel noise-band processor. The results imply that when fine spectral structures are not available, as in the implant listener’s case, increased spectral resolution is important for overcoming cross-channel asynchrony in speech signals.
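The noise-band processing chain described above (band-pass analysis, envelope extraction, noise modulation, per-channel delay) can be sketched as follows. The analysis range, envelope cutoff, filter orders, and the even/odd delay ordering are all assumptions made for illustration; the abstract specifies only the channel counts and the 0–240-ms delay range in 40-ms steps.

```python
# Sketch of the noise-band chain: band-pass analysis, envelope extraction,
# noise modulation, and per-channel time shifts. Analysis range, envelope
# cutoff (160 Hz), filter orders, and the delay ordering are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_band_vocoder(speech, fs, n_channels=4, max_delay_ms=240):
    edges = np.geomspace(200, 6000, n_channels + 1)   # assumed analysis range
    # Evenly spaced delays, reordered (even indices, then odd) so adjacent
    # channels never carry similar delays -- one way to avoid the "local
    # pockets of channel synchrony" the abstract mentions.
    delays = np.linspace(0, max_delay_ms, n_channels)
    delays = np.concatenate([delays[::2], delays[1::2]])
    env_sos = butter(2, 160, btype="low", fs=fs, output="sos")
    out = np.zeros(len(speech) + int(fs * max_delay_ms / 1000))
    for ch in range(n_channels):
        band_sos = butter(4, edges[ch:ch + 2], btype="bandpass", fs=fs,
                          output="sos")
        band = sosfiltfilt(band_sos, speech)
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0, None)  # envelope
        carrier = sosfiltfilt(band_sos, np.random.randn(len(speech)))
        shift = int(fs * delays[ch] / 1000)
        out[shift:shift + len(speech)] += env * carrier  # delayed channel
    return out

fs = 16000
sentence = np.random.randn(2 * fs)       # stand-in for a recorded sentence
processed = noise_band_vocoder(sentence, fs, n_channels=4, max_delay_ms=160)
```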
Vowel perception by adults and children with normal language and specific language impairment: Based on steady states or transitions?
J. Acoust. Soc. Am. 109 (2001); http://dx.doi.org/10.1121/1.1349428
This investigation examined whether adults, children with normally developing language (aged 4–5 years), and children with specific language impairment (aged 5–6 years) identified vowels on the basis of steady-state or transitional formant frequencies. Four types of synthetic tokens, created with a female voice, served as stimuli: (1) steady-state centers for the vowels [i] and [æ]; (2) vowelless tokens with transitions appropriate for [bib] and [bæb]; (3) “congruent” tokens that combined the first two types of stimuli into [bib] and [bæb]; and (4) “conflicting” tokens that combined the transitions from [bib] with the vowel from [bæb] and vice versa. Results showed that children with language impairment identified the [i] vowel more poorly than the other subjects for both the vowelless and congruent tokens. Overall, children identified vowels most accurately in steady-state centers and congruent stimuli (94%–96% correct). They identified vowels on the basis of transitions alone, from the vowelless tokens, with 89% and 83.5% accuracy for the normally developing and language-impaired groups, respectively. Children with normally developing language used steady-state cues to identify vowels in 87% of the conflicting stimuli, whereas children with language impairment did so for 79% of the stimuli. Adults were equally accurate for vowelless, steady-state, and congruent tokens (99%–100% accuracy) and used both steady-state and transition cues for vowel identification. The results suggest that most listeners prefer steady-state cues for vowel identification but are capable of using the onglide/offglide transitions. Results are discussed with regard to Nittrouer's developmental weighting shift hypothesis and Strange and Jenkins's dynamic specification theory.
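One way to picture the four stimulus types is through their F1/F2 trajectories: congruent tokens pair transitions and a steady state that point to the same vowel, while conflicting tokens splice them across vowels. The sketch below uses textbook-style Peterson–Barney formant targets for a female voice and an assumed bilabial [b] locus; none of these values or durations come from the study's synthesis parameters.

```python
# Sketch: pairing [b]-transition segments with steady-state vowel centers.
# The formant targets and durations are illustrative assumptions.
import numpy as np

STEADY = {"i": (310, 2790), "ae": (860, 2050)}  # assumed F1, F2 targets (Hz)
B_LOCUS = (200, 1000)                           # assumed bilabial locus (Hz)

def formant_track(vowel, frame_rate=100, trans_ms=50, steady_ms=150):
    """F1/F2 trajectory for a [bVb] token: onset transition,
    steady state, offset transition."""
    n_t = int(frame_rate * trans_ms / 1000)
    n_s = int(frame_rate * steady_ms / 1000)
    rows = []
    for f_locus, f_target in zip(B_LOCUS, STEADY[vowel]):
        onset = np.linspace(f_locus, f_target, n_t)   # CV transition
        mid = np.full(n_s, float(f_target))           # steady-state center
        offset = np.linspace(f_target, f_locus, n_t)  # VC transition
        rows.append(np.concatenate([onset, mid, offset]))
    return np.stack(rows)                             # shape (2, n_frames)

def conflicting_track(trans_vowel, steady_vowel, frame_rate=100,
                      trans_ms=50, steady_ms=150):
    """Splice one token's transitions onto the other's steady state; the
    formant discontinuity at the splice is what puts the cues in conflict."""
    n_t = int(frame_rate * trans_ms / 1000)
    out = formant_track(steady_vowel, frame_rate, trans_ms, steady_ms)
    donor = formant_track(trans_vowel, frame_rate, trans_ms, steady_ms)
    out[:, :n_t], out[:, -n_t:] = donor[:, :n_t], donor[:, -n_t:]
    return out

conflict = conflicting_track("i", "ae")  # [bib] transitions + [æ] center
```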
J. Acoust. Soc. Am. 109 (2001); http://dx.doi.org/10.1121/1.1348009
The effect of talker and token variability on speech perception has engendered a great deal of research. However, most of this research has compared listener performance in multiple-talker (or variable) situations to performance in single-talker conditions. It remains unclear to what extent listeners are affected by the degree of variability within a talker, rather than simply the existence of variability (being in a multitalker environment). The present study had two goals. First, the degree of variability among speakers in their /s/ and /ʃ/ productions was measured. Even among a relatively small pool of talkers, there was a range of speech variability: some talkers had /s/ and /ʃ/ categories that were quite distinct from one another in terms of frication centroid and skewness, while other speakers had categories that actually overlapped. The second goal was to examine whether this degree of variability within a talker influenced perception. Listeners were presented with natural /s/ and /ʃ/ tokens for identification under ideal listening conditions, and slower response times were found for speakers whose productions were more variable than for speakers with more internal consistency in their speech. This suggests that the degree of variability, not just its existence, may be the more critical factor in perception.
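Frication centroid and skewness, the two measures used above to quantify how distinct a talker's /s/ and /ʃ/ categories are, are the first and normalized third spectral moments of the frication noise. A minimal sketch, assuming the common convention of treating the normalized power spectrum as a probability distribution over frequency:

```python
# Sketch: spectral centroid and skewness of a fricative noise segment.
# Normalizing the power spectrum into a distribution over frequency is a
# common convention, assumed here rather than taken from the paper.
import numpy as np

def spectral_moments(frication, fs):
    spectrum = np.abs(np.fft.rfft(frication)) ** 2
    freqs = np.fft.rfftfreq(len(frication), d=1 / fs)
    p = spectrum / spectrum.sum()                 # normalize to a distribution
    centroid = np.sum(freqs * p)                  # first moment (Hz)
    var = np.sum((freqs - centroid) ** 2 * p)     # second central moment
    skew = np.sum((freqs - centroid) ** 3 * p) / var ** 1.5
    return centroid, skew

fs = 22050
noise = np.random.randn(fs // 10)                 # stand-in for an /s/ segment
centroid, skew = spectral_moments(noise, fs)
```

A talker whose /s/ and /ʃ/ tokens form tight, well-separated clusters in this (centroid, skewness) space would count as internally consistent in the abstract's terms; overlapping clusters would count as variable.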
Relations between intelligibility of narrow-band speech and auditory functions, both in the 1-kHz frequency region
J. Acoust. Soc. Am. 109 (2001); http://dx.doi.org/10.1121/1.1349429
Relations between the perception of suprathreshold speech and auditory functions were examined in 24 hearing-impaired listeners and 12 normal-hearing listeners. The speech intelligibility index (SII) was used to account for audibility. The auditory functions included detection efficiency, temporal and spectral resolution, temporal and spectral integration, and discrimination of intensity, frequency, rhythm, and spectro-temporal shape. All auditory functions were measured at 1 kHz. Speech intelligibility was assessed with the speech-reception threshold (SRT) in quiet and in noise, and with the speech-reception bandwidth threshold (SRBT), previously developed for investigating speech perception in a limited frequency region around 1 kHz. The results showed that the elevated SRT in quiet could be explained on the basis of audibility. Audibility could only partly account for the elevated SRT values in noise and the deviant SRBT values, suggesting that suprathreshold deficits affected intelligibility in these conditions. SII predictions for the SRBT improved significantly when the individually measured upward spread of masking was included in the SII model. Reduced spectral resolution, reduced temporal resolution, and reduced frequency discrimination appeared to be related to speech perception deficits. Loss of peripheral compression appeared to have the smallest effect on the intelligibility of suprathreshold speech.
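For readers unfamiliar with the SII's bookkeeping: each frequency band contributes its importance weight scaled by an audibility factor that grows linearly over a 30-dB range. The sketch below is a deliberately simplified version of that idea, with uniform placeholder importance weights; ANSI S3.5-1997 tabulates the real band-importance functions and adds corrections (such as the upward spread of masking mentioned above) that are omitted here.

```python
# Simplified SII-style audibility index: SII = sum_i I_i * A_i, with band
# audibility A_i = clip((E_i - D_i + 15) / 30, 0, 1), where E_i is the speech
# spectrum level and D_i the effective disturbance (hearing threshold or
# noise, whichever is higher) in band i. Uniform importance weights are a
# placeholder assumption, not the standard's tabulated values.
import numpy as np

def simple_sii(speech_dB, disturbance_dB, importance=None):
    speech_dB = np.asarray(speech_dB, dtype=float)
    disturbance_dB = np.asarray(disturbance_dB, dtype=float)
    if importance is None:
        importance = np.full(speech_dB.shape, 1 / speech_dB.size)
    audibility = np.clip((speech_dB - disturbance_dB + 15) / 30, 0, 1)
    return float(np.sum(importance * audibility))

# A band contributes fully when speech sits 15 dB above the disturbance
# and nothing once it falls 15 dB below it.
print(simple_sii(speech_dB=[50, 40, 30], disturbance_dB=[20, 35, 45]))
```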