Volume 113, Issue 2, February 2003
Index of content:
- SPEECH PERCEPTION 
113(2003); http://dx.doi.org/10.1121/1.1513647
The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a “flooring” procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /hVd/ context spoken by 45 men, 48 women, and 46 children. The narrow band model classified vowels in this database with a degree of accuracy (91.4%) approaching that of human listeners.
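The processing chain described in the abstract above (level equalization, flooring, template averaging, and city-block matching accumulated over spectral slices) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the normalization rule, the shape or width of the running average, or the number of slices, so the mean-amplitude scaling, the triangular window, the `half_width` parameter, and all function names here are assumptions. The distance is taken as the sum of absolute channel-by-channel differences, which is the usual meaning of a city-block measure.

```python
def level_normalize(spectrum):
    """Scale a spectrum so its mean amplitude is 1 (assumed normalization rule)."""
    mean = sum(spectrum) / len(spectrum)
    return [v / mean for v in spectrum] if mean > 0 else list(spectrum)


def floor_spectrum(spectrum, half_width=2):
    """Zero out values below a center-weighted running average.

    The triangular weighting and half_width are assumptions; the abstract
    only says the threshold is a center-weighted running average.
    """
    n = len(spectrum)
    floored = []
    for i, value in enumerate(spectrum):
        num = den = 0.0
        for j in range(max(0, i - half_width), min(n, i + half_width + 1)):
            weight = half_width + 1 - abs(i - j)  # largest weight at the center
            num += weight * spectrum[j]
            den += weight
        floored.append(value if value >= num / den else 0.0)
    return floored


def preprocess(spectrum):
    """Level-equalize, then floor, a single narrow band spectral slice."""
    return floor_spectrum(level_normalize(spectrum))


def make_template(tokens):
    """Average the preprocessed spectra of like vowels, slice by slice.

    tokens: list of tokens; each token is a list of slices; each slice is a
    list of channel amplitudes. All tokens must share the same dimensions.
    """
    n_slices, n_chan = len(tokens[0]), len(tokens[0][0])
    template = [[0.0] * n_chan for _ in range(n_slices)]
    for token in tokens:
        for s, spectrum in enumerate(token):
            for c, value in enumerate(preprocess(spectrum)):
                template[s][c] += value / len(tokens)
    return template


def city_block(a, b):
    """Sum of absolute channel-by-channel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(token, templates):
    """Return the vowel label whose template gives the smallest distance
    accumulated over the sequence of spectral slices."""
    def total_distance(label):
        return sum(city_block(preprocess(s), t)
                   for s, t in zip(token, templates[label]))
    return min(templates, key=total_distance)


if __name__ == "__main__":
    # Toy 8-channel, 2-slice data (entirely made up for illustration).
    low = [1, 5, 1, 1, 1, 1, 1, 1]
    high = [1, 1, 1, 1, 1, 1, 5, 1]
    templates = {
        "a": make_template([[low, low]]),
        "i": make_template([[high, high]]),
    }
    print(classify([[1, 4, 1, 1, 1, 1, 1, 1]] * 2, templates))
```

In the model as described, a separate set of templates would be built for men, women, and children; in this sketch that would simply mean keeping one `templates` dictionary per talker group.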
Evaluating the function of phonetic perceptual phenomena within speech recognition: An examination of the perception of /d/–/t/ by adult cochlear implant users
113(2003); http://dx.doi.org/10.1121/1.1531985
This study examined whether cochlear implant users must perceive differences along phonetic continua in the same way as do normal-hearing listeners (i.e., sharp identification functions, poor within-category sensitivity, high between-category sensitivity) in order to recognize speech accurately. Adult postlingually deafened cochlear implant users, who were heterogeneous in terms of their implants and processing strategies, were tested on two phonetic perception tasks using a synthetic /dɑ/–/tɑ/ continuum (phoneme identification and discrimination) and two speech recognition tasks using natural recordings from ten talkers (open-set word recognition and forced-choice /d/–/t/ recognition). Cochlear implant users tended to have identification boundaries and sensitivity peaks at voice onset times (VOT) that were longer than found for normal-hearing individuals. Sensitivity peak locations were significantly correlated with individual differences in cochlear implant performance; individuals who had a /d/–/t/ sensitivity peak near normal-hearing peak locations were most accurate at recognizing natural recordings of words and syllables. However, speech recognition was not strongly related to identification boundary locations or to overall levels of discrimination performance. The results suggest that perceptual sensitivity affects speech recognition accuracy, but that many cochlear implant users are able to accurately recognize speech without having typical normal-hearing patterns of phonetic perception.
113(2003); http://dx.doi.org/10.1121/1.1537708
The present study examined the effects of short-term perceptual training on normal-hearing listeners’ ability to adapt to spectrally altered speech patterns. Using noise-band vocoder processing, acoustic information was spectrally distorted by shifting speech information from one frequency region to another. Six subjects were tested with spectrally shifted sentences after five days of practice with upwardly shifted training sentences. Training with upwardly shifted sentences significantly improved recognition of upwardly shifted speech; recognition of downwardly shifted speech was nearly unchanged. Three subjects were later trained with downwardly shifted speech. Results showed that the mean improvement was comparable to that observed with the upwardly shifted training. In this retrain and retest condition, performance was largely unchanged for upwardly shifted sentence recognition, suggesting that these listeners had retained some of the improved speech perception resulting from the previous training. The results suggest that listeners are able to partially adapt to a spectral shift in acoustic speech patterns over the short term, given sufficient training. However, the improvement was localized to where the spectral shift was trained, as no change in performance was observed for spectrally altered speech outside of the trained regions.
Simulations of tonotopically mapped speech processors for cochlear implant electrodes varying in insertion depth
113(2003); http://dx.doi.org/10.1121/1.1536928
It has been claimed that speech recognition with a cochlear implant is dependent on the frequency alignment of analysis bands in the speech processor with characteristic frequencies (CFs) at electrode locations. However, the most apical electrode location can often have a CF of 1 kHz or more. The use of filters aligned in frequency to relatively basal electrode arrays leads to the loss of lower frequency speech information. This study simulates a frequency-aligned speech processor and common array insertion depths to assess the significance of this loss. Noise-excited vocoders simulated processors driving eight electrodes 2 mm apart. Analysis filters always had center frequencies matching the CFs of the simulated stimulation sites. The simulated insertion depth of the most apical electrode was varied in 2-mm steps between 25 mm (CF 502 Hz) and 17 mm (CF 1851 Hz) from the cochlear base. Identification of consonants, vowels, and words in sentences all showed a significant decline between each of the three more basal simulated electrode configurations. Thus, if implant processors used analysis filters frequency-aligned to electrode CFs, patients whose most apical electrode is 19 mm (CF 1.3 kHz) or less from the cochlear base would suffer a significant loss of speech information.
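The relation between insertion depth and characteristic frequency in the abstract above can be illustrated with a place-to-frequency map. The abstract does not say which map was used, so the sketch below assumes the Greenwood function with commonly quoted human constants (A = 165.4, a = 0.06 per mm, k = 0.88) and a 35 mm cochlear length; the function names are hypothetical. With these assumed values the computed CFs come out close to, but not identical to, the figures quoted in the abstract, so the sketch should be read as an approximation only.

```python
def greenwood_cf(mm_from_base, length_mm=35.0, A=165.4, a=0.06, k=0.88):
    """Approximate characteristic frequency (Hz) at a cochlear place.

    Assumes the Greenwood place-frequency function, F = A * (10**(a*x) - k),
    where x is the distance from the apex in mm. All constants are
    assumptions; the abstract does not specify the map.
    """
    x = length_mm - mm_from_base  # convert distance from base to distance from apex
    return A * (10 ** (a * x) - k)


def electrode_cfs(most_apical_mm, n_electrodes=8, spacing_mm=2.0):
    """CFs for an array whose most apical contact lies most_apical_mm from
    the base, with further contacts spaced toward the base."""
    return [greenwood_cf(most_apical_mm - i * spacing_mm)
            for i in range(n_electrodes)]


if __name__ == "__main__":
    # Insertion depths in 2-mm steps from 25 mm to 17 mm, as in the abstract.
    for depth in (25, 23, 21, 19, 17):
        cfs = electrode_cfs(depth)
        print(f"{depth} mm: " + ", ".join(f"{cf:.0f}" for cf in cfs) + " Hz")
```

In a frequency-aligned simulation of the kind described, each analysis filter would be centered on the CF of its electrode, so shallower insertions push the lowest analysis band upward and drop the low-frequency part of the speech spectrum.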