Index of content:
Volume 126, Issue 5, November 2009
- SPEECH PERCEPTION 
Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model
126(2009); http://dx.doi.org/10.1121/1.3224721
This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. In the context of this model, “microscopic” is defined in two ways. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary stages of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703–1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing stage and a simple dynamic-time-warp speech recognizer. The model is evaluated by presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed, and the influence of different perceptual distance measures and of the model’s a priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is achieved with distance measures that focus mainly on small perceptual distances and neglect outliers.
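The dynamic-time-warp recognizer named in the abstract can be illustrated with a minimal sketch. The function name, the use of Euclidean local cost, and the toy feature sequences below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-time-warp distance between two feature sequences
    (arrays of shape frames x features), using Euclidean local cost.
    A sketch of this kind of template matcher, not the paper's model."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local distance
            # accumulate along the cheapest warping path
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Toy example: identical sequences have zero warped distance.
a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [5.0]])
```

In a closed-set recognizer of this type, a test token would be assigned the label of the training template with the smallest warped distance; the abstract's finding suggests robust local costs (down-weighting large frame distances) work best.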
126(2009); http://dx.doi.org/10.1121/1.3216914
Speakers vary their speech rate considerably during a conversation, and listeners are able to adapt quickly to these variations. Adaptation to fast speech rates is usually measured using artificially time-compressed speech. This study examined adaptation to two types of fast speech: artificially time-compressed speech and natural fast speech. Listeners performed a speeded sentence verification task on three series of sentences: normal-speed sentences, time-compressed sentences, and natural fast sentences. Listeners were divided into two groups to evaluate possible transfer of learning between the time-compressed and natural fast conditions: the first group verified the natural fast sentences before the time-compressed sentences, while the second verified the time-compressed sentences before the natural fast ones. First, the results showed transfer of learning when the time-compressed sentences preceded the natural fast sentences, but not when the natural fast sentences preceded the time-compressed sentences. Second, listeners showed adaptation to the natural fast sentences, but performance for this type of fast speech did not improve to the level of the time-compressed sentences. The results are discussed in the framework of theories of perceptual learning.
126(2009); http://dx.doi.org/10.1121/1.3212930
Talker intelligibility and perceptual adaptation were compared for speech processed with a cochlear implant (CI) simulation and for speech in multi-talker babble. The stimuli consisted of 100 sentences produced by 20 native English talkers. The sentences were processed to simulate listening with an eight-channel CI or were mixed with multi-talker babble, and were presented to 400 listeners in a sentence transcription task (200 listeners per condition). Perceptual adaptation was measured for each talker by comparing intelligibility in the first 20 sentences of the experiment with intelligibility in the last 20 sentences; adaptation patterns were also compared across the two degradation conditions by examining performance in blocks of ten sentences. The most intelligible talkers under CI simulation also tended to be the most intelligible talkers in multi-talker babble. Furthermore, listeners demonstrated a greater degree of perceptual adaptation in the CI-simulation condition than in the multi-talker babble condition, although the extent of adaptation varied widely across talkers, and they reached asymptote later in the experiment in the CI-simulation condition. Overall, the two forms of degradation did not differ in their effect on talker intelligibility, although they produced differences in the amount and time course of perceptual adaptation.
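An eight-channel CI simulation of the kind described is commonly implemented as a noise vocoder: the speech is split into frequency bands, each band's temporal envelope is extracted, and the envelopes modulate band-limited noise. The sketch below is a crude FFT-based version under assumed parameters (log-spaced bands, 100 Hz to 8 kHz, moving-average envelope smoothing); the study's actual processing details are not given in the abstract:

```python
import numpy as np

def noise_vocoder(signal, fs, n_channels=8, fmin=100.0, fmax=8000.0):
    """Crude noise-vocoder sketch: split into log-spaced bands via FFT
    masking, extract each band's smoothed envelope, and use it to
    modulate band-limited noise. Illustrative only."""
    edges = np.geomspace(fmin, fmax, n_channels + 1)   # log-spaced band edges
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    noise_spec = np.fft.rfft(np.random.default_rng(0).standard_normal(len(signal)))
    out = np.zeros(len(signal))
    win = max(1, int(0.016 * fs))                      # ~16 ms envelope smoother
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(mask, spec, 0), n=len(signal))
        env = np.convolve(np.abs(band), np.ones(win) / win, mode="same")
        band_noise = np.fft.irfft(np.where(mask, noise_spec, 0), n=len(signal))
        out += env * band_noise                        # envelope-modulated noise
    return out
```

A real CI simulation would typically use band-pass filter banks and low-pass envelope filters rather than FFT masking and a moving average, but the channel structure is the same.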
126(2009); http://dx.doi.org/10.1121/1.3224715
A quantitative “cross-language assimilation overlap” method for testing predictions of the Perceptual Assimilation Model (PAM) was implemented to compare results of a discrimination experiment with the listeners’ previously reported assimilation data. The experiment examined discrimination of Parisian French (PF) front rounded vowels /y/ and /œ/. Three groups of American English listeners differing in their French experience (no experience [NoExp], formal experience [ModExp], and extensive formal-plus-immersion experience [HiExp]) performed discrimination of PF /y-u/, /y-o/, /œ-o/, /œ-u/, /y-i/, /y-ɛ/, /œ-ɛ/, /œ-i/, /y-œ/, /u-i/, and /a-ɛ/. Vowels were presented in bilabial /rabVp/ and alveolar /radVt/ contexts. More errors were found for PF front vs back rounded vowel pairs (16%) than for PF front unrounded vs rounded pairs (2%). Overall, ModExp listeners did not perform more accurately (11% errors) than NoExp listeners (13% errors). Extensive immersion experience, however, was associated with fewer errors (3%) than formal experience alone, although discrimination of PF /y-u/ remained relatively poor (12% errors) for HiExp listeners. More errors occurred on pairs involving front vs back rounded vowels in the alveolar context (20% errors) than in the bilabial context (11% errors). Significant correlations were revealed between listeners’ assimilation overlap scores and their discrimination errors, suggesting that the PAM may be extended to second-language (L2) vowel learning.
126(2009); http://dx.doi.org/10.1121/1.3238257
This paper presents a compact graphical method for comparing the performance of individual hearing-impaired (HI) listeners with that of an average normal-hearing (NH) listener on a consonant-by-consonant basis. This representation, named the consonant loss profile (CLP), characterizes the effect of a listener’s hearing loss on each consonant over a range of performance. The CLP shows that the consonant loss, which is the signal-to-noise ratio (SNR) difference at equal NH and HI scores, is consonant-dependent and varies with the score. This variation reveals that hearing loss renders some consonants unintelligible, while it merely reduces the noise-robustness of others. The conventional SNR-loss metric, defined as the SNR difference at the 50% recognition score, is insufficient to capture this variation. The SNR loss is on average lower when measured with sentences using standard clinical procedures than when measured with nonsense syllables. A listener with symmetric hearing loss may not have identical CLPs for both ears. Some consonant confusions by HI listeners are influenced by high-frequency hearing loss even at high sound pressure levels.
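The consonant loss defined above (the SNR difference between the HI and NH psychometric curves at equal recognition score) can be computed by interpolating each listener's score-vs-SNR data. The sketch below is a minimal illustration with hypothetical function names and synthetic curves, not the authors' procedure; the conventional SNR-loss metric is the special case with the target score fixed at 50%:

```python
import numpy as np

def snr_at_score(snrs, scores, target):
    """Linearly interpolate the SNR at which a monotonically increasing
    recognition-score curve reaches `target` (scores in [0, 1])."""
    return float(np.interp(target, scores, snrs))

def consonant_loss(snr_nh, score_nh, snr_hi, score_hi, target):
    """Consonant loss at a given score: SNR difference between the HI and
    NH curves at equal recognition score (the quantity the CLP plots
    across a range of target scores)."""
    return snr_at_score(snr_hi, score_hi, target) - snr_at_score(snr_nh, score_nh, target)
```

Evaluating `consonant_loss` over many target scores, per consonant, yields one CLP curve per consonant; a score-dependent loss is exactly what a single 50%-point SNR-loss number cannot express.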