Table of contents:
Volume 124, Issue 4, October 2008
- SPEECH PERCEPTION 
Perceptual development of phoneme contrasts: How sensitivity changes along acoustic dimensions that contrast phoneme categories
124 (2008); http://dx.doi.org/10.1121/1.2967472
Listeners discriminate acoustic differences between phoneme categories better than similarly sized differences within phoneme categories. The question this paper aims to answer is how this pattern in perceptual sensitivity develops along an acoustic dimension that contrasts two non-native speech sounds: through acquired distinctiveness, through acquired similarity, or through a combination of the two. A pretest–training–post-test experiment was designed to study perceptual development directly, i.e., by including (i) a discrimination task to measure perceptual sensitivity, (ii) a transfer test to ensure language learning instead of stimulus learning, and (iii) a control group to exclude task repetition as an explanation of improvement. It is shown that the typical peak in perceptual sensitivity near a phoneme boundary that native listeners show is not found in relatively inexperienced language learners, despite their ability to classify a continuum in a nativelike way after short laboratory training. Experiment II indicates that a discrimination peak may be achieved by language learners, but only after much more language experience than short-term laboratory training can offer. Furthermore, reasons are given why classification improvement in the laboratory should not be taken as evidence for (i) increased discrimination of the newly learned phonemes and (ii) learning of phoneme representations.
124 (2008); http://dx.doi.org/10.1121/1.2967865
For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. Only 16 filter channels and a frame rate of are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.
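The mask construction described above can be sketched in a few lines: a time-frequency unit is kept (gain 1) when its speech energy exceeds its noise energy by some local criterion, and discarded (gain 0) otherwise. This is a minimal illustration, not the authors' implementation; the function name, the 0 dB default criterion, and the use of magnitude spectrograms as inputs are assumptions.

```python
import numpy as np

def ideal_binary_mask(speech_tf, noise_tf, local_criterion_db=0.0):
    """Construct an ideal binary mask from premixed speech and noise.

    speech_tf, noise_tf: magnitude time-frequency representations
    (e.g., channels x frames) of the speech and noise signals alone.
    Returns a 0/1 mask of the same shape: 1 where local speech energy
    exceeds local noise energy by `local_criterion_db` decibels.
    """
    eps = 1e-12  # avoid log of zero in silent units
    speech_db = 20.0 * np.log10(np.abs(speech_tf) + eps)
    noise_db = 20.0 * np.log10(np.abs(noise_tf) + eps)
    return (speech_db - noise_db > local_criterion_db).astype(float)

# Tiny example: two channels, two frames.
speech = np.array([[2.0, 0.5],
                   [1.0, 3.0]])
noise = np.ones((2, 2))
mask = ideal_binary_mask(speech, noise)
```

Applying the mask as a set of binary gains to the corresponding units of the noise (or mixture) signal yields the "gated noise" stimuli the abstract refers to.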
Hybridizing conversational and clear speech to determine the degree of contribution of acoustic features to intelligibility
124 (2008); http://dx.doi.org/10.1121/1.2967844
Speakers naturally adopt a special “clear” (CLR) speaking style in order to be better understood by listeners who are moderately impaired in their ability to understand speech due to a hearing impairment, the presence of background noise, or both. In contrast, speech intended for nonimpaired listeners in quiet environments is referred to as “conversational” (CNV). Studies have shown that the intelligibility of CLR speech is usually higher than that of CNV speech in adverse circumstances. It is not known which individual acoustic features or combinations of features cause the higher intelligibility of CLR speech. The objective of this study is to determine the contribution of some acoustic features to intelligibility for a single speaker. The proposed method creates “hybrid” (HYB) speech stimuli that selectively combine acoustic features of one sentence spoken in the CNV and CLR styles. The intelligibility of these stimuli is then measured in perceptual tests, using 96 phonetically balanced sentences. Results for one speaker show significant sentence-level intelligibility improvements over CNV speech when replacing certain combinations of short-term spectra, phoneme identities, and phoneme durations of CNV speech with those from CLR speech, but no improvements for combinations involving fundamental frequency, energy, or nonspeech events (pauses).