Index of contents:
Volume 103, Issue 6, June 1998
- SPEECH PERCEPTION 
103 (1998); http://dx.doi.org/10.1121/1.423087
In recent years there has been a great deal of interest in demonstrations of the so-called “Perceptual-Magnet Effect” (PME). In these studies, AX-discrimination tasks purportedly reveal that the discriminability of speech sounds from a single category varies with the judged phonetic “goodness” of those sounds. One possible confound, however, is that category membership is determined by identification of sounds in isolation, whereas discrimination tasks present stimuli in pairs. In the first experiment of the current study, identifications and goodness judgments were obtained for vowels (/i/–/e/) presented in pairs. A substantial shift in phonetic identity was observed as the context vowel changed. In a second experiment, listeners performed an AX-discrimination task with the vowel pairs from the first experiment. Using the contextual identification functions from the first experiment, predictions of discriminability were calculated from the classic tenets of Categorical Perception. The obtained discriminability functions were well accounted for by these identification-based predictions; there was no additional unexplained variance requiring the proposal of “perceptual magnets.” These results suggest that the PME may be nothing more than a further demonstration that discriminability is greater for cross-category stimulus pairs than for within-category pairs.
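Identification-based predictions of this kind are conventionally computed with the classic Haskins-style formula: chance performance plus half the squared difference between the identification probabilities of the two pair members. A minimal sketch of that calculation (the exact formulation used in the study is not given in the abstract, and the probabilities below are illustrative):

```python
import numpy as np

def predicted_discrimination(p_a: float, p_b: float) -> float:
    """Classic categorical-perception prediction of proportion correct
    for a stimulus pair, given the probabilities p_a and p_b that each
    member is identified as /i/: chance (0.5) plus half the squared
    identification difference."""
    return 0.5 + 0.5 * (p_a - p_b) ** 2

# Illustrative identification probabilities along an /i/-/e/ continuum.
ident = np.array([0.98, 0.95, 0.80, 0.45, 0.15, 0.05])

# Predicted discriminability peaks for pairs straddling the category
# boundary and stays near chance for within-category pairs.
for p_a, p_b in zip(ident, ident[1:]):
    print(f"{p_a:.2f} vs {p_b:.2f} -> {predicted_discrimination(p_a, p_b):.3f}")
```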
103 (1998); http://dx.doi.org/10.1121/1.423088
Head-related transfer functions (HRTFs) were used to create spatialized stimuli for presentation through earphones. Subjects performed forced-choice identification tests during which the allowed response directions were indicated visually. In each experimental session, subjects were first presented with auditory stimuli whose HRTFs corresponded to the allowed response directions. The correspondence between the HRTFs used to generate the stimuli and those directions was then changed, so that response directions no longer corresponded to the HRTFs in the natural way. Feedback was used to train subjects as to which spatial cues corresponded to which of the allowed responses. Finally, the normal correspondence between direction and HRTFs was reinstated. This basic paradigm was used to explore the effects of the type of feedback provided, the complexity of the simulated acoustic scene, the number of allowed response positions, and the magnitude of the HRTF transformation subjects had to learn. The data showed that (1) although subjects may not adapt completely to a new relationship between physical stimuli and direction, response bias decreases substantially with training, and (2) the ability to resolve different HRTFs depends both on the stimuli presented and on the subject's state of adaptation.
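The stimulus-generation step described here is conventionally implemented by convolving a source signal with the left- and right-ear head-related impulse responses (the time-domain form of the HRTFs) for the chosen direction. A minimal sketch of that rendering step, not the authors' implementation, with a hypothetical `hrirs` lookup table mapping directions to impulse-response pairs:

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(signal: np.ndarray, hrir_left: np.ndarray,
               hrir_right: np.ndarray) -> np.ndarray:
    """Render a mono signal at the direction encoded by a pair of
    head-related impulse responses, returning an (N, 2) array of
    (left, right) samples for earphone presentation."""
    left = fftconvolve(signal, hrir_left)
    right = fftconvolve(signal, hrir_right)
    return np.stack([left, right], axis=1)

# The remapped conditions amount to rendering each stimulus with the
# HRIR pair of a direction other than the one scored as correct, e.g.
# (hypothetical lookup table and remapping):
#   stimulus = spatialize(noise_burst, *hrirs[remapped[direction]])
```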
103 (1998); http://dx.doi.org/10.1121/1.423107
A series of experiments was performed in which subjects were trained to interpret auditory localization cues arising from locations different from their normal spatial positions. The pattern of mean response to these alterations was examined as a function of time in order to begin developing a quantitative model of adaptation. Mean responses were roughly proportional to the normal position associated with the localization cues presented. As subjects adapted, the best-fit slope (relating mean response to normal position) changed roughly exponentially with time. The exponential rate and adaptation asymptote were found for each subject in each experiment, as were the rate and asymptote of readaptation to normal cues. The rate of adaptation showed no statistical dependence on experimental conditions; however, the asymptote of the best-fit slope varied with the strength of the transformation used in each experiment. This result is consistent with the hypothesis that subjects cannot adapt to a nonlinear transformation of auditory localization cues, but instead adapt to a linear approximation of that transformation. Over time, performance changes exponentially toward the best-fit linear approximation of the transformation used in a particular experiment, and the rate of this adaptation does not depend on the transformation employed.
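The model described amounts to a best-fit slope that relaxes exponentially from its initial value toward an asymptote. A sketch of fitting that form to slope-versus-time data (the parameter names and the data points are illustrative, not the authors'):

```python
import numpy as np
from scipy.optimize import curve_fit

def slope_model(t, asymptote, initial, rate):
    """Best-fit slope relating mean response to normal cue position,
    changing exponentially from `initial` toward `asymptote` at `rate`."""
    return asymptote + (initial - asymptote) * np.exp(-rate * t)

# Illustrative slopes over successive runs: adaptation starts at a
# veridical slope of 1.0 and approaches, without fully reaching, the
# asymptote set by the linear approximation of the transformation.
t = np.arange(10.0)
slopes = np.array([1.00, 0.90, 0.82, 0.77, 0.73, 0.71, 0.69, 0.68, 0.68, 0.67])

(asym, init, rate), _ = curve_fit(slope_model, t, slopes, p0=(0.7, 1.0, 0.5))
print(f"asymptote={asym:.3f}, initial slope={init:.3f}, rate={rate:.3f}")
```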
Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise
103 (1998); http://dx.doi.org/10.1121/1.423069
The efficacy of audio-visual interactions in speech perception arises from two kinds of factors. First, at the information level, audition and vision are partly “complementary”: speech features mainly concerned with manner of articulation seem to be best transmitted by the audio channel, while features mostly describing place of articulation are best transmitted by the video channel. Second, at the information-processing level, there is “synergy” between audition and vision: in a variety of tasks involving acoustic noise, audio-visual identification scores are generally greater than both the auditory-alone and the visual-alone scores. Until now, however, these two properties have generally been demonstrated only in rather global terms. In the present work, audio-visual interactions are studied at the feature level for French oral vowels, which contrast three series: front unrounded, front rounded, and back rounded vowels. A set of experiments on the auditory, visual, and audio-visual identification of vowels embedded in various amounts of noise demonstrates that complementarity and synergy in bimodal speech hold for a bundle of individual phonetic features describing place contrasts in oral vowels. At the information level (complementarity), the height feature is the most robust in the audio channel, backness the second most robust, and rounding the least robust, while in the video channel rounding is transmitted better than height, and backness is almost invisible. At the information-processing level (synergy), transmitted-information scores show that every individual feature is better transmitted with the ear and eye together than with either sensor alone.
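Transmitted-information scores of the kind mentioned here are conventionally computed, following Miller and Nicely, as the mutual information between stimulus and response for each feature, estimated from a confusion matrix. A minimal sketch with an illustrative (not the paper's) rounding-feature confusion matrix:

```python
import numpy as np

def transmitted_information(confusion: np.ndarray) -> float:
    """Mutual information (bits) between stimulus (rows) and response
    (columns) for one phonetic feature, from a count confusion matrix:
    T = sum_ij p_ij * log2(p_ij / (p_i * p_j))."""
    p = confusion / confusion.sum()
    px = p.sum(axis=1, keepdims=True)   # stimulus marginals
    py = p.sum(axis=0, keepdims=True)   # response marginals
    nz = p > 0                          # skip empty cells to avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# Illustrative rounded-vs-unrounded confusion counts for one modality.
conf = np.array([[90, 10],
                 [15, 85]])
print(f"T = {transmitted_information(conf):.3f} bits")
```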