Index of content:
Volume 119, Issue 1, January 2006
- SPEECH PERCEPTION 
119(2006); http://dx.doi.org/10.1121/1.2140806
Two experiments explored the concept of the binaural spectrogram [Culling and Colburn, J. Acoust. Soc. Am. 107, 517–527 (2000)] and its relationship to monaurally derived information. In each experiment, speech was added to noise at an adverse signal-to-noise ratio in the NoSπ binaural configuration. The resulting monaural and binaural cues were analyzed within an array of spectro-temporal bins and then these cues were resynthesized by modulating the intensity and/or interaural correlation of freshly generated noise. Experiment 1 measured the intelligibility of the resynthesized stimuli and compared them with the original NoSo and NoSπ stimuli at a fixed signal-to-noise ratio. While stimuli were intelligible, each cue in isolation produced similar (very low) intelligibility to the NoSo condition. The resynthesized combination produced intelligibility. Modulation of interaural correlation below and of amplitude above was not as effective as their combination across all frequencies. Experiment 2 measured three-point psychometric functions in which the signal-to-noise ratio of the original stimulus was increased in steps from the level used in experiment 1. Modulation of interaural correlation alone proved to have a flat psychometric function. The functions for NoSπ and for combined monaural and binaural cues appeared similar in slope, but shifted horizontally. The results indicate that for sentence materials, neither fluctuations in interaural correlation nor in monaural intensity are sufficient to support speech recognition at signal-to-noise ratios where 50% intelligibility is achieved in the NoSπ configuration; listeners appear to synergistically combine monaural and binaural information in this task, to some extent within the same frequency region.
119(2006); http://dx.doi.org/10.1121/1.2141171
Previous research on the perception of dialect variation has measured the perceptual similarity of talkers based on regional dialect using only indirect methods. In the present study, a paired comparison similarity ratings task was used to obtain direct measures of perceptual similarity. Naive listeners were asked to make explicit judgments about the similarity of a set of talkers based on regional dialect. The talkers represented four regional varieties of American English and both genders. Results revealed an additive effect of gender and dialect on mean similarity ratings and two primary dimensions of perceptual dialect similarity: geography (northern versus southern varieties) and dialect markedness (many versus few characteristic properties). The present findings are consistent with earlier research on the perception of dialect variation, as well as recent speech perception studies which demonstrate the integral role of talker gender in speech perception.
119(2006); http://dx.doi.org/10.1121/1.2139627
Primary auditory cortex (PAC), located in Heschl’s gyrus (HG), is the earliest cortical level at which sounds are processed. Standard theories of speech perception assume that signal components are given a representation in PAC, which is then matched to speech templates in auditory association cortex. An alternative holds that speech activates a specialized system in cortex that does not use the primitives of PAC. Functional magnetic resonance imaging revealed different brain activation patterns in listening to speech and nonspeech sounds across different levels of complexity. Sensitivity to speech was observed in association cortex, as expected. Further, activation in HG increased with increasing levels of complexity with added fundamentals for both nonspeech and speech stimuli, but only for nonspeech when separate sources (release bursts/fricative noises or their nonspeech analogs) were added. These results are consistent with the existence of a specialized speech system which bypasses more typical processes at the earliest cortical level.
119(2006); http://dx.doi.org/10.1121/1.2133436
In this study, the effect of articulation rate and speaking style on the perceived speech rate is investigated. The articulation rate is measured both in terms of the intended phones, i.e., phones present in the assumed canonical form, and as the number of actual, realized phones per second. The combination of these measures reflects the deletion of phones, which is related to speaking style. The effect of the two rate measures on the perceived speech rate is compared in two listening experiments on the basis of a set of intonation phrases with carefully balanced intended and realized phone rates, selected from a German database of spontaneous speech. Because the balance between input-oriented (effort) and output-oriented (communicative) constraints may be different at fast versus slow speech rates, the effect of articulation rate is compared both for fast and for slow phrases from the database. The effect of the listeners’ own speaking habits is also investigated to evaluate whether listeners’ perception is based on a projection of their own behavior as speakers. It is shown that listener judgments reflect both the intended and realized phone rates, and that their judgments are independent of the constraint balance and their own speaking habits.
119(2006); http://dx.doi.org/10.1121/1.2141003
Highly proficient German users of English as a second language, and native speakers of American English, listened to nonsense sequences and responded whenever they detected an embedded English word. The responses of both groups were equivalently facilitated by preceding context that both by English and by German phonotactic constraints forced a boundary at word onset (e.g., lecture was easier to detect in moinlecture than in gorklecture, and wish in yarlwish than in plookwish). The American L1 speakers’ responses were strongly facilitated, and the German listeners’ responses almost as strongly facilitated, by contexts that forced a boundary in English but not in German (thrarshlecture, glarshwish). The German listeners’ responses were significantly facilitated also by contexts that forced a boundary in German but not in English (moycelecture, loitwish), while L1 listeners were sensitive to acoustic boundary cues in these materials but not to the phonotactic sequences. The pattern of results suggests that proficient L2 listeners can acquire the phonotactic probabilities of an L2 and use them to good effect in segmenting continuous speech, but at the same time they may not be able to prevent interference from L1 constraints in their L2 listening.