Index of content:
Volume 104, Issue 4, October 1998
- SPEECH PERCEPTION 
104(1998); http://dx.doi.org/10.1121/1.423751
For all but the most profoundly hearing-impaired (HI) individuals, auditory–visual (AV) speech has been shown consistently to afford more accurate recognition than auditory (A) or visual (V) speech. However, the amount of AV benefit achieved (i.e., the superiority of AV performance in relation to unimodal performance) can differ widely across HI individuals. To begin to explain these individual differences, several factors need to be considered. The most obvious of these are deficient A and V speech recognition skills. However, large differences in individuals’ AV recognition scores persist even when unimodal skill levels are taken into account. These remaining differences might be attributable to differing efficiency in the operation of a perceptual process that integrates A and V speech information. There is at present no accepted measure of the putative integration process. In this study, several possible integration measures are compared using both congruent and discrepant AV nonsense syllable and sentence recognition tasks. Correlations were tested among the integration measures, and between each integration measure and independent measures of AV benefit for nonsense syllables and sentences in noise. Integration measures derived from tests using nonsense syllables were significantly correlated with each other; on these measures, HI subjects showed generally high levels of integration ability. Integration measures derived from sentence recognition tests were also significantly correlated with each other, but were not significantly correlated with the measures derived from nonsense syllable tests. Similarly, the measures of AV benefit based on nonsense syllable recognition tests were found not to be significantly correlated with the benefit measures based on tests involving sentence materials.
Finally, there were significant correlations between AV integration and benefit measures derived from the same class of speech materials, but nonsignificant correlations between integration and benefit measures derived from different classes of materials. These results suggest that the perceptual processes underlying AV benefit and the integration of A and V speech information might not operate in the same way on nonsense syllable and sentence input.
104(1998); http://dx.doi.org/10.1121/1.423752
A variable-duration notched-noise experiment was conducted in a noise context. Broadband noise preceded and followed a tone and notched noise of similar duration. Thresholds were measured at four durations (10, 30, 100, and 300 ms), two center frequencies (0.6 and 2.0 kHz), and five relative notch widths (0.0, 0.1, 0.2, 0.4, and 0.8). At 0.6 kHz, 10-ms thresholds decrease by 6 dB across notch widths, while 300-ms thresholds decrease by over 35 dB. These trends are similar but less pronounced at 2 kHz. In a second experiment, the short-duration notched noise was replaced with a flat noise that provided an equivalent amount of simultaneous masking; thresholds dropped by as much as 20 dB. A simple combination of simultaneous and nonsimultaneous masking is unable to predict these results. Instead, it appears that the elevated thresholds at short durations are dependent on the spectral shape of the simultaneous masker.
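The notched-noise stimulus described in this abstract can be sketched in code. The following is a hypothetical illustration, not the authors' implementation: FFT masking stands in for whatever filtering the authors used, and the function and parameter names are assumptions. A relative notch width g removes spectral components in [fc(1 − g), fc(1 + g)]:

```python
import numpy as np

def notched_noise(fs, dur, fc, g, seed=0):
    """Broadband Gaussian noise with a spectral notch centered on fc.

    The relative notch width g removes all spectral components in
    [fc * (1 - g), fc * (1 + g)]; g = 0 leaves the noise essentially flat.
    FFT masking is used here purely for illustration.
    """
    n = int(round(fs * dur))
    x = np.random.default_rng(seed).standard_normal(n)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    # Zero out the notch band around the center frequency.
    spectrum[(freqs >= fc * (1 - g)) & (freqs <= fc * (1 + g))] = 0.0
    return np.fft.irfft(spectrum, n=n)
```

For example, `notched_noise(16000, 0.3, 600.0, 0.4)` yields a 300-ms masker whose notch spans 360–840 Hz, corresponding to one of the conditions at the 0.6-kHz center frequency.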
104(1998); http://dx.doi.org/10.1121/1.423753
This investigation evaluated a possible source of reduced intelligibility in hypokinetic dysarthric speech, namely the mismatch between listeners’ perceptual strategies and the acoustic information available in the dysarthric speech signal. A paradigm of error analysis was adopted in which listener transcriptions of phrases were coded for the presence and type of word boundary errors. Seventy listeners heard 60 phrases produced by speakers with hypokinetic dysarthria. The six-syllable phrases alternated strong and weak syllables and ranged in length from three to five words. Lexical boundary violations were defined as erroneous insertions or deletions of lexical boundaries that occurred either before strong or before weak syllables. A total of 1596 lexical boundary errors in the listeners’ transcriptions was identified unanimously by three independent judges. The pattern of errors generally conformed with the predictions of the Metrical Segmentation Strategy hypothesis [Cutler and Norris, J. Exp. Psychol. 14, 113–121 (1988)], which posits that listeners attend to strong syllables to identify word onsets. However, the strength of adherence to this pattern varied across speakers. Comparison of acoustic evidence of syllabic strength to lexical boundary error patterns revealed a source of intelligibility deficit associated with this particular type of dysarthric speech pattern.
104(1998); http://dx.doi.org/10.1121/1.423774
Recognition of consonants, vowels, and sentences was measured in conditions of reduced spectral resolution and distorted spectral distribution of temporal envelope cues. Speech materials were processed through four bandpass filters (analysis bands), half-wave rectified, and low-pass filtered to extract the temporal envelope from each band. The envelope from each speech band modulated a band-limited noise (carrier bands). Analysis and carrier bands were manipulated independently to alter the spectral distribution of envelope cues. Experiment I demonstrated that the location of the cutoff frequencies defining the bands was not a critical parameter for speech recognition, as long as the analysis and carrier bands were matched in frequency extent. Experiment II demonstrated a dramatic decrease in performance when the analysis and carrier bands did not match in frequency extent, which resulted in a warping of the spectral distribution of envelope cues. Experiment III demonstrated a large decrease in performance when the carrier bands were shifted in frequency, mimicking the basal position of electrodes in a cochlear implant. Experiment IV showed a relatively minor effect of the overlap in the noise carrier bands, simulating the overlap in neural populations responding to adjacent electrodes in a cochlear implant. Overall, these results show that, for four bands, the frequency alignment of the analysis bands and carrier bands is critical for good performance, while the exact frequency divisions and overlap in carrier bands are not as critical.
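The processing chain this abstract describes (bandpass analysis, half-wave rectification, envelope extraction, noise-carrier modulation) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: FFT masking replaces the actual filters, the 160-Hz envelope cutoff is an assumed value, and all names are invented. Passing different analysis and carrier band edges reproduces the kind of spectral warping manipulated in experiment II:

```python
import numpy as np

def noise_vocoder(signal, fs, analysis_bands, carrier_bands, env_cutoff=160.0):
    """Noise-band vocoder sketch: each analysis band's temporal envelope
    (half-wave rectification + low-pass filtering) modulates band-limited
    noise in the corresponding carrier band.  FFT masking stands in for
    the bandpass filters; env_cutoff is an assumed envelope bandwidth.
    """
    rng = np.random.default_rng(0)

    def bandpass(x, lo, hi):
        spectrum = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
        spectrum[(freqs < lo) | (freqs > hi)] = 0.0
        return np.fft.irfft(spectrum, n=len(x))

    out = np.zeros_like(signal)
    for (alo, ahi), (clo, chi) in zip(analysis_bands, carrier_bands):
        band = bandpass(signal, alo, ahi)
        # Half-wave rectify, then low-pass filter to get the envelope.
        envelope = bandpass(np.maximum(band, 0.0), 0.0, env_cutoff)
        carrier = bandpass(rng.standard_normal(len(signal)), clo, chi)
        out += envelope * carrier
    return out
```

With `analysis_bands == carrier_bands` this models the matched four-band condition; shifting every carrier band upward would mimic the basal electrode placement simulated in experiment III.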
Temporal and spatio-temporal vibrotactile displays for voice fundamental frequency: An initial evaluation of a new vibrotactile speech perception aid with normal-hearing and hearing-impaired individuals
104(1998); http://dx.doi.org/10.1121/1.423909
Four experiments were performed to evaluate a new wearable vibrotactile speech perception aid that extracts voice fundamental frequency and displays it as either a single-channel temporal or an eight-channel spatio-temporal stimulus. Specifically, we investigated the perception of intonation (i.e., question versus statement) and emphatic stress (i.e., stress on the first, second, or third word) under Visual-Alone (VA), Visual-Tactile (VT), and Tactile-Alone (TA) conditions and compared performance using the temporal and spatio-temporal vibrotactile displays. Subjects were adults with normal hearing in experiments I–III and adults with severe to profound hearing impairments in experiment IV. Both versions of the vibrotactile speech perception aid successfully conveyed intonation. Stress information was also successfully conveyed, but it did not enhance performance in VT conditions beyond that in VA conditions. In experiment III, which involved only intonation identification, a reliable advantage for the spatio-temporal display was obtained. Differences between subject groups were obtained for intonation identification, with more accurate VT performance by those with normal hearing. Possible effects of long-term hearing status are discussed.