An illustration of overlap-masking on the segments “tewaka.” (a) The waveforms on the left are the original signals. The bottom left panel shows the original whole utterance, while the top left panel shows the waveforms for the phonemes /t/, /e/, /w/, /a/, /k/, and /a/. (b) The reverberant signals shown on the right correspond to the original signals on the left. The reverberant signals are obtained by convolving the original signals with the impulse response of the room with an RT of . This figure shows that when the preceding segment has strong energy, as a vowel does, its reverberant tail (the masker) smears the following reverberant segment (the maskee) to a much greater degree.
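The convolution step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `convolve`, the toy signal `dry`, and the three-tap impulse response `h` are all hypothetical and do not represent a measured room response.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h_k in enumerate(impulse_response):
            out[n + k] += s * h_k
    return out


# Toy "dry" signal: a strong segment, a silent gap, then a weak segment.
dry = [1.0, 1.0, 0.0, 0.0, 0.1]
# Hypothetical decaying impulse response (assumption, for illustration only).
h = [1.0, 0.5, 0.25]

wet = convolve(dry, h)
# The tail of the strong segment now fills the gap that was silent in `dry`,
# and adds energy on top of the weak segment that follows.
```

In this toy case the samples that were silent in `dry` carry energy in `wet`, which is the smearing the caption attributes to a strong preceding segment.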
Block diagram of the steady-state suppression. The top panel shows the general flow of signals. The steady-state suppression defines a speech portion as steady state when the mean square of the regression coefficients over all bands is less than a certain threshold. Once a portion is considered to be steady state, its amplitude is multiplied by a factor of 0.4. The bottom panel shows the computation. First, the original signal is split into octave bands. In each band, the envelope is extracted and then smoothed by a low-pass filter. After down-sampling, the regression coefficients (delta) are calculated from five adjacent samples of the time trajectory of the logarithmic envelope of each subband. The mean square of the regression coefficients over all bands is then calculated. Finally, its time trajectory is up-sampled to the original sampling rate.
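The decision stage of this block diagram can be sketched as below, starting from log envelopes that have already been band-split, smoothed, and down-sampled. Several details are assumptions rather than facts from the caption: the regression coefficient is taken to be the least-squares slope over five samples, edges are handled by clamping indices, up-sampling is done by sample-and-hold, and the threshold value is arbitrary. All function names are hypothetical.

```python
def delta(track, half_width=2):
    """Least-squares slope over 2*half_width+1 adjacent samples.

    Indices beyond the ends are clamped to the first/last sample (assumption).
    """
    n = len(track)
    norm = sum(k * k for k in range(-half_width, half_width + 1))
    out = []
    for t in range(n):
        acc = 0.0
        for k in range(-half_width, half_width + 1):
            idx = min(max(t + k, 0), n - 1)
            acc += k * track[idx]
        out.append(acc / norm)
    return out


def mean_square_delta(log_envelopes):
    """Mean over bands of the squared delta at each down-sampled time index."""
    deltas = [delta(band) for band in log_envelopes]
    n_bands = len(deltas)
    return [sum(d[t] ** 2 for d in deltas) / n_bands
            for t in range(len(deltas[0]))]


def suppression_gain(msd, threshold, factor=0.4, upsample=1):
    """Per-sample gain: `factor` where msd < threshold, otherwise 1.0.

    Up-sampling is sample-and-hold (assumption); the caption does not say how
    the trajectory is interpolated.
    """
    gain = []
    for value in msd:
        g = factor if value < threshold else 1.0
        gain.extend([g] * upsample)
    return gain


# Two identical toy bands: flat (steady) first, then rising (transition).
band = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
msd = mean_square_delta([band, band])
gain = suppression_gain(msd, threshold=0.01, upsample=2)  # threshold is arbitrary
```

Multiplying the original signal sample by sample with `gain` would attenuate the flat portion by 0.4 and leave the rising portion unchanged.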
Mean percent correct for identification of 24 CV syllables by 22 subjects with or without the steady-state suppression in experiments I and II. Open circles represent the mean performance of processed signals with reverberation and closed circles represent the mean performance of unprocessed signals with reverberation for five RTs (0.9, 1.0, 1.1, 1.2, and ) in experiment I. Open squares indicate the mean performance of processed signals and closed squares indicate the mean performance of unprocessed signals for five RTs (0.4, 0.5, 0.7, 0.9, and ) and without reverberation in experiment II. The dashed line separates the results for the signals with reverberation (right) from those without reverberation (left). Lines drawn through the data pass through the mean performance for two reverberant conditions (RTs of 0.9 and 1.0) repeated in both experiments. The shaded region shows the RTs where significant differences were found.
Average modulation spectra of 24 speech sentences used in the experiments with and without steady-state suppression [solid line: without processing , dashed line: with processing ] for each frequency band. Frequency regions are constituted with band 1 , band 2 , band 3 , and band 4 .
Average modulation spectra of 24 speech sentences used in the experiments with and without steady-state suppression after reverberation (RT of ) [solid line: without processing , dashed line: with processing ] for each frequency band. Frequency regions are constituted with band 1 , band 2 , band 3 , and band 4 .
Reverberant conditions used in the experiments. The impulse responses h1–h5, h7, and h8 were obtained by multiplying the exponential decay by the original impulse response h6 to achieve the desired reverberant conditions. The RT values are the average RTs derived from early decay time (EDT) at the center frequencies of 0.5, 1, and of the 1-oct bandpassed impulse response.
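One way to derive an impulse response with a different RT by multiplying an existing one by an exponential decay is sketched below. It assumes an idealized, purely exponential tail, so that RT is the time for the amplitude to fall by 60 dB. The caption does not give the decay formula, and it reports RTs derived from EDT in octave bands, which this sketch does not reproduce. The function name and parameters are hypothetical.

```python
import math


def reshape_decay(h, fs, rt_original, rt_target):
    """Multiply h by an exponential so that an ideal exponential tail with
    reverberation time rt_original (s) ends up with rt_target (s).

    fs is the sampling rate in Hz. A 60 dB amplitude drop is a factor of
    10**-3, so an ideal tail decays as exp(-3*ln(10)*t/RT).
    """
    rate = 3.0 * math.log(10.0) * (1.0 / rt_target - 1.0 / rt_original)
    return [x * math.exp(-rate * n / fs) for n, x in enumerate(h)]


# Synthetic check: an ideal envelope with RT = 1.0 s, reshaped to RT = 0.5 s.
fs = 1000
h_long = [math.exp(-3.0 * math.log(10.0) * n / fs / 1.0) for n in range(fs + 1)]
h_short = reshape_decay(h_long, fs, rt_original=1.0, rt_target=0.5)

# Level of the reshaped envelope at t = 0.5 s, in dB relative to t = 0.
level_db = 20.0 * math.log10(abs(h_short[500]) / abs(h_short[0]))
```

When `rt_target` is longer than `rt_original`, the multiplier grows with time instead of decaying, which also amplifies any noise floor in a measured response.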
Twenty-four nonsense consonant-vowel syllables (CVs) used in the experiments.
Types of errors and changes in confusions by the steady-state suppression, classified according to manner and place of articulation feature, as well as the overall results, for both processing conditions in experiment I.
Types of errors and changes in confusions by the steady-state suppression, classified according to manner and place of articulation feature, as well as the overall results, for both processing conditions in experiment II.