Improving syllable identification by a preprocessing method reducing overlap-masking in reverberant environments a)
a) Portions of this work were presented in “Suppressing steady-state portions of speech for improving intelligibility in various reverberant environments,” Proc. China-Japan Joint Conference on Acoustics, Nanjing, November 2002, and “Improving speech intelligibility by steady-state suppression as pre-processing in small to medium sized halls,” Proceedings of Eurospeech, Geneva, September 2003.
10.1121/1.2198191
http://aip.metastore.ingenta.com/content/asa/journal/jasa/119/6/10.1121/1.2198191

Figures

FIG. 1.

An illustration of overlap-masking on the segments “tewaka.” (a) The waveforms on the left are the original signals. The left bottom panel shows the original whole utterance, while the top left panel shows the waveforms for the phonemes /t/, /e/, /w/, /a/, /k/, and /a/. (b) The reverberant signals shown on the right correspond to the original signals on the left. The reverberant signals are obtained by taking the convolution of the original signals with the impulse response of the room with an RT of . This figure shows that when a previous segment has strong energy, as in the case of a vowel, the maskee of the reverberant following segment is smeared to a much greater degree by the masker of the reverberant previous segment.
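The convolution described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `convolve`, the three-tap impulse response, and the toy amplitudes are all assumptions chosen only to make the smearing visible.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Hypothetical decaying room response (direct sound plus two reflections).
room = [1.0, 0.5, 0.25]

# Toy utterance: two strong "vowel" samples followed by one weak "consonant" sample.
dry = [1.0, 1.0, 0.1]
wet = convolve(dry, room)

# At the consonant's position (index 2), the direct consonant energy is small
# compared with the reverberant tail left over from the preceding vowel.
direct_part = dry[2] * room[0]
tail_part = wet[2] - direct_part
```

In this toy case the vowel tail at the consonant position is several times larger than the consonant itself, which is the overlap-masking situation the figure illustrates.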

FIG. 2.

Block diagram of the steady-state suppression. The top panel shows the general flow of signals. A speech portion is defined as steady state when the mean square of the regression coefficients (computed as described below) is less than a certain threshold. Once a portion is considered to be steady state, its amplitude is multiplied by a factor of 0.4. The bottom panel shows the computation. First, the original signal is split into octave bands. In each band, the envelope is extracted and then smoothed by a low-pass filter. After down-sampling, the regression coefficients (delta) are calculated from five adjacent samples of the time trajectory of the logarithmic envelope of each subband. The mean square of the regression coefficients over all bands is then calculated. Finally, its time trajectory is up-sampled to the original sampling rate.
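The core of the computation in this caption can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes the octave-band envelopes have already been extracted, smoothed, and down-sampled, it omits the final up-sampling step, and the function names, the edge clamping, the logarithm floor, and the threshold value are hypothetical. Only the five-sample regression window and the factor 0.4 come from the caption.

```python
import math


def delta_coefficients(log_env, half_width=2):
    """Least-squares slope over 2*half_width+1 adjacent samples.

    With half_width=2 this uses five adjacent samples, as in the caption.
    Indices past either end are clamped to the edge (an assumption).
    """
    n = len(log_env)
    denom = sum(k * k for k in range(-half_width, half_width + 1))
    deltas = []
    for t in range(n):
        num = 0.0
        for k in range(-half_width, half_width + 1):
            idx = min(max(t + k, 0), n - 1)
            num += k * log_env[idx]
        deltas.append(num / denom)
    return deltas


def mean_square_delta(band_envelopes, floor=1e-12):
    """Mean over bands of the squared delta of each band's log envelope."""
    band_deltas = []
    for env in band_envelopes:
        log_env = [math.log(max(v, floor)) for v in env]
        band_deltas.append(delta_coefficients(log_env))
    n_bands = len(band_deltas)
    n = len(band_deltas[0])
    return [sum(bd[t] ** 2 for bd in band_deltas) / n_bands for t in range(n)]


def suppression_gain(msd, threshold, factor=0.4):
    """Gain per frame: `factor` where the frame is steady state, else 1.0."""
    return [factor if d < threshold else 1.0 for d in msd]


# Two hypothetical bands with constant envelopes: everything is steady state.
flat_bands = [[1.0] * 8, [2.0] * 8]
flat_msd = mean_square_delta(flat_bands)
flat_gain = suppression_gain(flat_msd, threshold=0.01)  # threshold is illustrative
```

Frames whose envelopes change quickly in any band produce a large mean-square delta and keep a gain of 1.0, so transitions are preserved while steady portions are attenuated.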

FIG. 3.

Mean percent correct for identification of 24 CV syllables by 22 subjects with or without the steady-state suppression in experiments I and II. Open circles represent the mean performance of processed signals with reverberation and closed circles represent the mean performance of unprocessed signals with reverberation for five RTs (0.9, 1.0, 1.1, 1.2, and ) in experiment I. Open squares indicate the mean performance of processed signals and closed squares indicate the mean performance of unprocessed signals for five RTs (0.4, 0.5, 0.7, 0.9, and ) and without reverberation in experiment II. The dashed line separates the results for the signals with reverberation (right) from those without reverberation (left). Lines drawn through the data pass through the mean performance for two reverberant conditions (RTs of 0.9 and 1.0) repeated in both experiments. The shaded region shows the RTs where significant differences were found.

FIG. 4.

Average modulation spectra of 24 speech sentences used in the experiments with and without steady-state suppression [solid line: without processing, dashed line: with processing] for each frequency band. The frequency regions are band 1, band 2, band 3, and band 4.

FIG. 5.

Average modulation spectra of 24 speech sentences used in the experiments with and without steady-state suppression after reverberation (RT of ) [solid line: without processing, dashed line: with processing] for each frequency band. The frequency regions are band 1, band 2, band 3, and band 4.

Tables

TABLE I.

Reverberant conditions used in the experiments. The impulse responses h1–h5, h7, and h8 were obtained by multiplying the exponential decay by the original impulse response h6 to achieve the desired reverberant conditions. The RT values are the average RTs derived from early decay time (EDT) at the center frequencies of 0.5, 1, and of the 1-oct bandpassed impulse response.
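One way to picture the exponential-decay step in this caption is the sketch below. It is an assumption-laden illustration rather than the authors' procedure: it treats the original response as an ideal exponential decay, uses the standard definition of RT as the time for a 60 dB drop (an amplitude factor of 1/1000), and the function names and sampling rate are hypothetical. The caption's RT values were derived from EDT in octave bands, which this sketch does not reproduce.

```python
import math


def decay_rate_for_target_rt(rt_original, rt_target):
    """Extra amplitude decay rate (1/s) that maps rt_original to rt_target.

    Assumes an ideal exponential decay: amplitude ~ exp(-ln(1000) * t / RT).
    A negative result lengthens the reverberation instead of shortening it.
    """
    return math.log(1000.0) * (1.0 / rt_target - 1.0 / rt_original)


def reshape_impulse_response(h, fs, rt_original, rt_target):
    """Multiply impulse response h (sampled at fs Hz) by an exponential."""
    k = decay_rate_for_target_rt(rt_original, rt_target)
    return [x * math.exp(-k * n / fs) for n, x in enumerate(h)]


# Hypothetical ideal response with RT = 1.0 s, sampled at 100 Hz for brevity.
fs = 100
h_ideal = [math.exp(-math.log(1000.0) * n / fs) for n in range(fs + 1)]

# Reshape it toward RT = 0.5 s.
h_short = reshape_impulse_response(h_ideal, fs, rt_original=1.0, rt_target=0.5)
```

For the ideal response above, the reshaped amplitude reaches 1/1000 of its initial value at 0.5 s instead of 1.0 s. A measured room response would only follow this approximately.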

TABLE II.

Twenty-four nonsense consonant-vowel syllables (CVs) used in the experiments.

TABLE III.

Types of errors and changes in confusions by the steady-state suppression, classified according to manner and place of articulation feature, as well as the overall results, for both processing conditions in experiment I.

TABLE IV.

Types of errors and changes in confusions by the steady-state suppression, classified according to manner and place of articulation feature, as well as the overall results, for both processing conditions in experiment II.

2006-06-01
2014-04-17