Temporal envelope compensation for robust phoneme recognition using modulation spectrum
10.1121/1.3504658
http://aip.metastore.ingenta.com/content/asa/journal/jasa/128/6/10.1121/1.3504658

Figures

FIG. 1.

Block schematic of frequency-domain linear prediction (FDLP). The steps are application of the discrete cosine transform (DCT), estimation of spectral autocorrelations, and linear prediction to estimate the autoregressive (AR) model of the Hilbert envelope.
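The three steps named in this caption can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the orthonormal DCT scaling, the small `floor` regularization on the zero-lag autocorrelation, and the default model order are all assumptions made for this example. It relies on the duality that an impulse at time t becomes a cosine across the DCT index, so an all-pole model fitted to the DCT sequence traces the temporal envelope.

```python
import math

def dct2(x):
    """Orthonormal type-II DCT, written as an O(N^2) loop for clarity."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    c = [scale * sum(x[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n))
         for k in range(n)]
    c[0] /= math.sqrt(2.0)
    return c

def autocorr(c, max_lag):
    """Autocorrelation of the DCT sequence for lags 0..max_lag."""
    n = len(c)
    return [sum(c[i] * c[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order=20, floor=1e-3):
    """All-pole approximation of the squared Hilbert envelope of x.

    `floor` is an assumed white-noise correction that keeps the
    normal equations well conditioned; it is not taken from the source.
    """
    n = len(x)
    c = dct2(x)                      # step 1: DCT of the segment
    r = autocorr(c, order)           # step 2: spectral autocorrelations
    r[0] *= 1.0 + floor
    a, err = levinson(r, order)      # step 3: linear prediction (AR model)
    env = []
    for t in range(n):
        # The model's "frequency" axis maps to time: theta = pi * (t + 0.5) / n
        theta = math.pi * (t + 0.5) / n
        re = sum(a[k] * math.cos(k * theta) for k in range(order + 1))
        im = -sum(a[k] * math.sin(k * theta) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env
```

Calling `fdlp_envelope(x, order=12)` on a short tone burst should return a smooth, strictly positive curve that peaks near the centre of the burst. A higher order follows finer envelope detail, at the cost of a less smooth model.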

FIG. 2.

(Color online) Illustration of the all-pole modeling property of FDLP. (a) A portion of the sub-band speech signal (with a frequency range of 500–700 Hz), (b) its Hilbert envelope, and (c) the all-pole model obtained using FDLP with a model order of 80.

FIG. 3.

(Color online) Log-FDLP envelopes for a sub-band of clean speech and speech corrupted with babble noise at 10 dB SNR. (a) Without noise compensation and (b) with noise compensation.

FIG. 4.

Noise compensation in FDLP.

FIG. 5.

(Color online) Log-FDLP envelopes for a sub-band of clean speech and telephone speech. (a) Without gain normalization and noise compensation, and (b) with gain normalization and noise compensation.

FIG. 6.

Block schematic for the sub-band feature extraction. The steps are critical-band decomposition, estimation of sub-band envelopes using FDLP, static and adaptive compression, and conversion to modulation frequency components by applying the cosine transform.

FIG. 7.

Dynamic compression scheme using adaptive compression loops.

FIG. 8.

(Color online) Static and dynamic compression of the FDLP envelopes. (a) A portion of sub-band FDLP envelope, (b) logarithmic compression of the FDLP envelope, and (c) adaptive compression of the FDLP envelope.
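The contrast between the two compression types in this caption can be illustrated with a small sketch. The page names adaptive compression loops but gives no equations, so everything below is an assumption: the divisive feedback-loop structure, the five time constants, the initial loop state of 1.0, and the function names follow a common auditory-model formulation and are not taken from this article.

```python
import math

def log_compress(env, floor=1e-8):
    """Static compression: pointwise logarithm of the envelope."""
    return [math.log(max(v, floor)) for v in env]

def adaptive_compress(env, fs,
                      time_constants=(0.005, 0.050, 0.129, 0.253, 0.500),
                      floor=1e-5):
    """Dynamic compression with a chain of divisive feedback loops.

    Each loop divides its input by a low-pass filtered copy of its own
    output. For a stationary input each loop settles at the square root
    of its input, so five loops approach the 32nd root, while onsets
    pass through almost uncompressed. The time constants (in seconds)
    are assumed values for illustration only.
    """
    y = [max(v, floor) for v in env]
    for tau in time_constants:
        alpha = math.exp(-1.0 / (tau * fs))  # one-pole low-pass coefficient
        state = 1.0                          # assumed initial divisor
        out = []
        for v in y:
            o = v / state
            state = max(alpha * state + (1.0 - alpha) * o, floor)
            out.append(o)
        y = out
    return y
```

With an envelope sampled at, say, 400 Hz, `adaptive_compress(env, 400)` emphasizes onsets and strongly compresses steady portions. `log_compress(env)` applies the same mapping at every sample regardless of context.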

Tables

TABLE I.

Recognition accuracies (%) of individual phonemes for different feature extraction techniques on clean speech, speech with additive noise (average performance of four noise types at 0, 5, 10, 15, and 20 dB SNRs), reverberant speech (average performance for nine room impulse response functions), and telephone speech (average performance for nine channel conditions). The best performance for each condition is indicated in bold.

TABLE II.

Phoneme recognition accuracies (%) for different feature extraction techniques for four noise types (“Restaurant,” “Babble,” “Subway,” and “Exhibition Hall”) at 0, 5, 10, 15, and 20 dB SNRs. The best performance for each condition is indicated in bold.

TABLE III.

Phoneme recognition accuracies (%) for different feature extraction techniques on the CTS database. The best performance is indicated in bold.

TABLE IV.

Various modifications to the proposed feature extraction and their meanings.

TABLE V.

Phoneme recognition accuracies (%) for various modifications to the proposed feature extraction in clean speech, with one condition of additive noise (Babble noise at 10 dB SNR), reverberant speech (with a reverberation time of 300 ms), and one condition of telephone channel speech. The phoneme recognition results without any modification to the proposed technique are shown at the bottom.

2010-12-01
2014-04-20