Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering
The robustness of the human auditory system to noise is partly due to the peak-preserving capability of the periphery and the cortical filtering of spectro-temporal modulations. In this letter, a robust speech feature extraction scheme is developed that emulates this processing by deriving a spectrographic representation that emphasizes the high-energy regions. This is followed by a modulation filtering step that preserves only the important spectro-temporal modulations. The features derived from this representation provide significant improvements for speech recognition in noise and language identification in radio channel speech. Further, the experimental analysis shows congruence with human psychophysical studies.
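As a rough illustration of the modulation filtering step only, the sketch below low-pass filters a log-spectrogram in the 2-D modulation domain using NumPy. It is not the authors' implementation: the function name `modulation_lowpass`, the channel layout in channels per octave, and the cutoff values are all assumptions chosen for the example, and the step that emphasizes high-energy regions is not modeled.

```python
import numpy as np

def modulation_lowpass(log_spec, frame_rate_hz, chans_per_octave,
                       max_rate_hz, max_scale_cpo):
    """Keep only slow temporal (rate) and coarse spectral (scale) modulations.

    log_spec: 2-D array of shape (n_channels, n_frames), for example a
    log-energy spectrogram on a log-frequency axis (assumed layout).
    All cutoffs are hypothetical parameters, not values from the letter.
    """
    n_chan, n_frames = log_spec.shape
    # Modulation-frequency axes of the 2-D DFT.
    scales = np.fft.fftfreq(n_chan, d=1.0 / chans_per_octave)  # cycles/octave
    rates = np.fft.fftfreq(n_frames, d=1.0 / frame_rate_hz)    # Hz
    # Symmetric pass region, so a real input gives a real output.
    mask = (np.abs(scales)[:, None] <= max_scale_cpo) & \
           (np.abs(rates)[None, :] <= max_rate_hz)
    filtered = np.fft.ifft2(np.fft.fft2(log_spec) * mask)
    return filtered.real

# Toy input: a 2 Hz temporal modulation plus a 30 Hz one, identical in
# every channel, sampled at an assumed 100 frames per second.
t = np.arange(200) / 100.0
slow = np.tile(np.sin(2 * np.pi * 2 * t), (24, 1))
fast = np.tile(np.sin(2 * np.pi * 30 * t), (24, 1))
smoothed = modulation_lowpass(slow + fast, frame_rate_hz=100.0,
                              chans_per_octave=12.0,
                              max_rate_hz=12.0, max_scale_cpo=6.0)
```

In this toy case the 30 Hz component falls outside the pass region and is removed, while the 2 Hz component is retained. A brick-wall mask is used here only for brevity; a practical front-end would more likely use smooth band-pass filters.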