

The robustness of the human auditory system to noise is partly due to the peak-preserving capability of the periphery and the cortical filtering of spectro-temporal modulations. In this letter, a robust speech feature extraction scheme is developed that emulates this processing by deriving a spectrographic representation that emphasizes the high-energy regions. This is followed by a modulation filtering step that preserves only the important spectro-temporal modulations. The features derived from this representation provide significant improvements for speech recognition in noise and language identification in radio channel speech. Further, the experimental analysis shows congruence with human psychophysical studies.
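As a rough illustration of what a modulation filtering step can look like, the sketch below low-pass filters a spectrogram along time (temporal modulations) and along frequency (spectral modulations). It is not the method of the letter: the moving-average filter, the function names `moving_average` and `modulation_lowpass`, and the window widths are all assumptions chosen for brevity, and the spectrogram is assumed to be a plain list of frames, each a list of band energies.

```python
def moving_average(seq, width):
    """Centered moving average, truncated at the edges: a crude low-pass filter."""
    half = width // 2
    out = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        window = seq[lo:hi]
        out.append(sum(window) / len(window))
    return out


def modulation_lowpass(spec, time_width=5, freq_width=3):
    """Keep slow spectro-temporal modulations of a spectrogram.

    spec is a list of frames (time axis); each frame is a list of band
    energies (frequency axis). The widths are illustrative assumptions,
    not values taken from the letter.
    """
    n_frames = len(spec)
    n_bands = len(spec[0])
    # Temporal modulation filtering: smooth each band's trajectory over time.
    by_band = [
        moving_average([spec[t][b] for t in range(n_frames)], time_width)
        for b in range(n_bands)
    ]
    # Spectral modulation filtering: smooth each frame across bands.
    return [
        moving_average([by_band[b][t] for b in range(n_bands)], freq_width)
        for t in range(n_frames)
    ]


if __name__ == "__main__":
    # A frame-to-frame alternating pattern is a fast temporal modulation.
    fast = [[(-1.0) ** t] * 4 for t in range(20)]
    filtered = modulation_lowpass(fast)
    print(max(abs(v) for row in filtered[2:-2] for v in row))
```

A constant spectrogram passes through this filter unchanged, while the rapidly alternating pattern in the usage example is strongly attenuated. That is the sense in which such a filter preserves only slow modulations. A practical front end would use properly designed band-pass filters rather than a moving average.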

