



In this letter, a new feature extraction technique is proposed, based on the modulation spectrum derived from syllable-length segments of subband temporal envelopes. The subband envelopes are obtained by autoregressive modeling of the Hilbert envelopes of the signal in critical bands and are then processed by both a static (logarithmic) and a dynamic (adaptive loops) compression. The resulting features are used for machine recognition of phonemes in telephone speech. Without degrading performance in clean conditions, the proposed features show significant improvements compared to other state-of-the-art speech analysis techniques. In addition to the overall phoneme recognition rates, the performance on broad phonetic classes is reported.
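As a rough illustration of two steps named in the abstract, the sketch below computes a Hilbert envelope, applies static logarithmic compression, and takes the magnitude spectrum of the result as a simple modulation spectrum. It is a simplified sketch under stated assumptions, not the letter's method: the critical-band decomposition, the autoregressive modeling of the envelopes, and the adaptive-loop compression are all omitted. The function names, the 1-second synthetic test signal, and the `floor` parameter are hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero negative frequencies to form the analytic signal.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def modulation_spectrum(envelope, fs, floor=1e-8):
    """Log-compress an envelope and return its magnitude spectrum.

    `floor` (an assumption of this sketch) keeps the logarithm finite.
    """
    log_env = np.log(np.maximum(envelope, floor))
    log_env = log_env - log_env.mean()  # remove the DC component
    mags = np.abs(np.fft.rfft(log_env))
    freqs = np.fft.rfftfreq(len(log_env), d=1.0 / fs)
    return freqs, mags


# Synthetic stand-in for one subband: a 1 kHz carrier with 4 Hz
# amplitude modulation, sampled at a telephone-like 8 kHz for 1 s.
fs = 8000
t = np.arange(fs) / fs
signal = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env = hilbert_envelope(signal)
freqs, mags = modulation_spectrum(env, fs)
peak_hz = freqs[np.argmax(mags)]  # dominant modulation frequency in Hz
```

For this synthetic signal the dominant modulation component falls at the 4 Hz modulation rate, which is in the range of typical syllable rates. In a fuller pipeline the same computation would be applied per critical band to syllable-length segments.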

