Modulation frequency features for phoneme recognition in noisy speech
In this letter, a new feature extraction technique is proposed, based on the modulation spectrum derived from syllable-length segments of subband temporal envelopes. The subband envelopes are obtained by autoregressive modeling of the Hilbert envelopes of the signal in critical bands, and are then processed by both a static (logarithmic) compression and a dynamic (adaptive-loop) compression. The resulting features are used for machine recognition of phonemes in telephone speech. Without degrading performance in clean conditions, the proposed features show significant improvements over other state-of-the-art speech analysis techniques. In addition to the overall phoneme recognition rates, performance on broad phonetic classes is reported.
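To make the processing chain more concrete, the following is a minimal Python sketch of two of the steps named above: computing a Hilbert envelope and taking the modulation spectrum of its logarithmically compressed version. It is an illustration under stated assumptions, not the authors' implementation. The critical-band decomposition, the autoregressive modeling of the envelope, and the adaptive-loop compression are all omitted, and the function names (`dft`, `hilbert_envelope`, `modulation_spectrum`) and the `floor` parameter are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short toy signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Magnitude of the analytic signal (negative frequencies zeroed)."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # keep DC and Nyquist unchanged
        if k < (n + 1) // 2:
            spec[k] *= 2          # double the positive frequencies
        else:
            spec[k] = 0           # discard the negative frequencies
    return [abs(v) for v in dft(spec, inverse=True)]


def modulation_spectrum(envelope, floor=1e-8):
    """Magnitude spectrum of the log-compressed, mean-removed envelope.

    Assumption: only the static (logarithmic) compression is applied here;
    `floor` is a hypothetical guard against log(0).
    """
    log_env = [math.log(max(e, floor)) for e in envelope]
    mean = sum(log_env) / len(log_env)
    centered = [v - mean for v in log_env]
    spec = dft(centered)
    return [abs(v) for v in spec[: len(spec) // 2 + 1]]


# Toy input: a 64 Hz carrier, amplitude-modulated at 4 Hz, sampled at
# 256 Hz for one second, so each modulation-spectrum bin is 1 Hz wide.
n = 256
signal = [
    (1 + 0.5 * math.cos(2 * math.pi * 4 * t / n)) * math.cos(2 * math.pi * 64 * t / n)
    for t in range(n)
]
envelope = hilbert_envelope(signal)
mod_spec = modulation_spectrum(envelope)
peak_bin = max(range(len(mod_spec)), key=mod_spec.__getitem__)
print("strongest modulation frequency (Hz):", peak_bin)
```

For this toy signal the largest modulation component falls at the 4 Hz bin, the rate at which the carrier was modulated. In a full system of the kind the abstract describes, the same analysis would be applied per critical band to syllable-length envelope segments rather than to a single synthetic tone.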