Modulation frequency features for phoneme recognition in noisy speech
2008-12-22
2014-11-27

Abstract

In this letter, a new feature extraction technique is proposed, based on the modulation spectrum of syllable-length segments of subband temporal envelopes. The subband envelopes are obtained by autoregressive modeling of the Hilbert envelopes of the signal in critical bands and are then processed by both a static (logarithmic) and a dynamic (adaptive loops) compression. The resulting features are used for machine recognition of phonemes in telephone speech. Without degrading performance in clean conditions, the proposed features yield significant improvements in noisy conditions over other state-of-the-art speech analysis techniques. In addition to overall phoneme recognition rates, performance on broad phonetic classes is reported.
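The processing chain described in the abstract can be illustrated with a minimal, hypothetical sketch in Python (standard library only). It is not the authors' implementation, and several choices are assumptions: the Hilbert envelope of a subband is modeled with frequency-domain linear prediction (an all-pole fit to the band's DCT coefficients, whose power response approximates the squared Hilbert envelope over time); a rectangular block of DCT bins stands in for a critical band; the dynamic compression is a simplified chain of adaptation loops with time constants of 5, 50, 129, 253 and 500 ms (values usually associated with the Dau et al. auditory model); and the segment length (200 ms), model order, envelope rate, floor value and number of retained coefficients are illustrative. All function names are hypothetical.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin: returns (a, err), a[0] = 1, for A(z) = sum_i a[i] z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(band_coeffs, order, n_points):
    """All-pole fit to one subband's DCT coefficients (frequency-domain linear
    prediction). The model's power response on [0, pi) approximates the band's
    squared Hilbert envelope across the analysis segment."""
    a, err = levinson(autocorr(band_coeffs, order), order)
    env = []
    for t in range(n_points):
        w = math.pi * t / n_points  # the model's "frequency" axis is the time axis
        re = sum(a[i] * math.cos(w * i) for i in range(order + 1))
        im = -sum(a[i] * math.sin(w * i) for i in range(order + 1))
        env.append(err / (re * re + im * im))
    return env


def adaptation_loops(env, fs_env, taus=(0.005, 0.05, 0.129, 0.253, 0.5),
                     floor=1e-5):
    """Simplified adaptation-loop chain (assumed time constants in seconds):
    each stage divides its input by a first-order low-pass copy of its output."""
    y = [max(v, floor) for v in env]  # hypothetical floor keeps inputs positive
    for tau in taus:
        alpha = math.exp(-1.0 / (tau * fs_env))
        state = math.sqrt(y[0])  # steady-state value for a constant input y[0]
        out = []
        for v in y:
            o = v / state
            state = alpha * state + (1.0 - alpha) * o
            out.append(o)
        y = out
    return y


def modulation_features(env, n_coeffs):
    """Low-order DCT coefficients of an envelope segment (its slow modulations)."""
    return dct2(env)[:n_coeffs]


def demo():
    fs, n = 2000, 400  # toy rate; 400 samples = a ~syllable-length 200 ms segment
    seg = n / fs
    # Toy input: a 500 Hz tone amplitude-modulated at 10 Hz.
    x = [(1 + 0.8 * math.cos(2 * math.pi * 10 * t / fs))
         * math.cos(2 * math.pi * 500 * t / fs) for t in range(n)]
    coeffs = dct2(x)
    # Assumption: a rectangular block of DCT bins stands in for a critical band.
    # DCT-II bin k lies near k * fs / (2 * n) Hz, so bins 160..239 cover ~400-600 Hz.
    band = coeffs[160:240]
    n_points = 100  # envelope samples over the segment (500 Hz envelope rate)
    env = fdlp_envelope(band, order=8, n_points=n_points)
    peak = max(env)
    static = [math.log(v / peak) for v in env]           # static (log) compression
    dynamic = adaptation_loops([v / peak for v in env],  # dynamic compression
                               fs_env=n_points / seg)
    # Assumption: the two compressed streams are simply concatenated.
    return env, modulation_features(static, 14) + modulation_features(dynamic, 14)


if __name__ == "__main__":
    env, feats = demo()
    print(len(env), len(feats))
```

For a constant input, each loop stage settles at the square root of its input, so five stages yield x^(1/32), a roughly logarithmic compression; rapid envelope changes pass through before the low-pass states catch up, which is why such loops emphasize onsets relative to the purely static logarithm.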

J. Acoust. Soc. Am. 125(1), DOI: 10.1121/1.3040022