/content/asa/journal/jasa/128/5/10.1121/1.3497358
2010-10-19
2016-08-29

Abstract

Inspired by recent evidence that a binary pattern may provide sufficient information for human speech recognition, this letter proposes a fundamentally different approach to robust automatic speech recognition. Specifically, recognition is performed by classifying binary masks corresponding to a word utterance. The proposed method is evaluated using a subset of the TIDigits corpus to perform isolated digit recognition. Despite the dramatic reduction in the speech information encoded in a binary mask, the proposed system performs surprisingly well. The system is compared with a traditional HMM-based approach and is shown to perform well under low-SNR conditions.
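
The abstract does not say how the binary masks are computed or which classifier is used, so the following is only a minimal sketch of the general idea of recognizing a word from a binary time-frequency pattern. It assumes an ideal binary mask that marks each time-frequency unit as 1 when the local SNR exceeds a threshold (here a hypothetical `lc_db` of 0 dB), and it uses a nearest-template match by Hamming distance as a simple stand-in classifier, not the classifier evaluated in the letter. The function names, the toy 4 x 6 grids, and the labels "one" and "two" are all illustrative assumptions.

```python
import numpy as np


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1 where the local SNR in dB exceeds lc_db, else 0.

    speech_power and noise_power are same-shaped arrays of
    time-frequency energies (frequency channels x time frames).
    """
    eps = 1e-12  # guards against log(0) and division by zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


def classify_mask(mask, templates):
    """Return the label whose template differs from mask in the fewest units.

    This Hamming-distance match is a stand-in classifier chosen for brevity.
    """
    distances = {label: int(np.sum(mask != t)) for label, t in templates.items()}
    return min(distances, key=distances.get)


# Toy energies on a 4-channel x 6-frame grid (assumed values).
noise = np.ones((4, 6))

speech_one = np.zeros((4, 6))
speech_one[1, :3] = 4.0  # hypothetical energy pattern for the word "one"

speech_two = np.zeros((4, 6))
speech_two[2, 3:] = 4.0  # hypothetical energy pattern for the word "two"

templates = {
    "one": ideal_binary_mask(speech_one, noise),
    "two": ideal_binary_mask(speech_two, noise),
}

# A test mask: the "one" pattern with a single unit flipped by interference.
test_mask = templates["one"].copy()
test_mask[0, 5] = 1

predicted = classify_mask(test_mask, templates)
print(predicted)
```

Only the 0/1 pattern reaches the classifier in this sketch; the energies themselves are discarded once the mask is formed, which is the reduction in speech information that the abstract refers to.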
