Robust speech recognition from binary masks
Abstract
Inspired by recent evidence that a binary pattern may provide sufficient information for human speech recognition, this letter proposes a fundamentally different approach to robust automatic speech recognition. Specifically, recognition is performed by classifying binary masks corresponding to a word utterance. The proposed method is evaluated on a subset of the TIDigits corpus for isolated digit recognition. Despite the dramatic reduction of speech information encoded in a binary mask, the proposed system performs surprisingly well. The system is compared with a traditional HMM-based approach and is shown to perform well under low-SNR conditions.
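The binary masks in question are commonly defined as ideal binary masks (IBMs): a time-frequency unit is labeled 1 when its local SNR exceeds a local criterion and 0 otherwise (see Wang and Brown, 2006). As a minimal illustrative sketch — not the letter's actual pipeline; the function name, parameters, and use of magnitude spectrograms are our assumptions — an IBM can be computed from premixed speech and noise spectrograms as follows:

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Compute an ideal binary mask from premixed speech and noise
    magnitude spectrograms (same shape: frequency x time).

    A time-frequency unit is kept (1) when its local SNR in dB
    exceeds the local criterion lc_db; otherwise it is discarded (0).
    """
    eps = 1e-12  # avoid division by zero / log of zero
    snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (snr_db > lc_db).astype(np.uint8)

# Example: units where speech dominates noise are labeled 1.
speech = np.array([[1.0, 0.1],
                   [0.5, 0.5]])
noise = np.array([[0.1, 1.0],
                  [0.5, 0.05]])
mask = ideal_binary_mask(speech, noise)
```

In a mask-based recognizer of the kind described here, such binary patterns, rather than conventional spectral features, form the input to the classifier.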
© 2010 Acoustical Society of America
Received 16 August 2010
Accepted 15 September 2010
Published online 19 October 2010
Acknowledgments:
The research described in this paper was supported in part by an AFOSR grant (FA9550-08-1-0155) and an NSF grant (IIS-0534707). We acknowledge a similar independent work described by Karadogan et al. (2009), which we became aware of after our model had been developed.
Article outline:
I. Introduction
II. System description
III. Results
A. Experimental setup
B. Evaluation results
IV. Concluding remarks
/content/asa/journal/jasa/128/5/10.1121/1.3497358
References:
1. Bregman, A. S. (1990). Auditory Scene Analysis (MIT, Cambridge, MA).
3. Ephraim, Y., and Malah, D. (1985). "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process. 33, 443–445. http://dx.doi.org/10.1109/TASSP.1985.1164550
4. Hermansky, H., Ellis, D., and Sharma, S. (2000). "Tandem connectionist feature extraction for conventional HMM systems," in Proceedings of ICASSP, pp. 1635–1638.
5. Hu, K., and Wang, D. L. (2009). "Unvoiced speech segregation from nonspeech interference via CASA and spectral subtraction," Technical Report No. TR51, Department of Computer Science and Engineering, The Ohio State University, Columbus, OH (available online: ftp://ftp.cse.ohio-state.edu/pub/tech-report/2009/TR51.pdf).
6. Karadogan, S. G., Larsen, J., Pedersen, M. S., and Boldt, J. B. (2009). "Robust isolated speech recognition using ideal binary masks," Technical Report No. 5780, Department of Informatics and Mathematical Modelling, Technical University of Denmark, Kgs. Lyngby, Denmark; available at http://isp.imm.dtu.dk/staff/jlarsen/pubs/frame.htm (last viewed 10/11/2010).
7. LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). "Gradient-based learning applied to document recognition," Proc. IEEE 86, 2278–2324. http://dx.doi.org/10.1109/5.726791
8. Leonard, R. G. (1984). "A database for speaker-independent digit recognition," in Proceedings of ICASSP, pp. 111–114.
9. Simard, P. Y., Steinkraus, D., and Platt, J. C. (2003). "Best practices for convolutional neural networks applied to visual document analysis," in Proceedings of ICDAR, pp. 958–963.
10. Srinivasan, S., and Wang, D. L. (2007). "Transforming binary uncertainties for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process. 15, 2130–2140. http://dx.doi.org/10.1109/TASL.2007.901836
11. Wang, D. L., and Brown, G. J., Eds. (2006). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications (Wiley/IEEE, Hoboken, NJ).
12. Wang, D. L., Kjems, U., Pedersen, M. S., Boldt, J. B., and Lunner, T. (2008). "Speech perception of noise with binary gains," J. Acoust. Soc. Am. 124, 2303–2307. http://dx.doi.org/10.1121/1.2967865
13. Wang, D. L., Kjems, U., Pedersen, M. S., Boldt, J. B., and Lunner, T. (2009). "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Am. 125, 2336–2347. http://dx.doi.org/10.1121/1.3083233
14. Young, S., Kershaw, D., Odell, J., Valtchev, V., and Woodland, P. (2009). The HTK Book (for HTK Version 3.4) (Microsoft Corp., Redmond, WA).