The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio
2013-10-16
2014-09-17

Abstract

In this paper, a computational goal for monaural speech separation systems is proposed. Because this goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-disjoint orthogonality assumption, which almost always holds owing to the sparsity of speech, theoretical analysis shows that the ORM can improve the SNR by about dB over the ideal ratio mask. Separation experiments with three kinds of real-world interference, evaluated by SNR gain and objective quality measures, confirm the theoretical analysis and indicate that the ORM achieves better separation performance.
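The abstract does not state the ORM formula, so the sketch below only illustrates what "deriving a ratio mask by maximizing SNR" can look like. It rests on several assumptions. The ideal ratio mask (IRM) is taken to be |S|²/(|S|² + |N|²). The SNR-maximizing mask is taken to be the real-valued gain M that minimizes the squared error |S − M·(S + N)|² in each time-frequency bin, which works out to M = (|S|² + Re(S·N*)) / |S + N|². Whether this matches the paper's ORM exactly is not confirmed by this excerpt. The coefficients are random toy values, not data from the paper, and SNR is measured in the time-frequency domain, which equals the time-domain SNR only for an orthonormal transform.

```python
import cmath
import math
import random


def irm(s: complex, n: complex) -> float:
    """Ideal ratio mask, assumed here to be |S|^2 / (|S|^2 + |N|^2)."""
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return ps / (ps + pn) if ps + pn > 0 else 0.0


def snr_optimal_mask(s: complex, n: complex) -> float:
    """Real gain M minimizing |S - M * X|^2 in one bin, with X = S + N.

    Setting d/dM |S - M X|^2 = 0 gives M = Re(S * conj(X)) / |X|^2,
    i.e. (|S|^2 + Re(S * conj(N))) / |S + N|^2. Not clipped to [0, 1].
    This is an illustrative least-squares mask, assumed (not confirmed)
    to correspond to the paper's ORM.
    """
    x = s + n
    px = abs(x) ** 2
    return (s * x.conjugate()).real / px if px > 0 else 0.0


def output_snr_db(speech, noise, mask_fn) -> float:
    """SNR in dB of the masked mixture against the clean speech, summed over bins."""
    signal = sum(abs(s) ** 2 for s in speech)
    error = sum(abs(s - mask_fn(s, n) * (s + n)) ** 2 for s, n in zip(speech, noise))
    return math.inf if error == 0 else 10 * math.log10(signal / error)


def random_coeffs(count: int) -> list:
    """Hypothetical toy complex coefficients with random magnitudes and phases."""
    return [cmath.rect(random.expovariate(1.0), random.uniform(-math.pi, math.pi))
            for _ in range(count)]


random.seed(0)
speech = random_coeffs(256)  # hypothetical target T-F coefficients
noise = random_coeffs(256)   # hypothetical interference T-F coefficients

snr_irm = output_snr_db(speech, noise, irm)
snr_opt = output_snr_db(speech, noise, snr_optimal_mask)
print(f"IRM: {snr_irm:.2f} dB, least-squares mask: {snr_opt:.2f} dB")
```

Because the least-squares gain minimizes the error in every bin, its time-frequency SNR can never fall below that of the IRM on the same data. The two masks coincide in any bin where the cross term Re(S·N*) is zero, which includes bins where only one source is active (exact W-disjoint orthogonality); their differences come from bins where speech and interference overlap.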

DOI: 10.1121/1.4824632