In this paper, a computational goal for a monaural speech separation system is proposed. Since this goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-disjoint orthogonality assumption, which almost always holds because of the sparse nature of speech, theoretical analysis shows that the ORM can improve the SNR by about dB over the ideal ratio mask. With three kinds of real-world interference, speech separation results in terms of SNR gain and objective quality evaluation confirm the theoretical analysis and show that the ORM achieves better separation performance.
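The contrast between the two masks can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the ideal ratio mask (IRM) in its common energy-ratio form, and the ORM in the SNR-optimal real-mask form that keeps the speech-noise cross term, obtained by minimizing |S - M(S+N)|^2 over real M for each time-frequency unit.

```python
import numpy as np

def ideal_ratio_mask(S, N):
    """IRM in the common form: speech energy over total energy.

    S, N: complex STFT coefficients of clean speech and noise.
    """
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2)

def optimal_ratio_mask(S, N):
    """SNR-optimal real mask (assumed ORM form).

    Minimizing |S - M*(S+N)|^2 over real M gives
    M = (|S|^2 + Re(S N*)) / (|S|^2 + |N|^2 + 2 Re(S N*)),
    i.e., the IRM plus a correction from the cross term Re(S N*).
    """
    cross = np.real(S * np.conj(N))
    return (np.abs(S) ** 2 + cross) / (
        np.abs(S) ** 2 + np.abs(N) ** 2 + 2.0 * cross
    )

# Toy time-frequency unit: speech amplitude 2, in-phase noise amplitude 1.
S, N = 2 + 0j, 1 + 0j
print(ideal_ratio_mask(S, N))    # 0.8
print(optimal_ratio_mask(S, N))  # 2/3: cross term pulls the mask down
```

When the cross term is zero (speech and noise orthogonal in a unit), the two masks coincide; the ORM differs from the IRM exactly where W-disjoint orthogonality is only approximate.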