The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio




This paper proposes a computational goal for monaural speech separation systems. Because the goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds because speech is sparse, theoretical analysis shows that the ORM can improve the SNR by about dB over the ideal ratio mask. Separation experiments with three kinds of real-world interference, evaluated by SNR gain and objective quality measures, agree with the theoretical analysis and suggest that the ORM achieves better separation performance.
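The comparison described in the abstract can be illustrated with a small numerical sketch. The abstract does not state the ORM's closed form, so the code below makes an assumption: it uses the real-valued mask that minimizes the squared error in each time-frequency unit (and therefore maximizes the SNR of the masked mixture), and it should not be read as the paper's exact definition. The function names, the random complex coefficients standing in for speech and noise spectra, and the `eps` guard are all hypothetical choices made for illustration.

```python
import numpy as np


def snr_db(target, estimate):
    """SNR in dB of an estimate measured against the clean target coefficients."""
    error_energy = np.sum(np.abs(target - estimate) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(target) ** 2) / error_energy)


def ideal_ratio_mask(S, N, eps=1e-12):
    """Ideal ratio mask: speech energy over speech-plus-noise energy per unit."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return ps / (ps + pn + eps)


def least_squares_ratio_mask(S, N, eps=1e-12):
    """Real mask M minimizing |S - M * (S + N)|^2 in each unit (assumed form).

    Setting the derivative with respect to M to zero gives
    M = Re(S * conj(Y)) / |Y|^2 with Y = S + N.
    """
    Y = S + N
    return np.real(S * np.conj(Y)) / (np.abs(Y) ** 2 + eps)


# Synthetic complex coefficients (NOT real speech): 257 bins x 100 frames.
rng = np.random.default_rng(0)
shape = (257, 100)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Y = S + N  # mixture

snr_mix = snr_db(S, Y)
snr_irm = snr_db(S, ideal_ratio_mask(S, N) * Y)
snr_ls = snr_db(S, least_squares_ratio_mask(S, N) * Y)

print(f"mixture SNR:            {snr_mix:.2f} dB")
print(f"ideal ratio mask SNR:   {snr_irm:.2f} dB")
print(f"least-squares mask SNR: {snr_ls:.2f} dB")
```

Unlike the ideal ratio mask, the least-squares mask depends on the cross term between speech and noise, so it can fall below 0 or exceed 1. Because it minimizes the error in every unit over all real-valued masks, its SNR cannot be lower than that of the ideal ratio mask on the same data. The size of the gap on this synthetic data says nothing about the gain reported for real speech.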

