The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio
In this paper, a computational goal for a monaural speech separation system is proposed. Because this goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds because of the sparse nature of speech, theoretical analysis shows that the ORM can improve the SNR by about dB over the ideal ratio mask. In experiments with three kinds of real-world interference, the measured SNR gains and objective quality scores agree with the theoretical analysis and indicate that the ORM achieves better separation performance.
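To make the idea of a mask "derived by maximizing the SNR" concrete, the following is a minimal sketch and not the paper's implementation. It assumes the mixture in each time-frequency unit is Y = S + N, that the mask is real-valued, and that the SNR is the ratio of target energy to the energy of the estimation error S - M·Y. Under those assumptions, the real mask that minimizes the error in each unit is Re(S·conj(Y)) / |Y|^2; whether this coincides exactly with the ORM defined in the paper is an assumption. The ideal ratio mask is taken here in its common energy-ratio form |S|^2 / (|S|^2 + |N|^2). The function names are hypothetical, and the inputs are random complex numbers standing in for spectrograms, so the printed values say nothing about speech or about the paper's results.

```python
import numpy as np


def ideal_ratio_mask(S, N, eps=1e-12):
    """Energy-ratio mask |S|^2 / (|S|^2 + |N|^2), one common IRM definition."""
    ps = np.abs(S) ** 2
    pn = np.abs(N) ** 2
    return ps / (ps + pn + eps)


def snr_optimal_real_mask(S, N, eps=1e-12):
    """Real mask minimizing |S - M * (S + N)|^2 in every time-frequency unit.

    Hypothetical helper: the closed form follows from per-unit least squares
    and is assumed, not confirmed, to match the paper's ORM.
    """
    Y = S + N
    return np.real(S * np.conj(Y)) / (np.abs(Y) ** 2 + eps)


def snr_db(S, S_hat, eps=1e-12):
    """SNR in dB: target energy over estimation-error energy."""
    signal = np.sum(np.abs(S) ** 2)
    error = np.sum(np.abs(S - S_hat) ** 2)
    return 10.0 * np.log10(signal / (error + eps))


# Synthetic stand-ins for target and interference spectrograms (not speech).
rng = np.random.default_rng(0)
shape = (64, 100)  # (frequency bins, frames), arbitrary sizes
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Y = S + N

snr_irm = snr_db(S, ideal_ratio_mask(S, N) * Y)
snr_orm = snr_db(S, snr_optimal_real_mask(S, N) * Y)

print(f"SNR with energy-ratio mask:     {snr_irm:.2f} dB")
print(f"SNR with SNR-optimal real mask: {snr_orm:.2f} dB")
```

Because the second mask is the per-unit least-squares solution among real masks, its output SNR cannot fall below that of the energy-ratio mask under this SNR definition. The two masks coincide when the cross term Re(S·conj(N)) vanishes, which is the situation the W-Disjoint Orthogonality assumption approximates.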