Detection of shouted speech in noise: Human and machine
10.1121/1.4794394

Figures

FIG. 1.

Averaged spectra for normal (dashed line) and shouted (solid line) speech of 11 male (top) and 11 female (bottom) speakers.

FIG. 2.

Stages of obtaining MFCCs from the squared magnitude spectrum. The chain consists of three parts: computing frequency band energies with triangular-passband filters spaced evenly on the mel scale, taking the logarithm of the band energies, and applying a discrete cosine transform to the logarithmic energies.
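
The three-stage chain in Fig. 2 can be sketched as follows. This is a minimal illustration, not the article's implementation: the mel formula (2595·log10(1 + f/700)), the number of filters, the FFT size, the log floor, and the unnormalized DCT-II are all assumptions, and the helper names are hypothetical.

```python
import math

def hz_to_mel(f):
    # Common mel formula; the article's exact variant is an assumption here.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges/centres are evenly spaced on the mel scale.
    Returns n_filters rows of n_fft // 2 + 1 weights (one per FFT bin)."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 equally spaced mel points: lower edge, centres, upper edge
    hz_points = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))    # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))    # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_from_power_spectrum(power, bank, n_ceps, floor=1e-10):
    # 1) band energies: filter-weighted sums of the squared magnitude spectrum
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    # 2) logarithm, floored to avoid log(0)
    log_e = [math.log(max(e, floor)) for e in energies]
    # 3) DCT-II of the log energies, keeping the first n_ceps coefficients
    m = len(log_e)
    return [sum(le * math.cos(math.pi * q * (i + 0.5) / m) for i, le in enumerate(log_e))
            for q in range(n_ceps)]
```

Because the DCT basis vectors for q ≥ 1 sum to zero, scaling the whole spectrum by a constant only shifts the zeroth coefficient. This is one reason the higher MFCCs describe spectral shape rather than overall level.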

FIG. 3.

Alternative paths for computing the squared magnitude spectrum, which is used as an input to the MFCC chain shown in Fig. 2.

FIG. 4.

Vowel [o] spoken normally (top) and with high vocal effort (bottom) by a male speaker. In addition to the FFT spectrum, LP and WLP spectrum envelopes and the cepstrally liftered excitation spectrum (left) are used to construct alternative spectrum estimates (right). The notation next to the curves corresponds to Fig. 3.
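
An LP spectrum envelope of the kind plotted in Fig. 4 can be sketched with the conventional autocorrelation method and the Levinson-Durbin recursion. This covers plain LP only. The weighted variant (WLP) and the liftered excitation spectrum are not reproduced, and the prediction order, windowing, and helper names are assumptions for illustration.

```python
import math

def autocorrelation(x, max_lag):
    # r[k] = sum_n x[n] * x[n + k], for k = 0..max_lag
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Returns (a, err): a[0] == 1 and err is the final prediction error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lp_envelope(a, err, n_points):
    """All-pole power envelope err / |A(e^{jw})|^2 at n_points frequencies in [0, pi]."""
    env = []
    for m in range(n_points):
        w = math.pi * m / (n_points - 1)
        re = sum(c * math.cos(w * k) for k, c in enumerate(a))
        im = -sum(c * math.sin(w * k) for k, c in enumerate(a))
        env.append(err / (re * re + im * im))
    return env
```

For the autocorrelation sequence 1, 0.5, 0.25 (a first-order process), the recursion returns a = [1, −0.5, 0] and an error power of 0.75. The resulting envelope is 3.0 at DC and 1/3 at the Nyquist frequency.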

FIG. 5.

Example spectra based on a shouted vowel frame by a male speaker. The rows correspond, from top to bottom, to SNR levels 0, , and dB with factory noise corruption. The columns correspond to different types of spectra. The notation in parentheses corresponds to Fig. 3.
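
Corrupting speech with noise at a prescribed SNR is commonly done by rescaling the noise before adding it. The sketch below assumes that SNR is defined from the mean power over the whole utterance. The article may instead use active-speech power or a different segment, and `mix_at_snr` is a hypothetical helper.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_scaled_noise) == snr_db, then add it.
    Powers are mean squares over the whole signal (an assumption)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    # gain^2 * p_noise = p_speech / 10^(snr_db / 10)
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

At negative SNRs the scaled noise has more power than the speech. For example, at −20 dB its power is 100 times the speech power.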

FIG. 6.

Mean values (solid line) and standard deviation intervals (dotted line) of MFCCs averaged over normal and shouted speech from 11 male and 11 female speakers.

FIG. 7.

Mean EER of the automatic shouting detector in factory noise for the factors spectral estimation method, SNR, and speaker gender. Error bars indicate standard errors of the mean.

FIG. 8.

Mean EER of the automatic shouting detector in babble noise for the factors spectral estimation method, SNR, and speaker gender. Error bars indicate standard errors of the mean.
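
The equal error rate (EER) is the operating point at which the miss rate (shouted speech rejected) equals the false alarm rate (normal speech accepted). With a finite number of trials the two rates are rarely exactly equal. One common approximation, sketched below, sweeps the detector threshold over the observed scores and reports the mean of the two rates where they are closest. The scoring convention (higher score means "shouted") and the function name are assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER (%) for scores where higher means 'shouted'.
    A trial is accepted when score >= threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    best = None
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2.0)
    return 100.0 * best[1]
```

Perfectly separated scores give an EER of 0 %, while the interleaved scores in `equal_error_rate([0.6, 0.4], [0.5, 0.3])` give 50 %.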

FIG. 9.

(Color online) DET curves of the machine detection system for factory noise at SNR levels (a) −20 dB and (b) 10 dB.

FIG. 10.

(Color online) DET curves of the machine detection system for babble noise at SNR levels (a) −20 dB and (b) 10 dB.
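
A detection error tradeoff (DET) curve plots the miss probability against the false alarm probability over all thresholds, with both axes warped to standard normal deviates. The sketch below computes those points with `statistics.NormalDist.inv_cdf`. The clipping constant `eps` is an assumption that keeps rates of exactly 0 or 1 finite on the warped axes, and the function names are hypothetical.

```python
from statistics import NormalDist

def error_rates(target_scores, nontarget_scores):
    """(P_fa, P_miss) at every distinct threshold; accept 'shouted' when score >= threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # final point: reject everything
    points = []
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((p_fa, p_miss))
    return points

def det_coordinates(points, eps=1e-6):
    """Map (P_fa, P_miss) pairs to normal-deviate axes, as used on DET plots."""
    z = NormalDist().inv_cdf

    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    return [(z(clip(fa)), z(clip(miss))) for fa, miss in points]
```

As the threshold rises, the false alarm rate can only fall and the miss rate can only rise, so the curve runs monotonically from the upper left to the lower right of the DET plot.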

FIG. 11.

Mean sensitivity of the automatic detector and of human listeners to shouting for the factors analysis method (comprising the different spectrum analysis methods as well as male and female listeners) and SNR. Error bars indicate standard errors of the mean.
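
The error bars in these figures are standard errors of the mean, that is, the sample standard deviation divided by the square root of the number of averaged values. A minimal sketch follows. Which units are averaged (for example per-listener or per-speaker scores) is not stated in the captions, so the input is left generic.

```python
import math
from statistics import mean, stdev

def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two values for a standard error")
    return mean(values), stdev(values) / math.sqrt(len(values))
```

For the values 2, 4, and 6 this gives a mean of 4 and a standard error of 2/√3 ≈ 1.155.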

FIG. 12.

Sensitivity of female and male listeners to shouting. Means are shown for the factors SNR and listener gender. Error bars indicate standard errors of the mean.

FIG. 13.

Sensitivity of female and male listeners to shouting. Means are shown for the factors SNR, shouting class (of the speaker), and listener gender. Error bars indicate standard errors of the mean. M = male listeners, F = female listeners, L = low shouters, H = high shouters.
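
In signal detection studies, listener sensitivity is usually expressed as d′ = z(H) − z(F), where H is the hit rate, F the false alarm rate, and z the inverse standard normal CDF. The symbol used in the article is not legible in this text, so treating its sensitivity measure as this d′ is an assumption. The 1/(2N) correction for rates of exactly 0 or 1 is one common convention, not necessarily the one used here.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate, n_signal=None, n_noise=None):
    """Yes/no sensitivity index d' = z(H) - z(F).
    If trial counts are given, rates are clamped to [1/(2N), 1 - 1/(2N)]
    so that z() stays finite; without counts, rates of 0 or 1 raise ValueError."""
    def adjust(p, n):
        if n is None:
            return p
        return min(max(p, 1.0 / (2 * n)), 1.0 - 1.0 / (2 * n))

    z = NormalDist().inv_cdf
    return z(adjust(hit_rate, n_signal)) - z(adjust(false_alarm_rate, n_noise))
```

Chance performance (H = F) gives d′ = 0. A hit rate of about 84 % against a 50 % false alarm rate gives d′ ≈ 1.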

Tables

TABLE I.

List of the Finnish sentences used in collecting the speech material.

TABLE II.

Speaker-specific averaged SPL in decibels for normal and shouted speech and their difference. Each of the SPL values has been obtained by first integrating the signal energy in frames of 25 ms with a 10 ms sampling interval, relating the result to the reference signal to obtain the frame SPL value, and then averaging the frame SPL values over the most energetic 50% of the frames for the specific speaker and speaking condition.
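
The SPL averaging procedure described for Table II can be sketched as follows. The calibration step is an assumption: the reference signal is represented by the energy `ref_energy` of one 25-ms frame of a signal with known level `ref_spl_db`. Zero-energy frames are skipped to avoid log(0), and the helper names are hypothetical.

```python
import math

def frame_energies(x, sample_rate, frame_ms=25.0, hop_ms=10.0):
    # Signal energy integrated over 25-ms frames taken every 10 ms (partial tail frame dropped)
    frame = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    return [sum(v * v for v in x[start:start + frame])
            for start in range(0, len(x) - frame + 1, hop)]

def average_spl(x, sample_rate, ref_energy, ref_spl_db, top_fraction=0.5):
    """Mean frame SPL (dB) over the most energetic `top_fraction` of frames.
    Frame SPL = ref_spl_db + 10*log10(frame energy / reference frame energy)."""
    spl = [ref_spl_db + 10.0 * math.log10(e / ref_energy)
           for e in frame_energies(x, sample_rate) if e > 0.0]
    spl.sort(reverse=True)
    keep = max(1, int(len(spl) * top_fraction))
    return sum(spl[:keep]) / keep
```

Averaging only the most energetic half of the frames keeps pauses and weak segments from pulling the average down, so the normal-versus-shouted difference reflects the speech itself.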

TABLE III.

Equal error rates (%) for different numbers of MFCCs.

TABLE IV.

Equal error rates (%) for 12 and 30 MFCCs concatenated with Δ and ΔΔ coefficients.
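
Table IV combines static MFCCs with two further coefficient sets. Assuming these are the customary delta (Δ) and delta-delta (ΔΔ) coefficients, a standard regression-based computation is sketched below: d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) for n = 1..N. The window half-width N = 2 and the edge handling (repeating the first and last frames) are assumptions.

```python
def deltas(frames, width=2):
    """Regression deltas over +/- `width` frames; edges repeat the first/last frame."""
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                after = frames[min(t + n, n_frames - 1)][d]
                before = frames[max(t - n, 0)][d]
                acc += n * (after - before)
            row.append(acc / denom)
        out.append(row)
    return out

def with_deltas(frames, width=2):
    # Concatenate static, delta, and delta-delta (delta of the deltas) per frame
    d1 = deltas(frames, width)
    d2 = deltas(d1, width)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

For a coefficient that rises linearly by 1 per frame, the interior deltas are 1 and the interior delta-deltas are 0. Concatenation triples the feature dimension, so 12 MFCCs become a 36-dimensional vector and 30 MFCCs become a 90-dimensional vector.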

TABLE V.

Equal error rates (%) for MFCC features using different spectrum analysis methods in factory noise.

TABLE VI.

Equal error rates (%) for MFCC features using different spectrum analysis methods in babble noise.

TABLE VII.

Main results of the subjective listening test on the detection of shouted speech in babble noise by human listeners. The total, male-listener, and female-listener false alarm rates were each obtained by averaging the corresponding false alarm rate for normal speech with that for pure noise.
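
Written out, each false alarm rate reported in Table VII is the unweighted mean of the two non-shouted conditions. The numbers in the example on the right are purely illustrative and are not taken from the table:

```latex
P_{\mathrm{FA}} = \frac{1}{2}\left(P_{\mathrm{FA}}^{\mathrm{normal}} + P_{\mathrm{FA}}^{\mathrm{noise}}\right),
\qquad \text{e.g.}\quad \frac{1}{2}\,(10\,\% + 2\,\%) = 6\,\%.
```

Because the mean is unweighted, the two conditions contribute equally even if they contained different numbers of trials.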

2013-04-03
2014-04-19