Consonant and vowel confusions in speech-weighted noise
Confusion patterns (CPs) for from MN55. The thick solid line without markers is , which is the sum of off-diagonal entries. The horizontal dashed line shows the chance level of . The legend provides the marker style used for consonants. These markers will be used throughout the paper.
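As an illustration of how a confusion pattern of this kind can be derived, the following minimal Python sketch turns one row of response counts per SNR into response proportions and computes the total error as the sum of the off-diagonal entries. The function name `confusion_pattern`, the data layout, and the use of 1/N as the chance level for N response alternatives are assumptions made for the example, not details taken from MN55.

```python
def confusion_pattern(count_rows_by_snr, spoken_index):
    """Convert response counts for one spoken sound into a confusion pattern.

    count_rows_by_snr maps an SNR value to the row of response counts for
    the spoken sound (hypothetical layout). spoken_index is the position
    of the correct response in that row.
    """
    pattern = {}
    for snr, counts in count_rows_by_snr.items():
        total = sum(counts)
        if total == 0:
            raise ValueError("no responses recorded at SNR %r" % (snr,))
        probs = [c / total for c in counts]
        # Total error: sum of every entry except the diagonal (correct) one.
        total_error = sum(p for j, p in enumerate(probs) if j != spoken_index)
        pattern[snr] = {
            "probs": probs,
            "total_error": total_error,
            "chance": 1.0 / len(counts),  # assumed: uniform guessing
        }
    return pattern


# Made-up counts for a 4-alternative task, for illustration only.
example = confusion_pattern({0: [8, 1, 1, 0], -12: [4, 2, 2, 2]}, spoken_index=0)
```

Plotting each entry of `probs` against SNR, together with `total_error` and the horizontal `chance` line, would give a figure with the same layout as the one described in the caption.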
(Color online) The power spectral densities (PSD) of average speech (solid) and noise (dashed) for UIUCs04 at wideband SNR. The PSDs for both speech and noise were calculated using the function in MATLAB, with a hanning window of duration (i.e., 320 samples) with an overlap of and a fast Fourier transform length of 2048 points.
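The PSD settings in the caption (a 320-sample Hann-type window, overlapping segments, a 2048-point FFT) match the ingredients of an averaged-periodogram (Welch-style) estimate. The sketch below shows that general technique in plain Python. It is not the authors' code: the MATLAB function name and the overlap value did not survive in the caption, so `welch_psd`, the 50% default overlap, the symmetric Hann formula, and the 16 kHz sampling rate in the usage lines are all assumptions.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def welch_psd(signal, fs, win_len=320, overlap=0.5, nfft=2048):
    """One-sided averaged-periodogram PSD estimate (power per Hz).

    win_len and nfft follow the caption; overlap=0.5 is an assumed value.
    """
    # Symmetric Hann window (assumed variant).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (win_len - 1))
              for i in range(win_len)]
    scale = fs * sum(w * w for w in window)
    hop = max(1, int(round(win_len * (1.0 - overlap))))
    psd = [0.0] * (nfft // 2 + 1)
    n_seg = 0
    for start in range(0, len(signal) - win_len + 1, hop):
        # Window the segment, then zero-pad it to the FFT length.
        seg = [signal[start + i] * window[i] for i in range(win_len)]
        seg += [0.0] * (nfft - win_len)
        spec = fft(seg)
        for k in range(nfft // 2 + 1):
            psd[k] += abs(spec[k]) ** 2 / scale
        n_seg += 1
    if n_seg == 0:
        raise ValueError("signal is shorter than one window")
    psd = [p / n_seg for p in psd]
    # Fold negative frequencies into the one-sided estimate
    # (DC and Nyquist bins are not doubled).
    for k in range(1, nfft // 2):
        psd[k] *= 2.0
    freqs = [k * fs / nfft for k in range(nfft // 2 + 1)]
    return freqs, psd


# Usage with a synthetic 1 kHz tone at an assumed 16 kHz sampling rate.
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
freqs, psd = welch_psd(tone, fs)
peak = max(range(len(psd)), key=lambda k: psd[k])
```

Running the same function on the speech and on the noise, then converting each result to decibels, would give the two curves compared in the figure.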
The CV recognition scores of 14 listeners, as a function of SNR. Dashed lines show the four low performance (LP) listeners. The quiet condition is denoted by “Q.”
Histograms of the incorrect recognition of (a) consonants and (b) vowels, in the “confusable” and “marginally confusable” utterances, where is the syllable error for that utterance in the quiet. There are 560 responses for each consonant ( listeners) while there are 2240 responses for each vowel ( listeners) in the quiet condition.
(Color online) Recognition scores for vowels (top) and consonants (bottom), as a function of wideband SNR. In the top plot, the thick solid line is the average vowel recognition score , while in the bottom plot, the three solid lines represent the average scores for the three consonant sets and the dash-dotted line represents the average consonant score . The chance levels, for vowels and for consonants, are shown by the horizontal dashed black lines.
(Color online) Vowel-to-consonant recognition ratio (on a log scale) as a function of SNR, for each consonant. The colors and markers denote the same information as in the bottom panel of Fig. 5. The averages for the consonant sets are shown by the thick solid lines, while the ratio for the average consonant score is shown by the thick dash-dotted line.
(Color online) (a) Consonant recognition scores and (b) consonant recognition error (log scale), plotted as a function of AI. The dashed lines represent individual consonants, while the three colored solid lines represent average values for the three consonant sets. The average consonant score (thick dash-dotted line) is very close to that predicted by the AI model (thick solid line). The data for the quiet condition are not shown.
(Color online) The four small panels show the gray-scale images of the CMs at four SNR values. The gray-scale intensity is proportional to the log of the value of each entry in the row-normalized CM, with black color representing unity and white color representing the chance performance . Dashed lines separate sets C1 (Nos. 1–24), C3 (Nos. 25–44), and C2 (Nos. 45–64), in that order, from left to right and top to bottom. The two enlarged color panels at the bottom show set C2 at SNR and set C1 at SNR.
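The gray-scale mapping described in this caption can be sketched as follows: each row of a count matrix is normalized to sum to one, and each entry is then mapped on a log scale so that unity is black and chance is white. This is a minimal sketch, not the authors' plotting code. The name `gray_levels`, the choice of 1/N as the chance level for an N-by-N matrix, and the clipping of below-chance entries to white are assumptions.

```python
import math


def gray_levels(count_matrix):
    """Map a square count matrix to darkness values in [0, 1].

    1.0 means black (row-normalized entry equal to unity) and 0.0 means
    white (entry at or below the assumed chance level 1/N).
    """
    n = len(count_matrix)
    chance = 1.0 / n  # assumed chance level for N response alternatives
    levels = []
    for row in count_matrix:
        total = sum(row)
        if total == 0:
            raise ValueError("row has no responses")
        out_row = []
        for count in row:
            p = count / total  # row-normalized entry
            if p <= chance:
                out_row.append(0.0)  # clipped to white (assumption)
            else:
                # Linear in log(p): 0 at chance, 1 at unity.
                out_row.append(math.log(p / chance) / math.log(1.0 / chance))
        levels.append(out_row)
    return levels


# Made-up 4x4 counts, for illustration only.
demo = gray_levels([
    [4, 0, 0, 0],
    [1, 2, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 3],
])
```

A log mapping like this keeps small off-diagonal confusions visible that a linear gray scale would wash out next to the diagonal.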
(Color online) The SNR spectra for consonants in set C2 (top), C1 (center), and C3 (bottom). The thin gray lines show the SNR spectra for individual consonants while the thick colored line in each panel shows the average SNR spectrum for that set. Each panel contains the SNR spectrum for average speech (thin black line), estimated using the speech and noise spectra shown in Fig. 2.
(Color online) Consonant CPs for (top left), /fε/ (top right), /fɪ/ (bottom left), and /fæ/ (bottom right). are four rows of the consonant CM that correspond to presentation of consonant /f/ and vowel at a given SNR. The gray thin lines with square symbols in the CP figures represent the sounds that are not confused with the diagonal sound and hence do not cross above the chance level. In all CP figures, the quiet condition is plotted at SNR for convenience.
Consonant CPs for consonants /s/ (top left), /ʃ/ (top right), /z/ (bottom left), and /ʒ/ (bottom right). The unvoiced consonants (top panels) have only one strong competitor, which accounts for the total recognition error (thick solid line), while the voiced consonants (bottom panels) have multiple competitors that contribute to the total error.
Consonant-independent vowel CPs. The legend for vowel symbols is given in Fig. 5, top panel.
Plots of the average values of the second formant frequency of vowels vs the vowel durations for male (left panel) and female talkers (right panel). The values of the duration are from Hillenbrand et al. (HGCW), while the values of are from HGCW (hollow symbols) as well as Peterson and Barney (PB) (filled symbols), estimated using isolated /hVd/ syllables.
(a) Vowel clustering in 3D eigenspace (dimensions 2–4) of the vowel CM . The gray-scale intensity of the symbols corresponds to the six SNR levels (i.e., the SNR and the ). (b) Two-dimensional projection of the vowel clusters. The projection matches the clean speech clustering with the vowel distribution in the left panel of Fig. 13. The lines indicate paths traced by the vowels in the 2D plane of projection, as the SNR decreases.
(Color online) Consonant recognition scores for the 18 consonants used by Grant and Walden (1996). The hollow square and the opaque square represent the consonants /tʃ/ and /dʒ/, respectively.
A comparison of the consonant recognition scores for the current experiment [UIUCs04], Grant and Walden (1996) [GW96], and Miller and Nicely (1955) [MN55] as functions of (a) SNR and (b) AI.
Mathematical expressions, sizes, and descriptions of the five basic types of CM used in this study. and indicate the spoken consonant and vowel, while and represent the consonant and vowel reported by the listener.