Consonant and vowel confusions in speech-weighted noise a)
a) Parts of this analysis were presented at the ARO Midwinter Meeting 2005 (New Orleans), the Aging and Speech Communication 2005 Conference (Bloomington, IN), and the International Conference on Spoken Language Processing 2006 (Pittsburgh, PA).

FIG. 1.

Confusion patterns (CPs) for from MN55. The thick solid line without markers is the total error, which is the sum of the off-diagonal entries. The horizontal dashed line shows the chance level of . The legend provides the marker style used for each consonant. These markers are used throughout the paper.
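
To make the idea of a confusion pattern concrete, here is a minimal Python sketch, assuming each SNR condition supplies a square matrix of response counts (rows are spoken sounds, columns are reported sounds). The function names and the toy 3-sound counts are hypothetical and are not taken from the paper or from MN55. The sketch row-normalizes each count matrix, collects the row for one spoken sound across SNRs, and sums the off-diagonal entries to obtain the total error.

```python
import numpy as np

def row_normalize(counts):
    """Convert response counts to row-wise proportions.

    Assumes every spoken sound was presented at least once,
    so that no row sums to zero.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def confusion_pattern(count_matrices, spoken_index):
    """Stack one row of the normalized CM per SNR condition."""
    return np.vstack([row_normalize(c)[spoken_index] for c in count_matrices])

def total_error(pattern, spoken_index):
    """Sum of the off-diagonal entries of the row, per SNR condition."""
    off_diagonal = np.delete(pattern, spoken_index, axis=1)
    return off_diagonal.sum(axis=1)

# Hypothetical counts for 3 sounds at a low and a high SNR.
counts_low = [[5, 3, 2], [2, 6, 2], [1, 1, 8]]
counts_high = [[9, 1, 0], [0, 10, 0], [1, 0, 9]]

pattern = confusion_pattern([counts_low, counts_high], spoken_index=0)
errors = total_error(pattern, spoken_index=0)
# pattern is approximately [[0.5, 0.3, 0.2], [0.9, 0.1, 0.0]]
# errors is approximately [0.5, 0.1]
```

Plotting each column of `pattern` against SNR, together with `errors`, would give a figure in the style described by the caption.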

FIG. 2.

(Color online) The power spectral densities (PSDs) of average speech (solid) and noise (dashed) for UIUCs04 at wideband SNR. The PSDs for both speech and noise were calculated using the function in MATLAB, with a Hann window of duration (i.e., 320 samples) with an overlap of and a fast Fourier transform length of 2048 points.
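
As a rough illustration of the kind of PSD estimate this caption describes, the sketch below averages Hann-windowed periodograms (Welch's method) in Python with NumPy. It is not the authors' MATLAB code. Only the 320-sample window and the 2048-point FFT come from the caption; the 16 kHz sampling rate and the 50% overlap are assumptions made for the example, and `welch_psd` is a hypothetical helper name.

```python
import numpy as np

def welch_psd(x, fs, nperseg=320, noverlap=160, nfft=2048):
    """One-sided PSD by averaging windowed, zero-padded periodograms.

    nperseg and nfft follow the caption; fs and noverlap are
    assumptions chosen for this example.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nperseg)           # symmetric Hann window
    step = nperseg - noverlap
    scale = fs * np.sum(window ** 2)       # density scaling
    n_segments = 1 + (len(x) - nperseg) // step
    psd = np.zeros(nfft // 2 + 1)
    for k in range(n_segments):
        segment = x[k * step : k * step + nperseg] * window
        spectrum = np.fft.rfft(segment, n=nfft)
        psd += np.abs(spectrum) ** 2 / scale
    psd /= n_segments
    psd[1:-1] *= 2.0                       # fold negative frequencies (nfft even)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, psd

# Usage with an assumed 16 kHz sampling rate and a 1 kHz test tone.
fs = 16000
t = np.arange(fs) / fs
freqs, psd = welch_psd(np.sin(2 * np.pi * 1000 * t), fs)
peak_hz = freqs[np.argmax(psd)]            # close to 1000 Hz
```

Applying the same function to a speech signal and to the masking noise, then plotting `10 * np.log10(psd)` against `freqs`, would give curves comparable in form to the solid and dashed lines in the figure.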

FIG. 3.

The CV recognition scores of 14 listeners, as a function of SNR. Dashed lines show the four low performance (LP) listeners. The quiet condition is denoted by “Q.”

FIG. 4.

Histograms of the incorrect recognition of (a) consonants and (b) vowels, in the “confusable” and “marginally confusable” utterances, where is the syllable error for that utterance in the quiet. There are 560 responses for each consonant ( listeners) while there are 2240 responses for each vowel ( listeners) in the quiet condition.

FIG. 5.

(Color online) Recognition scores for vowels (top) and consonants (bottom), as a function of wideband SNR. In the top plot, the thick solid line is the average vowel recognition score , while in the bottom plot, the three solid lines represent the average scores for the three consonant sets and the dash-dotted line represents the average consonant score . The chance levels, for vowels and for consonants, are shown by the horizontal dashed black lines.

FIG. 6.

(Color online) Vowel-to-consonant recognition ratio (on a log scale) as a function of SNR , for each consonant. The colors and markers denote the same information as in the bottom panel of Fig. 5. The averages for the consonant sets are shown by the thick solid lines, while that for the average consonant score is shown by the thick dash-dotted line.

FIG. 7.

(Color online) (a) Consonant recognition scores , and (b) consonant recognition error (log scale), plotted as a function of AI. The dashed lines represent individual consonants, while the three colored solid lines represent average values for the three consonant sets. The average consonant score (thick dash-dotted line) is very close to that predicted by the AI model (thick solid line). The data for the quiet condition are not shown.

FIG. 8.

(Color online) The four small panels show the gray-scale images of the CMs at four SNR values. The gray-scale intensity is proportional to the log of the value of each entry in the row-normalized CM, with black color representing unity and white color representing the chance performance . Dashed lines separate sets C1 (Nos. 1–24), C3 (Nos. 25–44), and C2 (Nos. 45–64), in that order, from left to right and top to bottom. The two enlarged color panels at the bottom show set C2 at SNR and set C1 at SNR.
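
The gray-scale mapping described in this caption can be sketched as follows in Python. This is a minimal illustration, not the authors' plotting code. It assumes a chance level of 1/64 (inferred from the 64 numbered rows and columns mentioned in the caption), and it clips entries below chance to white, which is a choice made for this example. The name `log_gray_level` is hypothetical.

```python
import numpy as np

def log_gray_level(p, chance=1.0 / 64):
    """Map a row-normalized CM entry to a darkness value in [0, 1].

    0 corresponds to white (entry at or below the assumed chance level),
    1 corresponds to black (entry equal to unity). Darkness grows
    linearly with the log of the entry between those two points.
    """
    p = np.clip(np.asarray(p, dtype=float), chance, 1.0)
    return np.log(p / chance) / np.log(1.0 / chance)

# An entry of 1/8 lies halfway between 1/64 and 1 on a log scale.
levels = log_gray_level([1.0 / 64, 1.0 / 8, 1.0])
# levels is approximately [0.0, 0.5, 1.0]
```

Passing `1 - log_gray_level(cm)` to an image routine with a gray colormap would render unity as black and chance as white, as the caption describes.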

FIG. 9.

(Color online) The SNR spectra for consonants in set C2 (top), C1 (center), and C3 (bottom). The thin gray lines show the SNR spectra for individual consonants while the thick colored line in each panel shows the average SNR spectrum for that set. Each panel contains the SNR spectrum for average speech (thin black line), estimated using the speech and noise spectra shown in Fig. 2.

FIG. 10.

(Color online) Consonant CPs for (top left), /fε/ (top right), /fɪ/ (bottom left), and /fæ/ (bottom right). are four rows of the consonant CM that correspond to presentation of consonant /f/ and vowel at a given SNR. The gray thin lines with square symbols in the CP figures represent the sounds that are not confused with the diagonal sound and hence do not cross above the chance level. In all CP figures, the quiet condition is plotted at SNR for convenience.

FIG. 11.

Consonant CPs for consonants /s/ (top left), /ʃ/ (top right), /z/ (bottom left), and /ʒ/ (bottom right). The unvoiced consonants (top panels) have only one strong competitor which accounts for the total recognition error (thick solid line), while the voiced consonants have multiple competitors that contribute to the total error.

FIG. 12.

Consonant-independent vowel CPs. The legend for the vowel symbols is given in the top panel of Fig. 5.

FIG. 13.

Plots of the average values of the second formant frequency of vowels vs the vowel durations for male talkers (left panel) and female talkers (right panel). The values of the duration are from Hillenbrand et al. (HGCW), while the values of the second formant frequency are from HGCW (hollow symbols) as well as Peterson and Barney (PB) (filled symbols), estimated using isolated /hVd/ syllables.

FIG. 14.

(a) Vowel clustering in 3D eigenspace (dimensions 2–4) of the vowel CM . The gray-scale intensity of the symbols corresponds to the six SNR levels (i.e., the SNR and the ). (b) Two-dimensional projection of the vowel clusters. The projection matches the clean speech clustering with the vowel distribution in the left panel of Fig. 13. The lines indicate paths traced by the vowels in the 2D plane of projection, as the SNR decreases.

FIG. 15.

(Color online) Consonant recognition scores for the 18 consonants used by Grant and Walden (1996). The hollow square and the opaque square represent the consonants /tʃ/ and /dʒ/, respectively.

FIG. 16.

A comparison of the consonant recognition scores for the current experiment [UIUCs04], Grant and Walden (1996) [GW96], and Miller and Nicely (1955) [MN55] as functions of (a) SNR and (b) AI.



Mathematical expressions, sizes, and descriptions of the five basic types of CM used in this study. and indicate the spoken consonant and vowel, while and represent the consonant and vowel reported by the listener.

