Index of content:
Volume 135, Issue 1, January 2014
- SPEECH PERCEPTION 
135(2014); http://dx.doi.org/10.1121/1.4829525
Previous work has shown that human listeners are sensitive to level differences in high-frequency energy (HFE) in isolated vowel sounds produced by male singers. Results indicated that sensitivity to HFE level changes increased with overall HFE level, suggesting that listeners would be more “tuned” to HFE in vocal production exhibiting higher levels of HFE. It follows that sensitivity to HFE level changes should be higher (1) for female vocal production than for male vocal production and (2) for singing than for speech. To test this hypothesis, difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs speech but not in female vs male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6–11.3 kHz) but 8–10 dB in the 16-kHz octave (11.3–22 kHz). These scores are lower (better) than those previously reported for isolated vowels and some musical instruments.
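The band limits quoted in this abstract are close to the common octave-band convention, in which the edges lie a factor of √2 below and above the centre frequency. The following is a minimal sketch under that assumption; the abstract does not state how its limits were derived, and the function name `octave_band_edges` is hypothetical.

```python
import math


def octave_band_edges(center_hz):
    """Return (lower, upper) edges in Hz of a one-octave band.

    Assumes the edges sit a factor of sqrt(2) below and above the
    centre frequency, so that upper / lower == 2 (one octave).
    """
    ratio = math.sqrt(2.0)
    return center_hz / ratio, center_hz * ratio


# Edges for the two octaves named in the abstract.
low8, high8 = octave_band_edges(8000.0)
low16, high16 = octave_band_edges(16000.0)

print(f"8-kHz octave:  {low8:.0f} to {high8:.0f} Hz")
print(f"16-kHz octave: {low16:.0f} to {high16:.0f} Hz")
```

Under this convention the upper edge of the 8-kHz octave coincides with the lower edge of the 16-kHz octave, which agrees with the shared 11.3-kHz limit in the abstract. The computed upper edge of the 16-kHz octave is somewhat above the 22 kHz given in the abstract; the reason for that limit is not stated here.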
135(2014); http://dx.doi.org/10.1121/1.4835935
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065–1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed in a setup with a frontal target sentence and two concurrent masker sentences at ±50° azimuth. For a group of 16 normal-hearing listeners and a group of 15 elderly (linearly aided) hearing-impaired listeners, overall SRTs of, respectively, +1.3 dB and +6.3 dB target-to-masker ratio were obtained. The new corpus was found to be very sensitive to inter-individual differences and produced consistent results across test lists. The corpus is publicly available.
135(2014); http://dx.doi.org/10.1121/1.4829528
The vowel space area (VSA) has been studied as a quantitative index of intelligibility to the extent that it captures articulatory working space and reductions therein. The majority of such studies have been empirical, correlating measures of VSA with perceptual measures of intelligibility. However, the literature contains minimal mathematical analysis of the properties of this metric. This paper further develops the theoretical underpinnings of this metric by presenting a detailed analysis of the statistical properties of the VSA and characterizing its distribution through the moment generating function. The theoretical analysis is confirmed by a series of experiments in which empirically estimated and theoretically predicted statistics of this function are compared. The results show that, on the Hillenbrand and TIMIT data, the theoretically predicted values of the higher-order statistics of the VSA match the empirical estimates very well.
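The abstract does not say how the VSA is computed. One common approach, assumed here purely for illustration, treats the mean (F1, F2) formant pairs of the corner vowels as the vertices of a polygon and takes its area with the shoelace formula. The function name and the formant values below are hypothetical and are not taken from the paper or from the Hillenbrand or TIMIT data.

```python
def vowel_space_area(formants):
    """Area of the polygon whose vertices are (F1, F2) pairs in Hz.

    Uses the shoelace formula. The vertices must be listed in order
    around the polygon (clockwise or counter-clockwise).
    Returns the area in Hz^2.
    """
    n = len(formants)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = formants[i]
        f1_b, f2_b = formants[(i + 1) % n]  # wrap around to the first vertex
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical mean formants (F1, F2) for four corner vowels,
# listed in order around the quadrilateral.
corner_vowels = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
area = vowel_space_area(corner_vowels)
print(f"VSA = {area:.0f} Hz^2")
```

Because each vertex is an estimate from noisy formant measurements, the area itself is a random quantity, which is the property the paper's statistical analysis addresses.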
Contribution of low-frequency harmonics to Mandarin Chinese tone identification in quiet and six-talker babble background
135(2014); http://dx.doi.org/10.1121/1.4837255
The goal of this study was to investigate Mandarin Chinese tone identification in quiet and multi-talker babble conditions for normal-hearing listeners. Tone identification was measured with speech stimuli and stimuli with low and/or high harmonics that were embedded in three Mandarin vowels with two fundamental frequencies. There were six types of stimuli: all harmonics (All), low harmonics (Low), high harmonics (High), and the first (H1), second (H2), and third (H3) harmonic. Results showed that, for quiet conditions, individual harmonics carried frequency contour information well enough for tone identification with high accuracy; however, in noisy conditions, tone identification with individual low harmonics (e.g., H1, H2, and H3) was significantly lower than that with the Low, High, and All harmonics. Moreover, tone identification with individual harmonics in noise was lower for a low F0 than for a high F0, and was also dependent on vowel category. Tone identification with individual low-frequency harmonics was accounted for by local signal-to-noise ratios, indicating that audibility of harmonics in noise may play a primary role in tone identification.
135(2014); http://dx.doi.org/10.1121/1.4837238
Using the data presented in the accompanying paper [Hilkhuysen et al., J. Acoust. Soc. Am. 131, 531–539 (2012)], the ability of six metrics to predict intelligibility of speech in noise before and after noise suppression was studied. The metrics considered were the Speech Intelligibility Index (SII), the fractional Articulation Index (fAI), the coherence intelligibility index based on the mid-levels in speech (CSIImid), an extension of the Normalized Coherence Metric (NCM+), a part of the speech-based envelope power model (pre-sEPSM), and the Short Term Objective Intelligibility measure (STOI). Three of the measures, SII, CSIImid, and NCM+, overpredicted intelligibility after noise reduction, whereas fAI underpredicted it. The pre-sEPSM metric worked well for speech in babble but failed with car noise. STOI gave the best predictions, but overall the intelligibility prediction errors were larger than the change in intelligibility caused by noise suppression. Suggestions for improvements of the metrics are discussed.