Index of content:
Volume 134, Issue 5, November 2013
- SPEECH PERCEPTION 
134(2013); http://dx.doi.org/10.1121/1.4824161
Previous work has shown that velar stops are produced with a forward movement during closure, forming a forward (anterior) loop for a VCV sequence, when the preceding vowels are back or mid. Are listeners aware of this aspect of articulatory dynamics? The current study used articulatory synthesis to examine how such kinematic patterns are reflected in the acoustics and whether those acoustic patterns elicit different goodness ratings. In Experiment I, the size and direction of the loops were modulated in articulatory synthesis, and the resulting stimuli were presented to listeners for a naturalness judgment. Results show that listeners rate forward loops as more natural than backward loops, in agreement with typical productions. Acoustic analysis of the synthetic stimuli shows that forward loops exhibit shorter and shallower VC transitions than CV transitions. In Experiment II, three acoustic parameters (F3-F2 distance, transition slope, and transition length) were used to systematically modulate the magnitude of VC and CV transitions. Listeners' naturalness ratings agreed with those of Experiment I. This study reveals that there is sufficient information in the acoustic signature of “velar loops” to affect perceptual preference. Preferences appeared to be determined by similarity to typical productions rather than by acoustic distinctiveness.
Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data
134(2013); http://dx.doi.org/10.1121/1.4821216
Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and can potentially also be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble, and stationary noise plus reverberation. The metric is also compared to intelligibility scores for conversational speech and for speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. The correlation between the metric and intelligibility scores is high and, consistent with the subject scores, metric values are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.
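The abstract names a Theoretical STI derived from octave-band signal-to-noise ratios and reverberation times but gives no formula. As a rough illustration only, the sketch below assumes the classic Houtgast-Steeneken modulation-transfer model for exponential reverberation plus stationary noise, the usual 14 modulation frequencies from 0.63 to 12.5 Hz, and clipping of the apparent SNR to ±15 dB. The equal band weights are a placeholder assumption, the function names are hypothetical, and none of this is claimed to be the study's exact procedure; a short-time variant could apply the same computation to per-window band SNRs.

```python
import math

# Assumed modulation frequencies (Hz): the 14 one-third-octave values
# from 0.63 to 12.5 Hz used in the classic STI procedure.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, rt60, snr_db):
    """Modulation index after exponential reverberation (rt60 in s) and stationary noise (snr_db in dB)."""
    reverb_term = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term


def transmission_index(m):
    """Convert a modulation index to an apparent SNR, clip it to +/-15 dB, and rescale to [0, 1]."""
    if m >= 1.0:
        return 1.0  # perfect channel: avoid dividing by zero
    if m <= 0.0:
        return 0.0  # no modulation left: avoid log of zero
    snr_app = 10.0 * math.log10(m / (1.0 - m))
    snr_app = max(-15.0, min(15.0, snr_app))
    return (snr_app + 15.0) / 30.0


def theoretical_sti(snr_db_bands, rt60_bands, weights=None):
    """Weighted sum over octave bands of the mean transmission index across modulation frequencies.

    weights defaults to equal band weights, a placeholder assumption rather than
    the weighting of any particular standard or study.
    """
    if len(snr_db_bands) != len(rt60_bands):
        raise ValueError("need one SNR and one reverberation time per band")
    n = len(snr_db_bands)
    if weights is None:
        weights = [1.0 / n] * n
    sti = 0.0
    for snr_db, rt60, w in zip(snr_db_bands, rt60_bands, weights):
        tis = [transmission_index(mtf(f, rt60, snr_db)) for f in MOD_FREQS]
        sti += w * sum(tis) / len(tis)
    return sti


if __name__ == "__main__":
    # Seven octave bands (125 Hz to 8 kHz), 0 dB SNR and no reverberation in every band.
    print(theoretical_sti([0.0] * 7, [0.0] * 7))  # approximately 0.5
```

With 0 dB SNR and no reverberation, the modulation index is 0.5 in every band, the apparent SNR is 0 dB, and the index lands at the midpoint of the scale; adding reverberation lowers the index most at the higher modulation frequencies.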
134(2013); http://dx.doi.org/10.1121/1.4823848
Perceptual attunement to one's native language results in language-specific processing of speech sounds. This includes stress cues, instantiated by differences in intensity, pitch, and duration. The present study investigates the effects of linguistic experience on the perception of these cues by studying the Iambic–Trochaic Law (ITL), which states that listeners group sounds trochaically (strong-weak) if the sounds vary in loudness or pitch and iambically (weak-strong) if they vary in duration. Participants were native listeners of either French or German; this comparison was chosen because French adults have been shown to be less sensitive than speakers of German and other languages to word-level stress, which is communicated by variation in cues such as intensity, fundamental frequency (F0), or duration. In Experiment 1, participants listened to sequences of coarticulated syllables varying in either intensity or duration. The German participants grouped more consistently than the French participants for both cues. Experiment 2 was identical to Experiment 1 except that intensity variation was replaced by pitch variation. German participants again showed more consistency for both cues, and French participants showed especially inconsistent grouping for the pitch-varied sequences. These experiments show that the perception of linguistic rhythm is strongly influenced by linguistic experience.
134(2013); http://dx.doi.org/10.1121/1.4824341
Speech perception skills in cochlear-implant users are often measured with simple speech materials. In children, it is crucial to fully characterize linguistic development, and this requires linguistically more meaningful materials. The authors propose using the comprehension of reflexives and pronouns, as these specific skills are acquired at different ages. According to the literature, normal-hearing children show adult-like comprehension of reflexives at age 5, whereas their comprehension of pronouns reaches adult-like levels only around age 10. To provide normative data, groups of younger children (5 to 8 years old), older children (10 and 11 years old), and adults were tested with and without spectral degradation, which simulated cochlear-implant speech transmission with four and eight channels. The results without degradation confirmed the different ages of acquisition of reflexives and pronouns. Adding spectral degradation reduced overall performance but did not change the general pattern observed with non-degraded speech. This finding confirms that these linguistic milestones can also be measured in children with cochlear implants, despite the reduced quality of sound transmission. The results therefore have implications for clinical practice, as they could help set realistic expectations and therapeutic goals for children who receive a cochlear implant.