Index of content:
Volume 134, Issue 2, August 2013
- SPEECH PERCEPTION 
High stimulus variability in nonnative speech learning supports formation of abstract categories: Evidence from Japanese geminates
134(2013); http://dx.doi.org/10.1121/1.4812767
This study reports effects of a high-variability training procedure on nonnative learning of a Japanese geminate-singleton fricative contrast. Thirty native speakers of Dutch took part in a 5-day training procedure in which they identified geminate and singleton variants of the Japanese fricative /s/. Participants were trained with either many repetitions of a limited set of words recorded by a single speaker (low-variability training) or with fewer repetitions of a more variable set of words recorded by multiple speakers (high-variability training). Both types of training enhanced identification of speech but not of nonspeech materials, indicating that learning was domain specific. High-variability training led to superior performance in identification but not in discrimination tests, and supported better generalization of learning as shown by transfer from the trained fricatives to the identification of untrained stops and affricates. Variability thus helps nonnative listeners to form abstract categories rather than to enhance early acoustic analysis.
134(2013); http://dx.doi.org/10.1121/1.4813304
This research is aimed at analyzing and improving automatic pronunciation error detection in a second language. Dutch vowels spoken by adult non-native learners of Dutch are used as a test case. A first study on Dutch pronunciation by L2 learners with different L1s revealed that vowel pronunciation errors are relatively frequent and often concern subtle acoustic differences between the realization and the target sound. In a second study automatic pronunciation error detection experiments were conducted to compare existing measures to a metric that takes account of the error patterns observed to capture relevant acoustic differences. The results of the two studies do indeed show that error patterns bear information that can be usefully employed in weighted automatic measures of pronunciation quality. In addition, it appears that combining such a weighted metric with existing measures improves the equal error rate by 6.1 percentage points from 0.297, for the Goodness of Pronunciation (GOP) algorithm, to 0.236.
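The Goodness of Pronunciation (GOP) measure referenced above scores each phone by comparing the acoustic likelihood of the intended phone against the best-scoring competing phone, normalized by segment duration. A minimal sketch of that comparison, using hypothetical per-frame log-likelihoods rather than the authors' actual acoustic model:

```python
import numpy as np

def gop_score(frame_loglik, target):
    """Duration-normalized Goodness of Pronunciation for one phone segment.

    frame_loglik maps each candidate phone label to an array of per-frame
    acoustic log-likelihoods over the segment (hypothetical values here;
    in practice these come from a trained acoustic model). GOP is the
    target phone's total log-likelihood minus that of the best-scoring
    phone, divided by the number of frames: scores near 0 mean the
    realization matches the target; strongly negative scores flag errors.
    """
    n_frames = len(frame_loglik[target])
    target_ll = np.sum(frame_loglik[target])
    best_ll = max(np.sum(ll) for ll in frame_loglik.values())
    return (target_ll - best_ll) / n_frames
```

A phone is typically flagged as mispronounced when its GOP falls below a tuned threshold; the weighted metric in the abstract refines this scheme by weighting the comparison toward the error patterns actually observed in L2 Dutch vowels. The reported gain is the difference 0.297 - 0.236 = 0.061, i.e., 6.1 percentage points of equal error rate.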
Simulating the effect of interaural mismatch in the insertion depth of bilateral cochlear implants on speech perception
134(2013); http://dx.doi.org/10.1121/1.4812272
A bilateral advantage for diotically presented stimuli has been observed for cochlear implant (CI) users and is suggested to be dependent on symmetrical implant performance. Studies using CI simulations have not shown a true “bilateral” advantage, but a “better ear” effect, and have demonstrated that performance decreases with increasing basalward shift in insertion depth. This study aimed to determine whether there is a bilateral advantage for CI simulations with interaurally matched insertions and the extent to which performance is affected by interaural insertion depth mismatch. Speech perception in noise and self-reported ease of listening were measured using matched bilateral, mismatched bilateral, and unilateral CI simulations over four insertion depths for seventeen normal hearing listeners. Speech scores and ease of listening decreased with increasing basalward shift in (interaurally matched) insertion depth. A bilateral advantage for speech perception was only observed when the insertion depths were interaurally matched and deep. No advantage was observed for small to moderate interaural insertion-depth mismatches, consistent with a better ear effect. Finally, both measures were poorer than expected for a better ear effect for large mismatches, suggesting that misalignment of the electrode arrays may prevent a bilateral advantage and detrimentally affect perception of diotically presented speech.
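CI simulations of this kind are conventionally built as channel vocoders: the signal is split into a few frequency bands, the slow amplitude envelope of each band is extracted, and the envelopes are imposed on carriers whose cochlear place can be shifted basalward to mimic a shallower insertion. A minimal sine-carrier sketch, using the standard Greenwood place-frequency map rather than the authors' exact processing parameters:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def greenwood(x_mm):
    """Greenwood map: cochlear place (mm from the apex) -> frequency (Hz)."""
    return 165.4 * (10.0 ** (0.06 * x_mm) - 1.0)

def inverse_greenwood(f_hz):
    """Frequency (Hz) -> cochlear place (mm from the apex)."""
    return np.log10(f_hz / 165.4 + 1.0) / 0.06

def ci_simulate(x, fs, n_bands=8, shift_mm=0.0):
    """Sine-carrier vocoder; shift_mm > 0 moves the carriers basalward,
    simulating a shallower (more basal) electrode insertion depth."""
    # analysis band edges equally spaced in cochlear place over 250-8000 Hz
    places = np.linspace(inverse_greenwood(250.0),
                         inverse_greenwood(8000.0), n_bands + 1)
    edges = greenwood(places)
    # carriers sit at the band-centre places, shifted basalward by shift_mm
    centres = greenwood((places[:-1] + places[1:]) / 2.0 + shift_mm)
    t = np.arange(len(x)) / fs
    env_lp = butter(2, 50.0, btype="low", fs=fs, output="sos")
    out = np.zeros(len(x))
    for lo, hi, fc in zip(edges[:-1], edges[1:], centres):
        band = sosfilt(butter(4, [lo, hi], btype="bandpass",
                              fs=fs, output="sos"), x)
        env = np.maximum(sosfilt(env_lp, np.abs(band)), 0.0)  # slow envelope
        out += env * np.sin(2.0 * np.pi * fc * t)             # shifted carrier
    return out
```

An interaurally matched bilateral simulation would present `ci_simulate(x, fs, shift_mm=d)` to both ears; a mismatched one would use different `shift_mm` values per ear.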
134(2013); http://dx.doi.org/10.1121/1.4812764
This study examined younger (n = 16) and older (n = 16) listeners' processing of dysarthric speech—a naturally occurring form of signal degradation. It aimed to determine how age, hearing acuity, memory, and vocabulary knowledge interacted in speech recognition and lexical segmentation. Listener transcripts were coded for accuracy and pattern of lexical boundary errors. For younger listeners, transcription accuracy was predicted by receptive vocabulary. For older listeners, this same effect existed but was moderated by pure-tone hearing thresholds. While both groups employed syllabic stress cues to inform lexical segmentation, older listeners were less reliant on this perceptual strategy. The results were interpreted to suggest that individuals with larger receptive vocabularies, with their presumed greater language familiarity, were better able to leverage cue redundancies within the speech signal to form lexical hypotheses—leading to an improved ability to comprehend dysarthric speech. This advantage was minimized as hearing thresholds increased. While the differing levels of reliance on stress cues across the listener groups could not be attributed to specific individual differences, it was hypothesized that some combination of larger vocabularies and reduced hearing thresholds in the older participant group led them to prioritize lexical cues as a segmentation frame.
134(2013); http://dx.doi.org/10.1121/1.4812759
Much recent interest surrounds listeners' abilities to adapt to various transformations that distort speech. An extreme example is spectral rotation, in which the spectrum of low-pass filtered speech is inverted around a center frequency (2 kHz here). Spectral shape and its dynamics are completely altered, rendering speech virtually unintelligible initially. However, intonation, rhythm, and contrasts in periodicity and aperiodicity are largely unaffected. Four normal hearing adults underwent 6 h of training with spectrally-rotated speech using Continuous Discourse Tracking. They and an untrained control group completed pre- and post-training speech perception tests, for which talkers differed from the training talker. Significantly improved recognition of spectrally-rotated sentences was observed for trained, but not untrained, participants. However, there were no significant improvements in the identification of medial vowels in /bVd/ syllables or intervocalic consonants. Additional tests were performed with speech materials manipulated so as to isolate the contribution of various speech features. These showed that preserving intonational contrasts did not contribute to the comprehension of spectrally-rotated speech after training, and suggested that improvements involved adaptation to altered spectral shape and dynamics, rather than just learning to focus on speech features relatively unaffected by the transformation.
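Spectral rotation of this kind is classically implemented by multiplying the low-pass filtered signal by a sinusoid at twice the center frequency, which mirrors every component around that center. A minimal sketch of the transformation (the filter order and the 2 kHz center are taken from the abstract; this is not the authors' exact materials pipeline):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def spectrally_rotate(x, fs, centre=2000.0):
    """Invert the spectrum of low-pass filtered speech around `centre` Hz.

    Multiplying by a cosine at 2*centre maps each component at f to
    2*centre - f (plus an unwanted image at 2*centre + f); low-pass
    filtering at 2*centre before and after keeps only the mirrored band.
    """
    sos = butter(8, 2.0 * centre, btype="low", fs=fs, output="sos")
    x_lp = sosfilt(sos, x)                   # band-limit to 0..2*centre
    t = np.arange(len(x)) / fs
    # factor 2 restores the amplitude halved by the cosine product
    mirrored = 2.0 * x_lp * np.cos(2.0 * np.pi * (2.0 * centre) * t)
    return sosfilt(sos, mirrored)            # discard the upper image
```

Because the rotation only remaps frequency, temporal properties such as intonation, rhythm, and periodicity contrasts survive largely intact, which is exactly why the untrained features listed in the abstract remain available to listeners.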