Volume 117, Issue 2, February 2005
Index of content:
- SPEECH PERCEPTION 
117(2005); http://dx.doi.org/10.1121/1.1852549
Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%–12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.
Lexical frequency and neighborhood density effects on the recognition of native and Spanish-accented words by native English and Spanish listeners
117(2005); http://dx.doi.org/10.1121/1.1823291
This study examined the effect of presumed mismatches between speech input and the phonological representations of English words by native speakers of English (NE) and Spanish (NS). The English test words, which were produced by a NE speaker and a NS speaker, varied orthogonally in lexical frequency and neighborhood density and were presented to NE listeners and to NS listeners who differed in English pronunciation proficiency. It was hypothesized that mismatches between phonological representations and speech input would impair word recognition, especially for items from dense lexical neighborhoods, which are phonologically similar to many other words and require finer sound discrimination. Further, it was assumed that L2 phonological representations would change with L2 proficiency. The results showed the expected mismatch effect only for words from dense neighborhoods. For Spanish-accented stimuli, the NS groups recognized more words from dense neighborhoods than the NE group did. For native-produced stimuli, the low-proficiency NS group recognized fewer words than the other two groups. The high-proficiency NS participants’ performance was as good as the NE group’s for words from sparse neighborhoods, but not for words from dense neighborhoods. These results are discussed in relation to the development of phonological representations of L2 words.