Table of contents:
Volume 104, Issue 5, November 1998
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
104(1998); http://dx.doi.org/10.1121/1.423887
Accurate estimation of the glottal waveform (GW) is required for purposes such as natural speech synthesis, speaker recognition, and physiological speech processing. Most methods available for GW estimation are based on inverse filtering of the speech signal through the vocal tract, and all of them suffer from inaccuracies introduced by their modeling assumptions. The method for GW estimation developed in the present study is based on fuzzy clustering of quasi-linear geometrical substructures, represented within the signal shifts hyperspace. Algorithms for estimation of the driving function to the vocal tract are presented and evaluated on simulated and real data. Comparison of the fuzzy clustering-based method with the PSIAIF and Wong’s closed-phase algorithms shows that the present method is superior with respect to both the GW estimation and the determination of GW event time instants.
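The inverse-filtering baseline that the abstract contrasts against can be sketched in a few lines. The following is a generic LPC inverse filter (autocorrelation method with Levinson-Durbin recursion), not the paper's fuzzy-clustering algorithm; the toy two-pole "vocal tract" and all parameter values are illustrative assumptions.

```python
import numpy as np

def lpc(x, order):
    """All-pole (LPC) coefficients via the autocorrelation method and
    the Levinson-Durbin recursion. Returns a with a[0] == 1."""
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a[j] * r[i-j]
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Pass x through the FIR inverse filter A(z); the residual is a
    crude estimate of the glottal excitation driving the tract."""
    return np.convolve(x, a)[: len(x)]

# --- demo: synthetic "speech" = impulse train through an all-pole tract ---
excitation = np.zeros(2000)
excitation[::100] = 1.0                    # one pulse per pitch period
tract = np.array([1.0, -1.2, 0.8])         # assumed toy 2-pole A(z), stable
speech = np.zeros_like(excitation)
for t in range(len(speech)):
    speech[t] = excitation[t]
    if t >= 1:
        speech[t] -= tract[1] * speech[t - 1]
    if t >= 2:
        speech[t] -= tract[2] * speech[t - 2]

a_hat = lpc(speech, 2)                     # estimate the tract from speech
residual = inverse_filter(speech, a_hat)   # recovered excitation
```

With a well-matched all-pole model the residual peaks at the excitation pulses; real speech violates these assumptions (non-stationarity, zeros, open/closed glottal phases), which is exactly the inaccuracy the paper's method aims to avoid.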
104(1998); http://dx.doi.org/10.1121/1.423888
When resolving errors with interactive systems, people sometimes hyperarticulate, adopting a clarified style of speech that has been associated with increased recognition errors. The primary goals of the present study were: (1) to provide a comprehensive analysis of acoustic, prosodic, and phonological adaptations to speech during human–computer error resolution after different types of recognition error; and (2) to examine changes in speech during both global and focal utterance repairs. A semi-automatic simulation method with a novel error-generation capability was used to compare speech immediately before and after system recognition errors. Matched original-repeat utterance pairs were then analyzed for type and magnitude of linguistic adaptation during global and focal repairs. Results indicated that the primary hyperarticulate changes in speech following all error types were durational, with increases in the number and length of pauses most noteworthy. Speech was also adapted toward a more deliberate and hyperclear articulatory style. During focal error repairs, large durational effects functioned together with pitch and amplitude to provide selective prominence marking of the repair region. These results corroborate and generalize the computer-elicited hyperarticulate adaptation model (CHAM). Implications are discussed for improved error handling in next-generation spoken language and multimodal systems.
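The durational measures central to this study (pause count and total pause length per utterance) can be approximated automatically. The sketch below uses simple frame-energy thresholding; it is an illustrative stand-in, not the study's actual instrumentation, and the function name, thresholds, and minimum-pause duration are all assumptions.

```python
import numpy as np

def pause_stats(signal, sr, frame_ms=10.0, thresh_db=-40.0, min_pause_ms=200.0):
    """Count pauses and total pause time (seconds) in a speech signal
    using frame-energy thresholding relative to the loudest frame."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    silent = energy_db < energy_db.max() + thresh_db   # relative threshold
    min_frames = int(min_pause_ms / frame_ms)
    pauses, run = [], 0
    for s in silent:                 # group consecutive silent frames
        if s:
            run += 1
        else:
            if run >= min_frames:
                pauses.append(run * frame_ms / 1000)
            run = 0
    if run >= min_frames:
        pauses.append(run * frame_ms / 1000)
    return len(pauses), sum(pauses)

# --- demo: two "speech" bursts separated by a 0.4 s pause ---
rng = np.random.default_rng(1)
sr = 8000
voiced = 0.1 * rng.standard_normal(sr // 2)            # 0.5 s of noise
sig = np.concatenate([voiced, np.zeros(int(0.4 * sr)), voiced])
n_pauses, total_pause = pause_stats(sig, sr)
```

Comparing `pause_stats` output between the original and the repeat member of a matched utterance pair would yield the kind of durational difference measures the study reports.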