Volume 120, Issue 3, September 2006
Index of content:
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
Temporal characteristics of nasalization in children and adult speakers of American English and Korean during production of three vowel contexts. 120(2006); http://dx.doi.org/10.1121/1.2225382
The purpose of this study was to identify and compare the temporal characteristics of nasalization in relation to (1) languages, (2) vowel contexts, and (3) age groups. Two distinct acoustic energies from the mouth and nose were recorded during speech production (/pamap, pimip, pumup/) using two microphones to obtain absolute and proportional measurements of the acoustic temporal characteristics of nasalization. Twenty-eight normal adults (14 American English and 14 Korean speakers) and 28 normal children (14 American English and 14 Korean speakers) participated in this study. In both languages, adults showed a shorter duration of nasalization than children within all three vowel contexts. The high vowel context showed a longer duration of nasalization than the low vowel context in both languages. There was no significant difference in the temporal characteristics of nasalization between American English and Korean. Nasalization showed different timing characteristics between children and adults across vowel contexts. The results are discussed in association with developmental coarticulation and the relationship between acoustic consequences of articulatory events and vowel height.
Speech utterance clustering based on the maximization of within-cluster homogeneity of speaker voice characteristics. 120(2006); http://dx.doi.org/10.1121/1.2225570
This paper investigates the problem of how to partition unknown speech utterances into a set of clusters, such that each cluster consists of utterances from only one speaker, and the number of clusters reflects the unknown speaker population size. The proposed method begins by specifying a certain number of clusters, corresponding to one of the possible speaker population sizes, and then maximizes the level of overall within-cluster homogeneity of the speakers’ voice characteristics. The within-cluster homogeneity is characterized by the likelihood probability that a cluster model, trained using all the utterances within a cluster, matches each of the within-cluster utterances. To attain the maximal sum of likelihood probabilities for all utterances, the proposed method applies a genetic algorithm to determine the cluster in which each utterance should be located. For greater computational efficiency, also proposed is a clustering criterion that approximates the likelihood probability with a divergence-based model similarity between a cluster and each of the within-cluster utterances. The clustering method then examines various legitimate numbers of clusters by adapting the Bayesian information criterion to determine the most likely speaker population size. The experimental results show the superiority of the proposed method over conventional methods based on hierarchical clustering.
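The two-stage idea in this abstract (fix a candidate number of clusters, maximize within-cluster likelihood, then choose the number of clusters with a Bayesian information criterion) can be illustrated with a toy sketch. It is not the authors' implementation, and several simplifications are assumptions made here: each utterance is reduced to one hypothetical scalar feature, each cluster model is a Gaussian with a fixed unit variance, and a simple alternating reassignment search stands in for the genetic algorithm. All function names are illustrative.

```python
import math


def cluster_log_likelihood(features, labels, k, var=1.0):
    """Within-cluster homogeneity: sum of log N(x | cluster mean, var).

    Each cluster model is "trained" on its own members (here, just the mean).
    The fixed variance is an assumption of this sketch.
    """
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(features, labels) if lab == c]
        if not members:
            continue
        mean = sum(members) / len(members)
        for x in members:
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return total


def assign_clusters(features, k, iterations=50):
    """Alternate between reassigning utterances and retraining cluster means.

    This simple search replaces the genetic algorithm named in the abstract.
    """
    ordered = sorted(features)
    n = len(ordered)
    # Deterministic start: evenly spaced order statistics as initial means.
    means = [ordered[int((c + 0.5) * n / k)] for c in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: (x - means[c]) ** 2) for x in features
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(features, labels) if lab == c]
            if members:
                means[c] = sum(members) / len(members)
    return labels


def bic_score(features, labels, k):
    """Log-likelihood penalized by one free parameter (the mean) per cluster."""
    return cluster_log_likelihood(features, labels, k) - 0.5 * k * math.log(len(features))


def estimate_speaker_count(features, max_k):
    """Try each candidate cluster count and keep the best BIC score."""
    best_k, best_labels, best_score = None, None, -math.inf
    for k in range(1, max_k + 1):
        labels = assign_clusters(features, k)
        score = bic_score(features, labels, k)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


# Toy data: three hypothetical speakers, five utterances each.
toy_features = [c + d for c in (0.0, 10.0, 20.0) for d in (-0.6, -0.3, 0.0, 0.3, 0.6)]
speaker_count, toy_labels = estimate_speaker_count(toy_features, max_k=6)
print(speaker_count, toy_labels)
```

On this well-separated toy data the penalty term outweighs the small likelihood gain from splitting any true group, so the search settles on three clusters. Real speaker features are multidimensional and overlap far more, which is where a stronger search and a richer cluster model would matter.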