Volume 117, Issue 2, February 2005
Index of content:
- SPEECH PROCESSING AND COMMUNICATION SYSTEMS 
117(2005); http://dx.doi.org/10.1121/1.1841572
A statistics-based syllable pitch contour model for Mandarin speech is proposed. This approach takes the mean and the shape of a syllable log-pitch contour as two basic modeling units and considers several affecting factors that contribute to their variations. The affecting factors include the speaker, prosodic state (which essentially represents the high-level linguistic components of F0 and will be explained more clearly in Sec. I), tone, and initial and final syllable classes. The parameters of the two modeling units were automatically estimated using the expectation-maximization (EM) algorithm. Experimental results showed that the root mean squared errors (RMSEs) of the reconstructed pitch period were 0.362 ms in the closed test and 0.373 ms in the open test. This model provides a way to separate the effects of several major factors. All of the inferred values of the affecting factors were in close agreement with our prior linguistic knowledge. The model also gives a quantitative and more complete description of the coarticulation effect of neighboring tones than the conventional qualitative descriptions given by tone sandhi rules. In addition, the model can provide useful cues to determine the prosodic phrase boundaries, including those occurring at intersyllable locations, with or without punctuation marks.
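The two modeling units and the reported error measure can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the contour shape is parameterized, so the sketch simply assumes the shape is the zero-mean residual of the log-F0 samples, and it assumes the pitch period in milliseconds is 1000 divided by F0 in Hz. The function names and the sample contour are hypothetical.

```python
import math

def split_mean_and_shape(f0_hz):
    """Split a syllable F0 contour (Hz) into a log-pitch mean and a shape.

    Assumption: the shape is taken as the zero-mean residual of the
    log-F0 samples; the paper's own parameterization may differ.
    """
    log_f0 = [math.log(f) for f in f0_hz]
    mean = sum(log_f0) / len(log_f0)
    shape = [v - mean for v in log_f0]
    return mean, shape

def reconstruct_f0(mean, shape):
    """Rebuild the F0 contour (Hz) from a log-pitch mean and shape."""
    return [math.exp(mean + s) for s in shape]

def pitch_period_rmse_ms(f0_ref_hz, f0_est_hz):
    """RMSE between two contours, measured on the pitch period in ms.

    Assumption: pitch period (ms) = 1000 / F0 (Hz).
    """
    sq_errs = [(1000.0 / r - 1000.0 / e) ** 2
               for r, e in zip(f0_ref_hz, f0_est_hz)]
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Hypothetical contour, for illustration only.
contour = [200.0, 210.0, 220.0, 215.0]
mean, shape = split_mean_and_shape(contour)
rebuilt = reconstruct_f0(mean, shape)
flat = reconstruct_f0(mean, [0.0] * len(shape))  # mean only, shape dropped
print(pitch_period_rmse_ms(contour, rebuilt))  # close to zero
print(pitch_period_rmse_ms(contour, flat))     # positive: the shape matters
```

In the model described above, the mean and the shape would each be predicted from the affecting factors rather than copied from the observed contour, and the RMSE would compare that prediction with the observation.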