Volume 118, Issue 1, July 2005
Index of contents:
- SPEECH PRODUCTION 
118(2005); http://dx.doi.org/10.1121/1.1928807
In this paper we present an algorithm for building an empirical model of facial biomechanics from a set of displacement records of markers located on the face of a subject producing speech. Markers are grouped into clusters, each of which has a unique primary marker and a number of secondary markers, each with an associated weight. The motion of a secondary marker is computed as the weighted sum of the motions of the primary markers of the clusters to which it belongs. This model may be used to produce facial animations by driving the primary markers with measured kinematic signals.
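The weighted-sum rule in this abstract can be illustrated with a minimal sketch. The cluster names, weights, and displacement values below are hypothetical and chosen only for illustration; the abstract does not say how the weights are estimated, so they are simply given here.

```python
def secondary_displacement(primary_disp, weights):
    """Displacement of one secondary marker as a weighted sum.

    primary_disp: dict mapping cluster id -> (dx, dy, dz) of that
                  cluster's primary marker (hypothetical layout).
    weights:      dict mapping cluster id -> weight of this secondary
                  marker in that cluster.
    """
    out = [0.0, 0.0, 0.0]
    for cluster_id, w in weights.items():
        d = primary_disp[cluster_id]
        for k in range(3):
            out[k] += w * d[k]
    return tuple(out)


# Hypothetical example: one secondary marker belonging to two clusters.
primary = {"jaw": (0.0, -2.0, 0.0), "lip": (1.0, 0.0, 0.0)}
weights = {"jaw": 0.5, "lip": 0.25}
result = secondary_displacement(primary, weights)
```

Driving an animation would then amount to evaluating this sum for every secondary marker at every frame of the measured primary-marker signals.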
118(2005); http://dx.doi.org/10.1121/1.1862251
A theoretical approach to describing unvoiced speech sound production is outlined using the essentials of aerodynamics and aeroacoustics. The focus is on the character and role of nonacoustic air motion in the vocal tract. An idealized picture of speech sound production is presented, showing that it involves the dynamics of a jet flow characterized by vorticity. A formal expression is developed for the sound production by unsteady airflow in terms of jet vorticity and vocal-tract shape, and a scaling law for the aeroacoustic source power is derived. The generic features of internal jet flows such as those exhibited in speech sound production are discussed, particularly in terms of the vorticity field, and the relevant scales of motion are identified. An approximate description of a jet as a train of vortex rings, useful for sound-field prediction, is described using the scales of both motion and vocal-tract geometry. It is shown that the aeroacoustic source may be expressed as the convolution of (1) the acoustic source time series due to a single vortex ring with (2) a function describing the arrival of vortex rings in the source region. It is shown that, in general, the characteristics of the aeroacoustic source are determined not only by the strength, spatial distribution, and convection speed of the jet vorticity field, but also by the shape of the vocal tract through which the jet flow passes. For turbulent jets, such as those which occur in unvoiced sound production, however, vocal-tract shape is the dominant factor in determining the spectral content of the source.
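The convolution statement in this abstract can be sketched in discrete time. In the sketch below, the single-ring source pulse and the arrival sequence are hypothetical numbers chosen for illustration; the abstract gives neither, and the physical pulse shape would depend on the jet vorticity and the vocal-tract shape.

```python
def convolve(single_ring, arrivals):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(single_ring) + len(arrivals) - 1)
    for i, a in enumerate(arrivals):
        if a == 0:
            continue  # no vortex ring arrives at this sample
        for j, s in enumerate(single_ring):
            out[i + j] += a * s
    return out


# Hypothetical source time series produced by one vortex ring.
ring_pulse = [1.0, -0.5, 0.25]
# Hypothetical arrival function: rings enter the source region
# at samples 0 and 3.
arrivals = [1, 0, 0, 1, 0]

source = convolve(ring_pulse, arrivals)
```

Each arrival places one copy of the single-ring pulse in the output, so an irregular arrival sequence, as might be expected for a turbulent jet, would spread the arrival function's spectrum and leave the pulse shape as the main spectral influence.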
118(2005); http://dx.doi.org/10.1121/1.1928707
A measurement principle of the three-dimensional electromagnetic articulographic device is presented. The state of the miniature receiver coil is described by five variables representing its position in the three-dimensional coordinate system and its rotation angles relative to that system. When the receiver coil is placed in the magnetic field produced by the distributed transmitter coils, its state can be optimally estimated by minimizing the difference between the measured strength of the received signal and the strength predicted from the known spatial pattern of the magnetic field. Therefore, the design and calibration of the field function inherently determine the accuracy in estimating the state of the receiver coil. The field function in our method is expressed in the form of a multivariate B-spline as a function of position in the three-dimensional space. Because of the piecewise property of the basis function and the freedom in the selection of the rank and the number of basis functions, the spline field function can represent the actual magnetic field flexibly and accurately. Given a set of calibration data, the spline function is designed to form a smooth curved surface interpolating all of these data samples. Then, an iterative procedure is employed to solve the nonlinear estimation problem of the receiver state variables. Because the spline basis function is a polynomial, it is also shown that the calculation of the Jacobian or Hessian required to obtain updated quantities for the state variables can be efficiently performed. Finally, experimental results reveal that the measurement accuracy is about 0.2 mm for a preliminary condition, indicating that the method can achieve the degree of precision required for observing articulatory movements in a three-dimensional space. It is also experimentally shown that the Marquardt method is a better nonlinear programming technique than the Gauss–Newton or Newton–Raphson method for solving the receiver state problem.
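The iterative estimation step can be illustrated with a small Marquardt-style (damped Gauss-Newton) loop. This is a toy sketch under loud assumptions: the state is reduced to a 2-D position instead of the five variables in the abstract, and the field model `field` is a hypothetical smooth falloff, not the calibrated B-spline field function. The transmitter positions and the starting point are also made up.

```python
def field(p, t):
    """Hypothetical signal strength at position p from transmitter t."""
    d2 = (p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2
    return 1.0 / (1.0 + d2)


def field_grad(p, t):
    """Analytic gradient of the hypothetical field with respect to p."""
    dx, dy = p[0] - t[0], p[1] - t[1]
    s = 1.0 / (1.0 + dx * dx + dy * dy)
    return (-2.0 * dx * s * s, -2.0 * dy * s * s)


def cost(p, transmitters, measured):
    return sum((field(p, t) - m) ** 2 for t, m in zip(transmitters, measured))


def marquardt(p0, transmitters, measured, lam=1e-3, iters=50):
    p = list(p0)
    c = cost(p, transmitters, measured)
    for _ in range(iters):
        if c < 1e-20:
            break  # residual is already negligible
        # Accumulate the normal equations: A = J^T J, g = J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for t, m in zip(transmitters, measured):
            r = m - field(p, t)
            jx, jy = field_grad(p, t)
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            g1 += jx * r
            g2 += jy * r
        # Marquardt damping scales the diagonal of J^T J.
        d11, d22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = d11 * d22 - a12 * a12
        step = ((d22 * g1 - a12 * g2) / det, (d11 * g2 - a12 * g1) / det)
        trial = [p[0] + step[0], p[1] + step[1]]
        c_trial = cost(trial, transmitters, measured)
        if c_trial < c:
            p, c = trial, c_trial
            lam /= 10.0  # good step: behave more like Gauss-Newton
        else:
            lam *= 10.0  # bad step: behave more like gradient descent
    return p


transmitters = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
true_position = (0.3, 0.4)
measured = [field(true_position, t) for t in transmitters]
estimate = marquardt((0.5, 0.5), transmitters, measured)
```

The damping factor is what separates this from plain Gauss-Newton: a rejected step raises it and shortens the next step, which is one plausible reason such a method would be more robust on a nonlinear field.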
118(2005); http://dx.doi.org/10.1121/1.1921448
Acoustic-to-articulatory inversion is a difficult problem mainly because of the nonlinearity of the mapping between the articulatory and acoustic spaces and the nonuniqueness of this relationship. To resolve this problem, we have developed an inversion method that provides a complete description of the possible solutions without excessive constraints and retrieves realistic temporal dynamics of the vocal tract shapes. We present an adaptive sampling algorithm to ensure that the acoustical resolution is almost independent of the region in the articulatory space under consideration. This leads to a codebook that is organized in the form of a hierarchy of hypercubes, and ensures that, within each hypercube, the articulatory-to-acoustic mapping can be approximated by means of a linear transform. The inversion procedure retrieves articulatory vectors corresponding to acoustic entries from the hypercube codebook. A nonlinear smoothing algorithm together with a regularization technique is then used to recover the best articulatory trajectory. The inversion ensures that the recovered articulatory parameters reproduce the original formant trajectories with high precision and yield a realistic sequence of vocal tract shapes.
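The idea of a hypercube hierarchy in which each cell admits a linear approximation can be sketched as a recursive subdivision. The sketch below is an assumption-laden illustration, not the authors' algorithm: the linearity test (comparing the mapping at the cell center with the mean of its corner values), the tolerance, and the scalar toy mapping `toy_mapping` are all hypothetical stand-ins for the real articulatory-to-acoustic model.

```python
from itertools import product


def build_codebook(f, lo, hi, tol, max_depth=8, depth=0):
    """Return leaf hypercubes (lo, hi) on which f is nearly linear."""
    dims = len(lo)
    corners = [
        tuple(h if b else l for l, h, b in zip(lo, hi, bits))
        for bits in product((0, 1), repeat=dims)
    ]
    center = tuple((l + h) / 2 for l, h in zip(lo, hi))
    # For an exactly linear map, the center value equals the mean
    # of the corner values, so the gap measures nonlinearity.
    predicted = sum(f(c) for c in corners) / len(corners)
    if depth >= max_depth or abs(f(center) - predicted) <= tol:
        return [(lo, hi)]
    leaves = []
    for bits in product((0, 1), repeat=dims):
        child_lo = tuple(c if b else l for l, c, b in zip(lo, center, bits))
        child_hi = tuple(h if b else c for c, h, b in zip(center, hi, bits))
        leaves += build_codebook(f, child_lo, child_hi, tol, max_depth, depth + 1)
    return leaves


def toy_mapping(x):
    """Hypothetical 1-D stand-in: curvature grows with x[0]."""
    return x[0] ** 3


leaves = build_codebook(toy_mapping, (0.0,), (1.0,), tol=0.02)
```

In this toy run the cells come out wider where the mapping is flatter and narrower where it curves more, which is the sense in which adaptive sampling keeps the resolution in the output space roughly uniform. The function accepts any number of dimensions, although the demonstration uses one.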