Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking a)
a) Portions of this work were presented at the INTERSPEECH conference in Antwerp, Belgium, in August 2007 (Rudoy et al., 2007).
10.1121/1.4739462

Figures

FIG. 1.

Illustration of the (A) classical and (B) proposed approaches to formant tracking. Key advantages to the proposed KARMA approach include intra-frame observation of autoregressive moving average parameters for both formant and antiformant tracking, inter-frame tracking using linearized Kalman inference, and the availability of both point estimates and uncertainties for each trajectory.

FIG. 2.

Comparison of extended Kalman filter (solid) and particle filter (dashed) tracking performance in terms of root-mean-square error (RMSE) averaged over 25 Monte Carlo trials and reported with 95% confidence intervals (gray).

FIG. 3.

Estimated formant tracks on spectrogram of VTR utterance 19 by an adult female from New England: “Nice country to meet a lion in face to face.” Reference trajectories from the VTR database are shown (red, dashed) along with the formant frequency tracks (blue, solid) from (A) KARMA, (B) wavesurfer, and (C) praat. Overall root-mean-square error (RMSE) is reported across all formants and frames labeled as speech, in addition to separate RMSE values for f1, f2, and f3. The KARMA output additionally displays uncertainty (gray shading, ±1 standard deviation) for each formant trajectory. Frames are categorized using TIMIT labels of phonetic class: vowel (blue), semivowel/glide (green), nasal (cyan), fricative (magenta), affricate (red), stop (black).

FIG. 4.

Effect of bandwidth tracking and state covariance matrix Q on KARMA formant tracking. VTR utterance 10 by an adult female from New England: “Reading in poor light gives you eyestrain.” (A) Bandwidths fixed to baseline values in Table III with diagonal elements of Q equal to (224 Hz)^2, (B) bandwidth values tracked with Q as in (A), and (C) bandwidth values tracked with diagonal elements of Q increased to (949 Hz)^2. Overall root-mean-square error (RMSE) is reported across all formants and frames labeled as speech, in addition to separate RMSE values for f1, f2, and f3. Color coding as in Fig. 3.

FIG. 5.

Illustration of the output from KARMA for the synthesized utterance /nɑn/. (A) True trajectories (red, dashed) are shown with the mean estimates (solid blue for formants, solid green for antiformants) and uncertainties (gray shading) for each frequency and bandwidth. (B) An alternative display is shown with a wideband spectrogram along with estimated frequency and bandwidth tracks of formants (blue) and antiformants (green). The 3-dB bandwidths dictate the width of the corresponding frequency tracks.

FIG. 6.

KARMA output for three spoken nasal consonants: (A) /m/, (B) /n/, and (C) /ŋ/. On the left, spectrograms overlay the mean estimates (blue for formants, green for antiformants) and uncertainties (gray shading) for each frequency and bandwidth. Plots to the right display the corresponding periodogram (gray) and spectral ARMA model fit (black).

FIG. 7.

KARMA formant and antiformant tracks of an utterance by an adult male: “piano.” Displayed are (A) the wideband spectrogram of the speech waveform and (B) the spectrogram overlaid with formant frequency estimates (blue), antiformant frequency estimates (green), and uncertainties (±1 standard deviation) for each track (gray). Arrows indicate the beginning and ending of the utterance. Note the increase in uncertainty during silence regions.

FIG. 8.

Kalman-based formant tracks using the (A) parametric ARMA cepstrum and (B) nonparametric real cepstrum as observations. The VTRsynthf0 waveform is a synthesized version of VTR utterance 1: “Even then, if she took one step forward, he could catch her.” Color coding as in Fig. 3.
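
As a rough illustration of what a parametric ARMA cepstrum observation can look like, the sketch below maps formant and antiformant (frequency, bandwidth) pairs to cepstral coefficients using the standard pole-zero cepstrum identity, c_n = (2/n) Σ exp(-π n b / fs) cos(2π n f / fs), summed over formants and subtracted for antiformants. The function name `arma_cepstrum`, its arguments, and the example values are assumptions made for illustration; the captions on this page do not state the exact mapping used in the article.

```python
import numpy as np

def arma_cepstrum(formants, antiformants, fs, n_coeffs):
    """Cepstrum of a pole-zero model given (frequency, bandwidth) pairs in Hz.

    Hypothetical helper: each pair is treated as a complex-conjugate
    pole (formant) or zero (antiformant) pair with radius
    exp(-pi * b / fs) and angle 2 * pi * f / fs.
    """
    n = np.arange(1, n_coeffs + 1)

    def resonance_sum(pairs):
        total = np.zeros(n_coeffs)
        for f, b in pairs:
            total += np.exp(-np.pi * n * b / fs) * np.cos(2 * np.pi * n * f / fs)
        return total

    # Poles contribute positively, zeros negatively.
    return (2.0 / n) * (resonance_sum(formants) - resonance_sum(antiformants))

# Illustrative values only (not taken from the article).
c = arma_cepstrum(formants=[(500.0, 80.0), (1500.0, 120.0)],
                  antiformants=[(1000.0, 100.0)],
                  fs=7000.0, n_coeffs=15)
```

In this sketch an antiformant that coincides exactly with a formant cancels it, so all coefficients become zero.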

Tables

TABLE I.

The extended Kalman algorithms for yielding point estimates and associated uncertainties of tracked parameters. See text for definition of variables.
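
Table I itself is not reproduced on this page. For orientation only, a generic extended Kalman filter step is sketched below: it returns a point estimate (the state mean) and an uncertainty (the state covariance). The linear state transition `F`, the noise covariances `Q` and `R`, the observation function `h`, and its Jacobian `jac_h` are all assumed inputs with hypothetical names; this is a textbook recursion, not the article's exact algorithm.

```python
import numpy as np

def ekf_step(x, P, y, F, Q, R, h, jac_h):
    """One predict/update cycle of a generic extended Kalman filter.

    x, P  : previous state mean and covariance
    y     : current observation vector
    F, Q  : assumed linear state transition and state noise covariance
    R     : observation noise covariance
    h     : nonlinear observation function, h(x) -> predicted observation
    jac_h : function returning the Jacobian of h evaluated at x
    """
    # Predict the state forward one frame.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q

    # Linearize the observation model about the predicted state.
    H = jac_h(x_pred)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain

    # Correct with the new observation.
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

The diagonal of the returned covariance gives per-parameter variances, which is one way shading of ±1 standard deviation around a trajectory could be derived.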

TABLE II.

Proposed KARMA algorithm for formant and antiformant tracking.

TABLE III.

Modifiable parameters and their baseline values for the three steps in the proposed KARMA approach.

TABLE IV.

Formant tracking performance of KARMA, wavesurfer, and praat in terms of root-mean-square error (RMSE) taken per formant across all 516 utterances in the VTR database (Deng et al., 2006b). Reported RMSE (in Hz) is computed over speech-labeled frames and further categorized by 6 phonetic classes.
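
The RMSE figures described in these captions are computed per formant over speech-labeled frames. A minimal sketch of that kind of computation follows, assuming the estimated and reference tracks are arrays of shape (frames, formants) in Hz and that a boolean mask marks the speech frames; the function name `formant_rmse` and the array layout are assumptions, not the article's code.

```python
import numpy as np

def formant_rmse(estimated, reference, speech_mask):
    """Per-formant and overall RMSE (Hz) restricted to speech-labeled frames.

    estimated, reference : arrays of shape (n_frames, n_formants)
    speech_mask          : boolean array of shape (n_frames,)
    """
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(reference, dtype=float)
    mask = np.asarray(speech_mask, dtype=bool)

    err = est[mask] - ref[mask]                       # keep speech frames only
    per_formant = np.sqrt(np.mean(err ** 2, axis=0))  # one value per formant
    overall = np.sqrt(np.mean(err ** 2))              # pooled over formants
    return per_formant, overall
```

Restricting the mask further, for example to frames of one phonetic class or one speaker gender, would give the kind of breakdown the table captions describe.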

TABLE V.

Formant tracking performance of KARMA, wavesurfer, and praat in terms of root-mean-square error (RMSE) taken per formant across all 516 utterances in the VTR database (Deng et al., 2006b). RMSE (in Hz) is reported over speech-labeled frames and further categorized by speaker gender (male, female).

TABLE VI.

RMSE (in Hz) of KARMA, wavesurfer, and praat formant tracking of the first three formant trajectories in the VTRsynth database that resynthesizes utterances using a stochastic source.

TABLE VII.

RMSE of KARMA, wavesurfer, and praat formant tracking of the first three formant trajectories in the VTRsynthf0 database that resynthesizes VTR database utterances using stochastic and periodic sources. RMSE (in Hz) is reported over speech-labeled frames and further categorized by original speaker gender (male, female) to reveal any fundamental frequency effects.

2012-09-12
2014-04-16