Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion
1. J. Frankel and S. King, “ASR - articulatory speech recognition,” in Proceedings of Eurospeech, Scandinavia (2001), pp. 599–602.
2. L. Deng, G. Ramsay, and D. Sun, “Production models as a structural basis for automatic speech recognition,” Speech Commun. 22(2), 93–112 (1997).
3. H. Attias, L. Lee, and L. Deng, “Variational inference and learning for segmental switching state space models of hidden speech dynamics,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, Vol. 1 (2003), pp. I-872–I-875.
4. J. Ma and L. Deng, “Target-directed mixture dynamic models for spontaneous speech recognition,” IEEE Trans. Speech Audio Process. 12(1), 47–58 (2004).
5. E. McDermott and A. Nakamura, “Production-oriented models for speech recognition,” IEICE Trans. Inf. Syst. E89-D(3), 1006–1014 (2006).
6. F. Metze and A. Waibel, “A flexible stream architecture for ASR using articulatory features,” International Conference on Spoken Language Processing, Denver, CO, USA (2002), pp. 2133–2136.
7. J. Hogden, A. Lofqvist, V. Gracco, I. Zlokarnik, P. Rubin, and E. Saltzman, “Accurate recovery of articulator positions from acoustics: New conclusions based on human data,” J. Acoust. Soc. Am. 100(3), 1819–1834 (1996).
8. H. Yehia, “A study on the speech acoustic-to-articulatory mapping using morphological constraints,” Ph.D. thesis, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Japan, 2002.
9. T. Toda, A. Black, and K. Tokuda, “Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model,” Speech Commun. 50, 215–217 (2008).
10. P. K. Ghosh and S. S. Narayanan, “A subject-independent acoustic-to-articulatory inversion,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Prague, Czech Republic (2011), pp. 4624–4627.
11. D. Yu, L. Deng, and A. Acero, “Speaker-adaptive learning of resonance targets in a hidden trajectory model of speech coarticulation,” Comput. Speech Lang. 27, 72–87 (2007).
12. A. A. Wrench and H. J. William, “A multichannel articulatory database and its application for automatic speech recognition,” 5th Seminar on Speech Production: Models and Data, Bavaria (2000), pp. 305–308.
13. J. Silva, V. Rangarajan, V. Rozgic, and S. S. Narayanan, “Information theoretic analysis of direct articulatory measurements for phonetic discrimination,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Honolulu, HI, USA (2007), pp. 457–460.
14. For articulatory representation, one can also use raw X, Y values of the sensor positions of common articulators across subjects. TV features represent a “low-dimensional” (5 × 1) control regime for constriction actions in speech production and are considered more invariant in a linguistic sense.
15. C. P. Browman and L. Goldstein, “Towards an articulatory phonology,” Phonol. Yearbook 3, 219–252 (1986).
16. J. S. Garofolo et al., “TIMIT Acoustic-Phonetic Continuous Speech Corpus,” Linguistic Data Consortium, Philadelphia, PA (1993).
17. P. K. Ghosh and S. S. Narayanan, “A generalized smoothness criterion for acoustic-to-articulatory inversion,” J. Acoust. Soc. Am. 128(4), 2162–2172 (2010).
18. B. Pellom and K. Hacioglu, “Sonic: The University of Colorado continuous speech recognizer,” Technical Report No. TR-CSLR-2001-01 (2005).
19. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–38 (1977).
20. We did not explore delta and delta-delta MFCC as acoustic features due to the increase in feature dimension, which in turn requires more data for reliable estimates of GMM parameters; this is not afforded by the corpus limitations of the present study.
21. M. Hollander and D. A. Wolfe, Nonparametric Statistical Methods (Wiley, New Jersey, 1999).


An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker’s speech acoustics using only an exemplary subject’s articulatory-to-acoustic map. Results are reported on a broad-class phonetic classification experiment on speech from English talkers, using data from three distinct English talkers as exemplars for inversion. The results indicate that including the articulatory information improves classification accuracy, but the improvement is more significant when the speaking styles of the exemplar and the talker are matched than when they are mismatched.

