A linear model of acoustic-to-facial mapping: Model parameters, data set size, and generalization across speakers
Agelfors, E., Beskow, J., Granström, B., Lundeberg, M., Salvi, G., Spens, K.-E., and Öhman, T. (1999). “Synthetic visual speech driven from auditory speech,” Proceedings of Audio-Visual Speech Processing, Santa Cruz, pp. 123–127.
Alfonso, P. J., and Van Lieshout, P. (1997). “Spatial and temporal variability in gestural specification,” in Speech Production: Motor Control, Brain Research and Fluency Disorders, edited by W. Hulstijn, F. Peters, and P. van Lieshout (Elsevier Science, Amsterdam), pp. 151–160.
Atal, B. S., Chang, J. J., Mathews, M. V., and Tukey, J. W. (1978). “Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer sorting technique,” J. Acoust. Soc. Am. 63, 1535–1555.
Barker, J. P., and Berthommier, F. (1999). “Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models,” Proceedings of Audio-Visual Speech Processing, Santa Cruz, pp. 112–117.
Berthommier, F. (2003). “Direct synthesis of video from speech sounds for new telecommunication applications,” Proceedings of the Smart Objects Conference, Grenoble, France.
Brugnara, F. (2001). “Model agglomeration for context-dependent acoustic modeling,” Proceedings of Eurospeech, Aalborg, pp. 1641–1644.
Chou, W., and Juang, B.-H. (2003). Pattern Recognition in Speech and Language Processing (CRC Press, Boca Raton).
Craig, M., Van Lieshout, P., and Wong, W. (2007). “Suitability of a UV-based video recording system for the analysis of small facial motions during speech,” Speech Commun. 49, 679–686.
Curinga, S., Lavagetto, F., and Vignoli, F. (1996). “Lip movements synthesis using time delay neural networks,” Proceedings of the European Signal Processing Conference, Trieste, pp. 999–1002.
Dodd, B., and Campbell, R. (1987). Hearing by Eye: The Psychology of Lip Reading (Lawrence Erlbaum Associates, Hillsdale, NJ).
Ferguson, G. A. (1984). Statistical Analysis, Synthesis, and Perception (Springer, New York).
Fletcher, J., and Harrington, J. (1999). “Lip and jaw coarticulation,” in Coarticulation: Theory, Data and Techniques, edited by W. Hardcastle and N. Hewlett (Cambridge University Press, Cambridge), pp. 164–178.
Fu, S., Gutierrez-Osuna, R., Esposito, A., Kakumanu, P., and Garcia, O. (2005). “Audio/visual mapping with cross-modal hidden Markov models,” IEEE Trans. Multimedia 7, 243–252.
Hertrich, I., and Ackermann, H. (2000). “Lip-jaw and tongue-jaw coordination during rate-controlled syllable repetitions,” J. Acoust. Soc. Am. 107, 2236–2246.
Hogden, J., Rubin, P., McDermott, E., Katagiri, S., and Goldstein, L. (2007). “Inverting mappings from smooth paths through R^n to paths through R^m: A technique applied to recovering articulation from acoustics,” Speech Commun. 49, 361–383.
Hong, P., Wen, Z., and Huang, T. S. (2002). “Real-time speech-driven face animation with expressions using neural networks,” IEEE Trans. Neural Netw. 13, 916–927.
Jiang, J., Alwan, A., Bernstein, L., Keating, P., and Auer, E. (2000a). “On the correlation between facial movements, tongue movements and speech acoustics,” Proceedings of the International Conference on Spoken Language Processing, Beijing, Vol. 1, pp. 42–45.
Jiang, T., Li, Y., and Chen, H. (2000b). “A vocoder based on LSP,” Proceedings of the 5th International Conference on Signal Processing, Beijing, pp. 697–701.
Jiang, J., Alwan, A., Bernstein, L., Auer, E., and Keating, P. (2002). “Predicting face movements from speech acoustics using spectral dynamics,” Proceedings of the International Conference on Multimedia and Expo, Lausanne, pp. 181–184.
Kabal, P. (2003). “Time windows for linear prediction of speech,” Version 2, Technical Report, Department of Electrical and Computer Engineering, McGill University, Montreal.
Kakumanu, P. (2002). “Audio-video processing for speech driven facial animation,” M.Sc. thesis, Computer Science Department, Wright State University, Dayton, OH.
Kakumanu, P., Esposito, A., Garcia, O., and Gutierrez-Osuna, R. (2006). “A comparison of acoustic coding models for speech-driven facial animation,” Speech Commun. 48, 598–615.
Kent, R. D., and Read, C. (1992). The Acoustic Analysis of Speech (Singular Publishing Group, San Diego).
Logan, J. S., Greene, B. G., and Pisoni, D. B. (1989). “Segmental intelligibility of synthetic speech produced by rule,” J. Acoust. Soc. Am. 86, 566–581.
Massaro, D. W., Beskow, J., Cohen, M. M., Fry, C. L., and Rodriguez, T. (1999). “Picture my voice: Audio to visual speech synthesis using artificial neural networks,” Proceedings of Audio-Visual Speech Processing, Santa Cruz, pp. 133–138.
Savran, A., Arslan, L., and Akarun, L. (2006). “Speaker-independent 3D face synthesis driven by speech and text,” Signal Process. 86, 2932–2951.
Schroeder, M. R. (1967). “Determination of the geometry of the human vocal tract by acoustic measurements,” J. Acoust. Soc. Am. 41, 1002–1010.
Summerfield, Q. (1992). “Lipreading and audio-visual speech perception,” Philos. Trans. R. Soc. London, Ser. B 335, 71–78.
Xie, L., and Liu, Z.-Q. (2007). “A coupled HMM approach to video-realistic speech animation,” Pattern Recogn. 40, 2325–2340.
Yamamoto, E., Nakamura, S., and Shikano, K. (1998). “Lip movement synthesis from speech based on hidden Markov models,” Speech Commun. 26, 105–115.
Železný, M., Krňoul, Z., Císař, P., and Matoušek, J. (2006). “Design, implementation and evaluation of the Czech realistic audio-visual speech synthesis,” Signal Process. 86, 3657–3673.