
Estimating speaker height can assist forensic voice analysis and can provide additional side information that benefits automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, two approaches are presented: a statistical approach to height estimation that incorporates acoustic models within a Gaussian mixture model (GMM) structure with non-uniform height bin widths, and a formant analysis approach that applies linear regression to selected phones. Using data from the TIMIT corpus, the accuracy and trade-offs of these systems are explored by examining the consistency of the results, the location and causes of errors, and a fusion of the two systems. Open-set testing is also presented using the Multi-session Audio Research Project (MARP) corpus and publicly available YouTube audio, both to examine the effect of channel mismatch between training and test data and to provide a realistic open-domain testing scenario. The proposed algorithms achieve performance highly competitive with previously published results. Although differences in data partitioning between this study and the literature may prevent comparisons in absolute terms, the mean absolute error of 4.89 cm for males and 4.55 cm for females obtained by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable reduction in estimation error compared to past efforts.
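The abstract names two subsystems (a bin-based statistical model and a formant regression), a fusion of the two, and an average error in centimeters. The sketch below illustrates those pieces in their simplest form. It is a hedged toy under explicit assumptions, not the authors' implementation. A single scalar feature stands in for the real acoustic features; think of it hypothetically as a formant-derived value averaged over selected phones. Each height bin is modeled by one Gaussian instead of a full GMM, and the bin centers may be spaced non-uniformly. Fusion is an assumed fixed-weight linear average. The error is computed as a mean absolute error. All function names and numbers are hypothetical. The code requires Python 3.8+ for `statistics.fmean`.

```python
import math
from statistics import fmean


def fit_linear(xs, ys):
    """Ordinary least-squares fit of height = a + b * feature (one scalar feature)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict_linear(model, x):
    """Apply a fitted (intercept, slope) pair to a new feature value."""
    a, b = model
    return a + b * x


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bin_posterior_estimate(x, bins):
    """Height estimate from per-bin Gaussian models (hypothetical simplification).

    bins: list of (center_cm, feature_mean, feature_var, prior); centers may be
    spaced non-uniformly. Each bin uses ONE Gaussian, not a full GMM.
    Returns the posterior-weighted average of the bin centers.
    """
    logs = [math.log(p) + gaussian_logpdf(x, m, v) for _, m, v, p in bins]
    top = max(logs)  # subtract the max log before exp() for numerical stability
    weights = [math.exp(lg - top) for lg in logs]
    total = sum(weights)
    return sum(w * c for w, (c, _, _, _) in zip(weights, bins)) / total


def fuse(est_a, est_b, w=0.5):
    """Assumed fixed-weight linear fusion of two subsystem height estimates."""
    return w * est_a + (1.0 - w) * est_b


def mean_absolute_error(preds, truths):
    """Mean of |estimated height - true height| over all utterances."""
    return fmean(abs(p - t) for p, t in zip(preds, truths))


if __name__ == "__main__":
    # Hypothetical toy data; NOT taken from TIMIT, MARP, or the paper.
    model = fit_linear([1.0, 2.0, 3.0, 4.0], [190.0, 180.0, 170.0, 160.0])
    bins = [(160.0, 3.0, 1.0, 0.5), (180.0, 1.0, 1.0, 0.5)]
    x = 2.2
    reg = predict_linear(model, x)
    stat = bin_posterior_estimate(x, bins)
    fused = fuse(reg, stat)
    print(reg, stat, fused, mean_absolute_error([fused], [176.0]))
```

Here the posterior weighting lets a test utterance whose feature value falls between two bins receive an estimate between their centers rather than a hard bin label. Averaging two estimates only helps if their errors are not fully correlated, which is one motivation for fusing a statistical system with a regression system.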

