Robust fundamental frequency estimation in sustained vowels: Detailed algorithmic comparisons and information fusion with adaptive Kalman filtering
There has been consistent interest among speech signal processing researchers in the accurate estimation of the fundamental frequency (F0) of speech signals. This study examines ten F0 estimation algorithms (some well-established and some proposed more recently) to determine which of these algorithms is, on average, better able to estimate F0 in the sustained vowel /a/. Moreover, a robust method for adaptively weighting the estimates of individual F0 estimation algorithms based on quality and performance measures is proposed, using an adaptive Kalman filter (KF) framework. The accuracy of the algorithms is validated using (a) a database of 117 synthetic realistic phonations obtained using a sophisticated physiological model of speech production and (b) a database of 65 recordings of human phonations where the glottal cycles are calculated from electroglottograph signals. On average, the sawtooth waveform inspired pitch estimator and the nearly defect-free algorithms provided the best individual F0 estimates, and the proposed KF approach resulted in a ∼16% improvement in accuracy over the best single F0 estimation algorithm. These findings may be useful in speech signal processing applications where sustained vowels are used to assess vocal quality, when very accurate F0 estimation is required.
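The general idea of fusing several F0 tracks with a Kalman filter can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a scalar random-walk model for F0, and it assumes each estimator comes with a hypothetical per-frame quality index in (0, 1] that inflates its measurement noise when quality is low. The function name `fuse_f0` and the parameters `process_var` and `base_meas_var` are illustrative choices, not values from the study.

```python
import numpy as np


def fuse_f0(estimates, quality, process_var=1.0, base_meas_var=4.0):
    """Fuse per-frame F0 estimates (Hz) from several algorithms.

    estimates: array of shape (n_frames, n_algorithms)
    quality:   same shape, values in (0, 1]; an assumed confidence score,
               where lower values mean the estimate is trusted less
    Returns an array of shape (n_frames,) with the fused F0 track.
    """
    estimates = np.asarray(estimates, dtype=float)
    quality = np.asarray(quality, dtype=float)
    n_frames, n_algs = estimates.shape

    x = np.median(estimates[0])  # initial state: median of the first frame
    p = base_meas_var            # initial state variance (assumed)
    fused = np.empty(n_frames)

    for t in range(n_frames):
        # Predict step: random-walk model, so only the variance grows.
        p = p + process_var

        # Update step: process each algorithm's estimate in turn.
        for i in range(n_algs):
            # Low quality inflates the measurement noise variance.
            r = base_meas_var / max(quality[t, i], 1e-6)
            k = p / (p + r)                      # Kalman gain
            x = x + k * (estimates[t, i] - x)    # state correction
            p = (1.0 - k) * p                    # variance correction

        fused[t] = x

    return fused
```

With two estimators reporting 120 Hz and a third making a persistent octave error (240 Hz) but flagged with low quality, the fused track stays close to 120 Hz, whereas an unweighted mean would sit at 160 Hz. How the quality and performance measures are actually computed in the study is not stated in the abstract, so the weighting rule here should be read as a placeholder.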