



A characterization of the voice source (VS) signal by the pitch synchronous (PS) discrete cosine transform (DCT) is proposed. With the integrated linear prediction residual (ILPR) as the VS estimate, the PS DCT of the ILPR is evaluated as a feature vector for speaker identification (SID). On the TIMIT and YOHO databases, using a Gaussian mixture model (GMM)-based classifier, it performs on par with existing VS-based features. On the NIST 2003 database, fusing it with a GMM-based classifier that uses MFCC features improves the identification accuracy by 12% in absolute terms, suggesting that the proposed characterization holds promise as a feature for SID studies.
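
As a rough illustration of what a pitch synchronous DCT feature might look like, the sketch below takes a voice source estimate and a list of epoch (pitch mark) sample indices, treats each span between consecutive epochs as one pitch cycle, and keeps the first few DCT-II coefficients of each cycle. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how the ILPR or the epochs are computed, how cycles are normalized, or how many coefficients are retained, so the function names `dct_ii` and `pitch_synchronous_dct` and the parameter `n_coeffs` are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence, written out directly for clarity."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def pitch_synchronous_dct(source, epochs, n_coeffs=20):
    """Return one truncated DCT vector per pitch cycle.

    source   : voice source estimate (assumed here to be an ILPR-like signal)
    epochs   : increasing sample indices marking cycle boundaries (assumed given)
    n_coeffs : number of leading coefficients kept (illustrative default)
    """
    features = []
    for start, end in zip(epochs[:-1], epochs[1:]):
        cycle = list(source[start:end])
        if len(cycle) < 2:
            continue  # skip degenerate cycles
        coeffs = dct_ii(cycle)[:n_coeffs]
        # Zero-pad so that short cycles still yield fixed-length vectors.
        coeffs += [0.0] * (n_coeffs - len(coeffs))
        features.append(coeffs)
    return features


if __name__ == "__main__":
    # Toy signal: two identical 4-sample "cycles".
    toy = [0.0, 1.0, 0.5, -0.5, 0.0, 1.0, 0.5, -0.5]
    for vec in pitch_synchronous_dct(toy, [0, 4, 8], n_coeffs=3):
        print([round(c, 3) for c in vec])
```

In a pipeline of the kind the abstract describes, vectors like these would be the per-cycle observations passed to a GMM-based classifier. Truncating and zero-padding to a fixed length is one simple way to cope with cycles of different durations; other choices, such as resampling each cycle to a common length, are equally plausible.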

