J. Acoust. Soc. Am. 137(6)
http://dx.doi.org/10.1121/1.4921679
2015-05-28
2016-09-25

Abstract

A characterization of the voice source (VS) signal by the pitch-synchronous (PS) discrete cosine transform (DCT) is proposed. With the integrated linear prediction residual (ILPR) as the VS estimate, the PS DCT of the ILPR is evaluated as a feature vector for speaker identification (SID). On the TIMIT and YOHO databases, using a Gaussian mixture model (GMM)-based classifier, it performs on par with existing VS-based features. On the NIST 2003 database, fusing it with a GMM-based classifier that uses mel-frequency cepstral coefficient (MFCC) features improves the identification accuracy by 12% in absolute terms, suggesting that the proposed characterization is a promising feature for SID studies.
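
To make the idea of a pitch-synchronous DCT concrete, the following is a minimal sketch, not the authors' implementation. It assumes a VS estimate (for example, an ILPR) and the cycle boundaries (epochs) have already been computed elsewhere. The function names `dct_ii` and `ps_dct_features`, the default of 20 retained coefficients, the orthonormal scaling, and the zero-padding of short cycles are all illustrative assumptions that the abstract does not specify.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence (plain O(N^2) implementation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def ps_dct_features(vs_estimate, epochs, num_coeffs=20):
    """Return one truncated DCT vector per pitch cycle.

    vs_estimate: voice source estimate as a list of floats (assumed given).
    epochs: sorted sample indices marking cycle boundaries (assumed given).
    num_coeffs: number of DCT coefficients kept (hypothetical choice).
    """
    features = []
    for start, end in zip(epochs[:-1], epochs[1:]):
        cycle = vs_estimate[start:end]
        if len(cycle) < 2:
            continue  # skip degenerate cycles
        coeffs = dct_ii(cycle)[:num_coeffs]
        # Assumption: zero-pad short cycles so all vectors share one length.
        coeffs += [0.0] * (num_coeffs - len(coeffs))
        features.append(coeffs)
    return features


if __name__ == "__main__":
    # Synthetic stand-in for a VS estimate: three identical 80-sample cycles.
    one_cycle = [math.sin(2 * math.pi * i / 80) for i in range(80)]
    signal = one_cycle * 3
    feats = ps_dct_features(signal, [0, 80, 160, 240])
    print(len(feats), "cycles,", len(feats[0]), "coefficients each")
```

In an SID pipeline of the kind the abstract describes, per-cycle vectors like these would be the observations modeled by each speaker's GMM; that classifier stage is not shown here.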
