Characteristics of speaking style and implications for speech recognition
DOI: 10.1121/1.3183593

Figures

FIG. 1.

A spectrogram of the word “socks” and its modulation spectrum. The modulation spectrum is obtained from a sequence of STFT vectors.
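To illustrate how a modulation spectrum can be derived from a sequence of STFT vectors, here is a minimal NumPy sketch. It assumes 16 kHz audio, 25 ms Hann-windowed frames with a 10 ms hop, linear (not log) magnitudes, and a plain FFT over each frequency bin's magnitude trajectory. These parameters and the function names `stft_magnitude` and `modulation_spectrum` are illustrative assumptions, not the settings used in the article.

```python
import numpy as np


def stft_magnitude(x, frame_len=400, hop=160):
    """Magnitude STFT: rows are frames (time), columns are frequency bins.

    Assumed defaults: 400 samples = 25 ms and 160 samples = 10 ms at 16 kHz.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def modulation_spectrum(x, frame_len=400, hop=160):
    """Modulation spectrum: FFT of each frequency bin's magnitude over time.

    Returns an array of shape (n_acoustic_bins, n_modulation_bins).
    """
    mag = stft_magnitude(x, frame_len, hop)        # (n_frames, n_freq)
    traj = mag - mag.mean(axis=0, keepdims=True)   # remove the DC of each trajectory
    return np.abs(np.fft.rfft(traj, axis=0)).T     # (n_freq, n_mod)
```

With a 10 ms hop, the trajectories are sampled at 100 Hz, so the modulation axis spans 0 to 50 Hz. For example, a 1 kHz tone amplitude-modulated at 4 Hz produces a peak near 4 Hz in the row for the 1 kHz acoustic bin (bin 25 when frames are 400 samples long).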

FIG. 2.

Normalized F0 statistics estimated for the MULTI-REG corpus (“Dict” and “Spon” are dictation and spontaneous speech, respectively) and the motherese corpus (“ID” and “AD” are infant-directed and adult-directed speech, respectively).

FIG. 3.

Difference between averaged modulation spectra: infant-directed utterances minus adult-directed utterances.

FIG. 4.

Difference between averaged modulation spectra: dictation utterances minus spontaneous utterances.

FIG. 5.

Spectrograms and spectral sections of the word “shoes”. In the infant-directed /uw/ sound, the fundamental frequency aligns with the first formant.

FIG. 6.

Spectrograms and spectral sections of the word “sheep”. In the infant-directed /iy/ sound, the fundamental frequency aligns with the first formant.

FIG. 7.

Multidimensional scaling (MDS) representation of the modulation spectrum features of speakers associated with dictated (Dict), spontaneous (Spon), infant-directed (ID), and adult-directed (AD) speech.
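The caption does not say which MDS variant was used. As a rough illustration only, here is a sketch of classical (Torgerson) MDS, which embeds items in two dimensions so that their Euclidean distances approximate a given distance matrix. The idea of representing each speaker by a feature vector, such as an averaged modulation spectrum, and comparing vectors with Euclidean distance is an assumption made for this example. The function name `classical_mds` is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a symmetric pairwise distance matrix.

    Returns an (n, n_components) array of coordinates whose Euclidean
    distances approximate `dist`.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    scale = np.sqrt(np.clip(evals[order], 0.0, None))  # drop negative eigenvalues
    return evecs[:, order] * scale
```

When the input distances come from points that already lie in two dimensions, the embedding reproduces them exactly, up to rotation and reflection. When the points lie in higher dimensions, as speaker feature vectors would, the plot keeps the two directions of largest variance.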

FIG. 8.

One-stage and two-stage training procedures.

Tables

TABLE I.

Utterance features and associated classification error rates.

TABLE II.

Sampling criteria and resulting word error rate (WER). The baseline WER was 41.4%.
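The tables report results as word error rate (WER). The scoring tool is not named here, so the following is only a minimal sketch of the standard definition: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, expressed as a percentage. Whitespace tokenization and case-sensitive matching are assumptions. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization, case-sensitive matching, and a
    non-empty reference.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)  # vs. deletion, insertion
        prev = cur
    return 100.0 * prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, a WER of about 33.3%.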

TABLE III.

WER for large-vocabulary recognition using Decipher.

TABLE IV.

WER of one-stage vs. two-stage training using HTK, with and without update constraints.

TABLE V.

WER of one-stage vs. two-stage training using Decipher.

2009-09-09
2014-04-18