A spectrogram of the word “socks” and its modulation spectrum. The modulation spectrum is obtained from a sequence of STFT vectors.
Normalized F0 statistics estimated for the MULTI-REG corpus (“Dict” and “Spon” are dictation and spontaneous speech, respectively) and the motherese corpus (“ID” and “AD” are infant-directed and adult-directed speech, respectively).
Difference of averaged modulation spectrum of infant-directed utterances from adult-directed utterances.
Difference of averaged modulation spectrum of dictation utterances from spontaneous utterances.
Spectrograms of the word “shoes” and their sections. In the infant-directed /uw/ sound, the fundamental frequency aligns with the first formant.
Spectrograms of the word “sheep” and their sections. In the infant-directed /iy/ sound, the fundamental frequency aligns with the first formant.
MDS representation of modulation spectrum features of speakers associated with dictated (Dict), spontaneous (Spon), infant-directed (ID), and adult-directed (AD) speech.
One-stage and two-stage training procedures.
Utterance features and associated classification error rates.
Sampling criteria and resulting WER. The baseline WER was 41.4.
WER for large vocabulary recognition using Decipher.
WER of one-stage vs. two-stage training using HTK, with and without update constraints.
WER of one-stage vs. two-stage training using Decipher.