Distributed speech-recognition-based speech reconstruction from MFCC vectors with fundamental frequency prediction.
Modeling of the joint MFCC and fundamental frequency feature space using (a) GMM clustering; (b) a series of GMMs, each located within a state of a set of HMMs.
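As an illustration of how a joint MFCC and fundamental frequency GMM of this kind can be used for prediction, the sketch below computes the posterior-weighted conditional mean of the fundamental frequency given an MFCC vector. This is a standard estimator for joint GMMs and is an assumption here: the captions do not state the exact prediction rule used in the article. The function name `predict_f0` and the parameter layout (MFCC dimensions first, fundamental frequency last in each mean and covariance) are hypothetical.

```python
import numpy as np


def predict_f0(x, weights, means, covs):
    """Conditional-mean estimate of F0 given an MFCC vector x.

    Assumed layout (hypothetical): each mean is [mfcc_1..mfcc_d, f0] and each
    covariance is the matching (d+1) x (d+1) full matrix.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    log_scores = []
    cond_means = []
    for w, mu, cov in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        mu_x, mu_f = mu[:d], mu[d]
        s_xx = cov[:d, :d]   # MFCC covariance block
        s_fx = cov[d, :d]    # cross-covariance between F0 and MFCC
        diff = x - mu_x
        solved = np.linalg.solve(s_xx, diff)  # s_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log of w * N(x; mu_x, s_xx), the unnormalised component posterior.
        log_scores.append(
            np.log(w) - 0.5 * (d * np.log(2.0 * np.pi) + logdet + diff @ solved)
        )
        # Per-component conditional mean of F0 given x.
        cond_means.append(mu_f + s_fx @ solved)
    log_scores = np.array(log_scores)
    post = np.exp(log_scores - log_scores.max())  # stabilised normalisation
    post /= post.sum()
    return float(post @ np.array(cond_means))
```

For the HMM-based variant in (b), the same function could be applied with the GMM belonging to whichever state the MFCC vector is aligned to; that alignment step is not shown.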
Examples of prior voicing probabilities for the digits (a) six; (b) three, computed from the proportion of voiced vectors allocated to each state within the respective HMMs.
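The prior voicing probability described in this caption is a per-state proportion, which can be sketched as follows. The sketch assumes a state alignment (one HMM state index per feature vector) and a per-vector voiced/unvoiced flag are already available; the function name and the input format are hypothetical.

```python
from collections import defaultdict


def prior_voicing_probabilities(state_alignment, voiced_flags):
    """Return {state: fraction of vectors aligned to that state that are voiced}."""
    if len(state_alignment) != len(voiced_flags):
        raise ValueError("alignment and voicing flags must have equal length")
    totals = defaultdict(int)
    voiced = defaultdict(int)
    for state, is_voiced in zip(state_alignment, voiced_flags):
        totals[state] += 1
        if is_voiced:
            voiced[state] += 1
    # States with no aligned vectors are simply absent from the result.
    return {state: voiced[state] / totals[state] for state in sorted(totals)}
```

A state that only ever receives voiced vectors gets a prior of 1.0, and a state covering only unvoiced speech or nonspeech gets 0.0.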
Comparison of the predicted fundamental frequency contour (solid line) and reference fundamental frequency contour (dashed line) for the utterance “nine six oh.” A value of zero indicates unvoiced speech or nonspeech.
Comparison of narrow-band spectrograms of the utterance “nine six oh” for (a) original speech signal; (b) reconstructed speech using the reference fundamental frequency; (c) reconstructed speech using the predicted fundamental frequency.
Classification accuracy and percentage fundamental frequency error for male speech.
Classification accuracy and percentage fundamental frequency error for female speech.