Effect of spectral normalization on different talker speech recognition by cochlear implant users
Implementation framework of the GMM-based spectral normalization algorithm.
Normalized talker distortion as a function of the number of channels. Solid line: Without spectral normalization. Dashed line: With spectral normalization. Note that the talker distortion between talkers F1 and M1 (unprocessed speech) was used as the reference.
Individual and mean sentence recognition performance for talkers M1 and F1. For subjects S1–S3, performance with F1 was better than that with M1; for subjects S4–S9, performance was better with M1 than with F1. The error bars show , and the asterisks show significantly different performance between the two talkers .
Waveforms for the sentence “Glue the sheet to the dark blue background.” Top panel: Pitch-shift transformation T0.6 (upward pitch shift). Middle panel: Reference talker T1.0 (unprocessed speech from talker F1). Bottom panel: Pitch-shift transformation T1.6 (downward pitch shift).
Spectral envelopes for different processing conditions in Experiment 2. Top panel: Spectral envelopes for reference talker T1.0 and pitch-shift transformations T0.6 and T1.6. Bottom panel: Spectral envelopes for T1.0 and spectral transformations T0.6-to-T1.0 and T1.6-to-T1.0.
NH subjects’ overall speech quality ratings for the pitch-shift transformations, with (open symbols) and without (closed symbols) spectral normalization. The error bars show , and the asterisks indicate significantly different ratings with spectral normalization . Note that source talker T1.0 (unprocessed speech from talker F1) was used to anchor the subjective quality ratings.
Sentence recognition performance for NH and CI subjects, with (open symbols) and without (closed symbols) spectral transformation, as a function of pitch-shift transformations. The error bars show , and the asterisks indicate significantly different performance after spectral transformation .
Subject demographics for the cochlear implant patients who participated in the present study.
Performance difference between unprocessed source talkers (i.e., M1 vs F1), and between spectrally normalized and unprocessed talkers. Note that because the performance with talkers M1 and F1 differed among individual subjects, comparisons are made in terms of the “Better” and “Worse” talker. Bold numbers indicate significant differences in performance across different sentence lists .
Pitch and formant analysis for the pitch-shift and spectral transformations in Experiment 2. The target F0 for the pitch-shift transformations was scaled according to the pitch-stretching ratio used for processing; the target F0 for the spectral transformation refers to the measured F0 values after pitch-stretching. The F0s and formant frequencies were measured with the WAVESURFER 1.8.5 software. The F0s were averaged across all IEEE sentences. The formant frequencies were estimated for the vowel /ɪʏ/ from the sentence “Glue the sheet to the dark blue background.” Note that reference talker T1.0 (in bold) was F1 from Experiment 1.
Significance values for linear regressions performed between the unprocessed talkers from Experiment 1 (M1 and F1) and the pitch-shift transformations from Experiment 2 (T0.6, T0.8, T1.2, T1.4, T1.6).