Table of contents:
Volume 135, Issue 5, May 2014
- SPEECH PERCEPTION 
135 (2014); http://dx.doi.org/10.1121/1.4870486
Twenty American English listeners identified gated fragments of all 2288 possible English within-word and cross-word diphones, providing a total of 538 560 phoneme categorizations. The results show orderly uptake of acoustic information in the signal and provide a view of where information about segments occurs in time. Information locus depends on each speech sound's identity and phonological features. Affricates and diphthongs have highly localized information, so that listeners' perceptual accuracy rises during a confined time range. Stops and sonorants have more distributed and gradually appearing information. The identity and phonological features (e.g., vowel vs consonant) of the neighboring segment also influence when acoustic information about a segment is available. Stressed vowels are perceived significantly more accurately than unstressed vowels, but this effect is greater for lax vowels than for tense vowels or diphthongs. The dataset charts the availability of perceptual cues to segment identity across time for the full phoneme repertoire of English in all attested phonetic contexts.
135 (2014); http://dx.doi.org/10.1121/1.4869088
Recent studies on binary masking techniques assume that each time-frequency (T-F) unit contributes an equal amount to the overall intelligibility of speech. The present study demonstrates that the importance of each T-F unit to speech intelligibility varies with the speech content. Specifically, T-F units are categorized into two classes: speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is closely related to the loudness of its target component, while the importance of each speech-absent T-F unit varies with the loudness of its masker component. Two types of mask errors are also considered: miss errors and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance of the two error types depends on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness-weighted hit-false, is proposed for predicting speech intelligibility. The proposed measure shows significantly higher correlation with intelligibility than two existing mask-based objective measures.
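The masking framework this abstract builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation: it labels T-F units as speech-present or speech-absent by thresholding local SNR (an ideal binary mask) and computes the unweighted HIT − FA baseline that the proposed loudness-weighted hit-false measure refines. The function names and the 0 dB local criterion are assumptions.

```python
import numpy as np

def ideal_binary_mask(target_tf, masker_tf, lc_db=0.0):
    """Label each time-frequency (T-F) unit as speech-present (1) or
    speech-absent (0) by comparing its local SNR to a criterion lc_db.
    target_tf and masker_tf are magnitude spectrograms of equal shape."""
    eps = 1e-12
    local_snr = 10.0 * np.log10((target_tf**2 + eps) / (masker_tf**2 + eps))
    return (local_snr > lc_db).astype(int)

def hit_false_rate(estimated_mask, ideal_mask):
    """Unweighted HIT - FA: hit rate on speech-present units minus
    false-alarm rate on speech-absent units. The abstract's measure
    additionally weights each unit by loudness; that step is omitted here."""
    present = ideal_mask == 1
    absent = ideal_mask == 0
    hit = (estimated_mask[present] == 1).mean() if present.any() else 0.0
    fa = (estimated_mask[absent] == 1).mean() if absent.any() else 0.0
    return hit - fa
```

Under this scheme a false alarm (labeling a speech-absent unit as present) and a miss (discarding a speech-present unit) trade off in the HIT − FA score, matching the error taxonomy the abstract analyzes.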
135 (2014); http://dx.doi.org/10.1121/1.4870700
Reduced spectral resolution negatively impacts speech perception, particularly perception of vowels and consonant place. This study assessed the impact of the number of spectral channels on vowel discrimination by 6-month-old infants with normal hearing by comparing three listening conditions: unprocessed speech, 32 channels, and 16 channels. Auditory stimuli (/ti/ and /ta/) were spectrally reduced using a noiseband vocoder and presented to infants with normal hearing via visual habituation. Results supported a significant effect of the number of channels on vowel discrimination by 6-month-old infants. No differences emerged between the unprocessed and 32-channel conditions, in which infants looked longer during novel stimulus trials (i.e., discrimination). The 16-channel condition yielded a significantly different pattern: infants demonstrated no significant difference in looking time to familiar vs novel stimulus trials, suggesting that infants cannot discriminate /ti/ and /ta/ with only 16 channels. Results support effects of spectral resolution on vowel discrimination. Relative to published reports, young infants need more spectral detail than older children and adults to perceive spectrally degraded speech. Results have implications for the development of perception by infants with hearing loss who receive auditory prostheses.
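The noiseband vocoding used to create the 32- and 16-channel conditions can be sketched as follows. This is a generic vocoder, not the study's processing chain: the log-spaced band edges, fourth-order Butterworth filters, and Hilbert-envelope extraction are assumed, since the abstract does not specify them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noiseband_vocode(x, fs, n_channels=16, f_lo=100.0, f_hi=7000.0):
    """Spectrally degrade speech: split x into n_channels log-spaced bands,
    extract each band's amplitude envelope, use the envelope to modulate
    band-limited noise, and sum the channels."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(x), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)          # analysis band
        env = np.abs(hilbert(band))         # amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * noise                  # envelope-modulated noise carrier
    return out
```

Fewer channels mean coarser spectral envelopes, which is the manipulation the 32- vs 16-channel comparison exploits.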
135 (2014); http://dx.doi.org/10.1121/1.4870490
Previous research has shown that vocal errors can be simulated using a pitch perturbation technique. Two types of responses are observed when subjects are asked to ignore changes in pitch during a steady vowel production: a compensatory response countering the direction of the perceived change in pitch, and a following response in the same direction as the pitch perturbation. The present study investigated the nature of these responses by asking subjects to volitionally change their voice fundamental frequency either in the opposite direction (“opposing” group) or in the same direction (“following” group) as the pitch shifts (±100 cents, 1000 ms) presented during the speaker's production of an /a/ vowel. Results showed that voluntary responses that followed the stimulus direction had significantly shorter latencies (150 ms) than opposing responses (360 ms). In addition, prior to the slower voluntary opposing responses, there were short-latency involuntary responses that followed the stimulus direction. These following responses may involve mechanisms of imitation or vocal shadowing of acoustic stimuli when subjects are predisposed to respond to a change in the frequency of a sound. The slower opposing responses may represent a control strategy that requires monitoring and correcting for errors between the feedback signal and the intended vocal goal.
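For reference, a ±100 cent perturbation corresponds to a frequency ratio of 2^(±100/1200), i.e., one semitone in either direction. A minimal conversion follows; the 200 Hz example F0 is illustrative, not a value from the study.

```python
def cents_to_ratio(cents):
    """Convert a pitch shift in cents to a frequency ratio
    (1200 cents = 1 octave, 100 cents = 1 semitone)."""
    return 2.0 ** (cents / 1200.0)

# Applying a +100 cent shift to an illustrative 200 Hz voice F0:
f0 = 200.0
shifted = f0 * cents_to_ratio(100)  # about 211.9 Hz
```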