Index of content:
Volume 63, Issue S1, May 1978
- PROGRAM OF THE 95TH MEETING OF THE ACOUSTICAL SOCIETY OF AMERICA
- Session A. Psychological and Physiological Acoustics I: Symposium on Non‐Simultaneous Techniques in Physiological and Psychological Measurement
- Invited Papers
Microsurgery of the ear: The Robinson stainless steel stapes prosthesis to restore hearing in otosclerosis
63(1978); http://dx.doi.org/10.1121/1.2016537
The development and surgical use of the Robinson stainless steel stapes prosthesis for the treatment of hearing loss in otosclerosis will be presented. Over 4000 operations have been performed by the author to date with a success rate of 97%. The anatomy, physiology, and surgical pathology will be presented on color slides, as well as the surgical technique and statistical results. A color motion picture will illustrate the actual microsurgery as observed through the operating microscope. This technique is now one of the most frequently used stapedectomy techniques in the country, with over 10 000 surgical operations performed each year using this prosthesis.
63(1978); http://dx.doi.org/10.1121/1.2016538
Traditionally, auditory spectral resolution is studied psychophysically by masking experiments (i.e., the effect of a masking stimulus on the detection threshold of a test tone with variable frequency). During the last decade there has been increasing experimental evidence of an essential qualitative difference between the results of two classes of such experiments: (a) masking stimulus and test tone being presented simultaneously (direct masking) or (b) nonsimultaneously (forward masking, pulsation threshold). This difference can be interpreted as an effect of lateral suppression (spectral sharpening) which manifests itself only in case of nonsimultaneous presentation of masking stimulus and test tone. An overview of the relevant data is presented (e.g., auditory Mach bands, two‐tone suppression, increased ripple resolution, etc.), with occasional references to related electrophysiological data. The experimental data lead to the notion of a peripheral stage of spectral sharpening of such a nature that it does not affect the detection threshold of a test tone in direct masking.
63(1978); http://dx.doi.org/10.1121/1.2016539
The following equation summarizes many of the poststimulatory and perstimulatory effects of a short‐term conditioning tone on the response to a brief test tone: R(i,I,T,t) = F(i) − [F(I) − SP]G(T,t). R represents the firing rate in response to the test tone; i and I, the sound intensities of the test and conditioning tones, respectively; F, the unconditioned or unadapted rate‐intensity function; and SP, spontaneous activity. The “relative decrement” G depends on both the duration of the conditioning tone T, and the silent interval between the conditioning and test tones, t. In the perstimulatory paradigm t = 0, R represents the total response during the test interval, and i, the total intensity. According to the equation, the response to the test tone equals the unconditioned response minus the decrement produced by the conditioning tone. Furthermore, the decrement is proportional to the driven response to the conditioning tone and does not depend on sound intensity per se. The experimental results leading to these conclusions will be reviewed, as will their physiological and psychophysical implications.
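The equation in this abstract can be sketched numerically. Only the structure R = F(i) − [F(I) − SP]G(T,t) comes from the abstract; the sigmoid form of F, the exponential form of G, and every parameter value below are hypothetical placeholders chosen purely for illustration.

```python
import math

SP = 10.0  # spontaneous rate in spikes/s (hypothetical value)


def F(intensity_db):
    """Hypothetical unadapted rate-intensity function (saturating sigmoid)."""
    max_driven = 190.0  # assumed maximum driven rate above SP, spikes/s
    return SP + max_driven / (1.0 + math.exp(-(intensity_db - 30.0) / 5.0))


def G(T_ms, t_ms):
    """Hypothetical relative decrement.

    Grows with conditioning-tone duration T and recovers with the silent
    interval t. Both time constants and the 0.5 ceiling are assumptions.
    """
    buildup = 1.0 - math.exp(-T_ms / 50.0)
    recovery = math.exp(-t_ms / 100.0)
    return 0.5 * buildup * recovery


def response(i_db, I_db, T_ms, t_ms):
    """R(i, I, T, t) = F(i) - [F(I) - SP] * G(T, t), as stated in the abstract."""
    return F(i_db) - (F(I_db) - SP) * G(T_ms, t_ms)
```

In this sketch, setting T = 0 makes the decrement vanish, so the response reduces to F(i). The decrement scales with the driven rate F(I) − SP rather than with the conditioning intensity I itself, which is the property the abstract emphasizes.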
63(1978); http://dx.doi.org/10.1121/1.2016588
Response patterns of single auditory‐nerve fibers are compared under two conditions. The first condition is similar to the stimulus condition customarily used to determine pulsation threshold psychophysically. A 100‐ms duration signal at a fiber's characteristic frequency (CF) is presented at 10 dB above threshold at a rate of 5/s. A 100‐ms duration tonal masker is introduced during the interval between signal pulses, raised in intensity and PST response patterns collected at each masker level. The second condition is the same except that the signal is continuous rather than pulsed. In the pulsed‐signal condition when the masker‐evoked discharge rate equals the signal‐evoked discharge rate, the response pattern resembles the firing pattern to a continuous tone. When the masker‐evoked rate exceeds the signal‐evoked rate, the response pattern is indistinguishable from the pattern evoked by a continuous signal plus a pulsed masker. For different masker frequencies the results are essentially the same when masker level is expressed relative to the fiber's excitatory response threshold at the frequency of the masking stimulus. It is suggested that “pulsation threshold” is an auditory illusion in which the central processor is presented with ambiguous information and accepts as a “perceptual hypothesis” the more likely possibility that a continuous signal is superimposed on a pulsed masker. Data presented previously on forward masking and unmasking [J. Acoust. Soc. Am. 62, S45–S46 (A) (1977)] will be reviewed and compared to the pulsation threshold data.
63(1978); http://dx.doi.org/10.1121/1.2016589
The nonsimultaneous masking of a sinusoidal signal (2 kHz) produced by a critical‐band noise (200 Hz wide centered at 2 kHz) is reduced by the presence of a suppressor (bandpass noise, 2300–3700 Hz) during the masker interval. The results of various experiments suggest: (1) Suppression is greater in backward masking than in forward masking. (2) The effect of masker intensity on suppression differs in backward and forward masking. (3) If the suppressor is presented to the contralateral ear, suppression is observed in backward masking but not in forward masking. These experimental differences may reflect a fundamental difference in the processes underlying backward and forward masking. (4) The amount of masking decreases with an increase in the duration of the suppressor. It also decreases with an increase in the duration of a reduction in the intensity of the excitor presented alone. However, this change in masking is markedly different for the two conditions; the effect of the suppressor is not simply to reduce the effective level of the excitor. [Research supported by NIH.]
63(1978); http://dx.doi.org/10.1121/1.2016590
Data from masking experiments have long been used to draw inferences about the properties of the mechanisms which mediate auditory spectral resolution or the status of those mechanisms in persons with a hearing impairment. It has recently become apparent that because of rather complex, nonlinear interactions between signal and masker (e.g., that result in distortion products or “suppression”), data from the classical simultaneous‐masking experiments must be assumed to reflect the influence of several mechanisms in addition to those which mediate spectral resolution. This problem is particularly acute in experiments with hearing‐impaired listeners, because of the possibility that any one of those mechanisms (or none) may be impaired, and because of the great variability in audiometric configuration encountered in this population. We suggest that forward masking may be a more useful way to assess spectral resolution, particularly in hearing‐impaired listeners. As evidence for this, forward‐masked psychophysical tuning curves obtained from hearing‐impaired listeners are quite reasonable in shape (though much broader than normal) and show none of the rather bizarre irregularities observed in simultaneous‐masked tuning curves.
63(1978); http://dx.doi.org/10.1121/1.2016633
Masking of transient and tonal signals was studied in both diotic and dichotic temporal listening conditions. Temporal conditions of forward, backward, and combined forward‐backward masking were used. Some of the variables which we have studied are the temporal separation between maskers and signal, the spectral differences between maskers and signal, and the type of dichotic stimulus configuration. Masking in dichotic conditions appears to parallel that obtained in diotic conditions when the forward‐ and backward‐masking procedures are used. Substantial differences exist between diotic and dichotic masking in the combined forward‐backward‐masking procedures. A summary of these results will be provided in order to describe some of the basic relationships that exist in binaural masking when the signal and masker are not presented simultaneously. [Work supported by NSF and NIH.]
63(1978); http://dx.doi.org/10.1121/1.2016634
The two‐tone suppression effect in forward masking may be equivalent to a simple attenuation of the level of the suppressed tone. To test this we equated two maskers by determining the level of a 1‐kHz tone required to mask a 1‐kHz probe at 30 dB SL for the 1‐kHz tone (a) alone (level L1) and (b) together with a 1150‐Hz tone at L1 + 20 dB. According to the simple attenuation hypothesis, the 1‐kHz components in these two maskers should be equivalent in their internal representation and thus equally effective as maskers. Accordingly, the probe threshold was measured as a function of probe frequency for each of the two maskers. Contrary to the predictions, probe threshold was higher for the second masker when the probe frequency was less than 980 Hz. However, this discrepancy may be explained by the combination tone at 850 Hz produced by the two‐tone masker. An adaptive 2‐AFC procedure was used throughout, with 20‐ms probe tones and 300‐ms maskers.
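The 850‐Hz figure is consistent with the cubic difference tone 2f1 − f2 for primaries at 1000 and 1150 Hz. The abstract does not name the distortion product, so treating it as 2f1 − f2 is an assumption, and the function name below is illustrative only.

```python
def cubic_difference_tone(f1_hz, f2_hz):
    """Return 2*f1 - f2 for primaries f1 < f2 (assumed distortion product)."""
    if not f1_hz < f2_hz:
        raise ValueError("expected f1 < f2")
    return 2 * f1_hz - f2_hz


# Primaries taken from the abstract: 1-kHz masker tone plus 1150-Hz tone.
print(cubic_difference_tone(1000, 1150))  # 850
```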
- Session B. Speech Communication I: Vowel Perception
- Contributed Papers
63(1978); http://dx.doi.org/10.1121/1.2016635
The perceptual locus of anchoring effects in vowel identification was investigated using a series of synthetic, steady‐state vowels ranging from [i] to [I]. In the control conditions, each stimulus occurred equally often. In the anchoring conditions, one of the end points occurred more often than the other stimuli. The category boundary in the anchoring condition shifted toward the more frequently occurring stimulus, relative to the control. A signal detection analysis was used to separate the effects of sensitivity from those of response bias. For the [i] anchor, each subject showed a decrease in sensitivity for the stimulus pair at the category boundary following anchoring. For the [I] anchor group, an increase in sensitivity at the category boundary was found. Changes in criterion placement were inconsistent across subjects. These data indicate that anchoring effects occur at a relatively early stage of vowel processing. Implications of the data for the nature of vowel processing will be discussed. [Research supported by University Funds and the SUNY Research Foundation.]
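As a rough illustration of how a signal detection analysis separates sensitivity from response bias, the sketch below computes d′ and the criterion c from a hit rate and a false‐alarm rate. The abstract does not say which detection model was used; the equal‐variance Gaussian yes/no model and the function name here are assumptions.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian estimates of sensitivity (d') and bias (c).

    Both rates must lie strictly between 0 and 1, otherwise the inverse
    normal CDF is undefined and NormalDist.inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa               # separation of the two distributions
    criterion = -0.5 * (z_hit + z_fa)    # 0 means no response bias
    return d_prime, criterion
```

Under this model, a shift in the category boundary caused only by response bias would change c while leaving d′ unchanged, whereas a change in d′ at the boundary points to a sensitivity effect.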
63(1978); http://dx.doi.org/10.1121/1.2016636
Synthetic steady‐state vowels from an /i/‐/I/‐/ / continuum were presented in an AX discrimination task. The interval between the stimuli in a pair was either short (240 ms) or long (1920 ms), and either empty or filled with irrelevant vowel sounds. Discrimination performance was disturbed by both these manipulations, suggesting that auditory memory is subject to decay as well as to interference. In a second experiment, we predicted discrimination performance at short‐unfilled and long‐filled intervals from identification responses to the same AX pairs. While obtained discrimination performance exceeded the predictions, this difference was equally small at the two interstimulus intervals. Identification was as much affected by interstimulus interval as discrimination: There were large contrast effects at the short‐unfilled interval, but not at the long‐filled interval. Implications of these results for models of categorical perception will be discussed. [Work supported by NICHD.]
63(1978); http://dx.doi.org/10.1121/1.2016685
Identification and discrimination experiments using series of isolated vowels have shown that these vowels, unlike many consonant series, are not perceived categorically. Recent studies, however, suggest that naturally spoken isolated vowels are identified less reliably than vowels in context. In this study three series of synthetic three‐formant syllables varying in ten steps from /I/ to /ε/ or /dId/ to /dεd/ were constructed to investigate the following questions: (1) Are vowels in syllable context perceived in a more categorical manner than isolated vowels of similar length? (2) Are isolated vowels of short duration perceived in a more categorical manner than vowels of longer duration? (3) To what extent are the identification and discrimination of isolated vowels of short duration comparable to that of vowels in syllable context? Perception was investigated using standard identification and oddity discrimination procedures. Further investigations are planned using unbiased measures of discrimination. [Supported by grants from NICHD and NIMH.]
63(1978); http://dx.doi.org/10.1121/1.2016686
Quasi‐steady‐state vowels which attain representative formant frequency values for adult male speakers of General American might be regarded as good approximations to canonical form. Results of experiments on vowel perception do not support this assumption. Two speakers produced tokens of nine different vowels in isolation. Separate listening tests were made by random orders of six tokens of each vowel. A second pair of tests was created by digitizing the speech signals, abstracting a single pitch pulse (at the zero crossings) from each vowel center, and iterating to produce sets of pseudovowels matched in duration to the parent syllables. Listeners' judgments contained nearly twice as many errors for iterated pseudovowels as for the unedited natural versions. A third experiment with OVE‐synthesized vowels yielded error rates comparable to those for iterated pseudovowels, not to their natural counterparts. We conclude that natural vowels contain sources of information not captured by target formant frequency values. A fourth study created stylized CVC syllables by adding symmetrical linear formant transitions to the set of OVE‐synthesized vowels. Results of listening tests demonstrate a significant gain in vowel identification for OVE CVC vowels in comparison to their ♯V♯ counterparts. These results demonstrate that formant transitions play a major role in vowel perception. [This work was supported by NIH Grant HD01994 to Haskins Laboratories.]
63(1978); http://dx.doi.org/10.1121/1.2016687
Consonantal environment may aid in specifying vowel identity by supplying critical information about timing. Several vowel pairs in American English are distinguished by temporal as well as spectral variables, and these temporal differentia vary with articulatory rate. Two studies were designed to explore the following paradox: When consonantal formant transitions are introduced into a steady‐state vowel, holding syllable duration constant, a response shift is observed toward longer‐vowel alternatives, even though steady‐state duration has been reduced. The first study verified this finding for the vowel pair /ε/‐/æ/ in comparisons of ♯V♯ and bVb continua. Pairs of continua were defined separately by F1 variation and by duration variation, and each continuum type evidenced the paradox. A second study varied the rate of symmetric consonantal transitions in F1‐varying CVC continua (V = /ε, æ/, C = /♯, b, w/) in order to test whether transition rate might specify an articulatory rate that effectively scales vowel duration. Vowel responses did not vary monotonically with either transition rate or steady‐state duration, but interacted with the perceived identity of the initial consonant. Listeners' judgments may demonstrate a sensitivity to constraints on the relative timing of consonantal and vocalic gestures. [This work was supported by NIH grant HD‐01994 to Haskins Laboratories, and by the University of Michigan Society of Fellows.]
63(1978); http://dx.doi.org/10.1121/1.2016688
Strange, Verbrugge, Shankweiler, and Edman [J. Acoust. Soc. Am. 60, 213–224 (1976)] report that vowels are better identified in the context of a CVC than in isolation. The present study investigates the basis for this difference in performance. One basis might be auditory. In contrast to isolated vowels, CVCs are dynamic acoustic events. If the auditory system is more sensitive to dynamic patterns of stimulation than to unchanging signals, the CVC advantage may follow from that. A different account derives from the various proposals that perceivers of speech are sensitive to the acoustic patterning in a speech signal that specifies the vocal tract gestures of the talker. CVCs may better specify their articulatory source than isolated vowels. These alternative classes of explanation are distinguished here by separating the properties of acoustic change and articulatory specification. Three sets of listening tests were devised. The first consisted of nine different isolated vowels presented six times each in random order. The second presented the same nine vowels in a /b‐b/ context. The third test consisted of the same nine vowels in the context of formant transitions constructed by mirror imaging the /b/ transitions with respect to the steady‐state formants for the vowel. The resulting acoustic patterns include as much acoustic change as the bVb syllables but are not patterns that a vocal tract could produce. Identification of vowels embedded in the mirror‐imaged transitions was substantially worse than that of vowels in isolation or in the context of a CVC. We interpret these findings as supportive of an “articulatory” as opposed to an “auditory” account of the CVC advantage. [This work was supported by NIH Grant HD‐01994 to Haskins Laboratories.]
63(1978); http://dx.doi.org/10.1121/1.2016689
Previous research in our laboratory showed that vowels spoken in labial consonant contexts (/p‐p/, /b‐b/) were identified significantly better than isolated vowels. The present study investigates whether initial and/or final velar consonants also aid vowel identification. We compared identification of ten American English vowels in nine conditions: /k/‐vowel‐/k/, /k/‐vowel, vowel‐/k/, /g/‐vowel‐/g/, /g/‐vowel, vowel‐/g/, /p/‐vowel‐/p/, /b/‐vowel‐/b/, and isolated vowels. Vowels in /k‐k/, /k‐/, /‐k/, /p‐p/, and /b‐b/ frames were identified much more accurately than isolated vowels. However, error rates for vowels in /g‐g/, /g‐/, and /‐g/ frames did not differ significantly from those for isolated vowels. Error rates for vowels in /k‐k/, /k‐/, and /‐k/ frames were not significantly different from each other, nor were errors for vowels in /g‐g/, /g‐/, and /‐g/ frames. This seems to disconfirm a phonological hypothesis to explain the poorer identification of isolated vowels [T. R. Edman, W. Strange, and J. J. Jenkins, J. Acoust. Soc. Am. 59, S25(A) (1976)]. Possible reasons for the ineffectiveness of /g/ contexts in aiding vowel identification are discussed. [Supported by NIMH, NICHD.]
63(1978); http://dx.doi.org/10.1121/1.2016737
Research reported last year [W. Strange, J. J. Jenkins, and T. R. Edman, J. Acoust. Soc. Am. 61, S1 (1977)] was extended to investigate the contribution of dynamic spectral and temporal information to the specification of vowels. Two tokens each of ten American English vowels produced in /b/‐vowel‐/b/ syllables by four speakers were digitized and several stimulus conditions were constructed: (1) silent‐center syllables (in which a variable‐duration center “vowel” portion of each syllable was deleted), (2) variable‐duration centers (with initial and final transitions removed), (3) fixed‐duration centers (52 ms), and (4) long silent centers and short silent centers (in which the temporal relations of initial and final portions were changed). Vowel identification was best for the silent‐center syllables produced by three of four speakers. Vowels in the variable‐duration center condition were identified better than in the fixed‐duration center condition. Lengthening the silent centers increased errors while shortening did not. Effects of speaking rate, vocal tract differences, and dialect are discussed. [Supported by NIMH.]
63(1978); http://dx.doi.org/10.1121/1.2016738
The stimuli were spoken vowel‐consonant‐vowel sequences in the sentence frame “I say V1 C V2” in which V1 was /a, i/, C was /p, t, k/, and V2 was /u, æ/ in all 12 combinations. Reaction time (RT) of practiced listeners was observed to the assigned final stressed vowel target (V2). Sentence pairs differing only in final target vowel were either (a) cross spliced at the pre‐stop‐consonant silent interval or (b) left intact. RT was slower to both targets when cross spliced than to their intact counterparts. However, the results were asymmetrical: one target was affected by cross splicing more than the other. This suggests that cross‐splice effects were not due to prosodic discontinuity alone, since such discontinuity presumably applies to both sentences equally when crossed. The asymmetry between targets suggests either different amounts of target‐specific information preceding the silent interval or a difference in perceptual weight given to information on the two sides of the interval.
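The count of 12 stimulus sequences follows from crossing the two V1 vowels, three consonants, and two V2 vowels. A minimal sketch of that enumeration, using plain strings as stand‐ins for the phonetic symbols (the variable names are illustrative, not from the abstract):

```python
from itertools import product

V1 = ["a", "i"]       # initial vowels
C = ["p", "t", "k"]   # medial stop consonants
V2 = ["u", "æ"]       # final stressed target vowels

# Every V1-C-V2 sequence in the fully crossed design.
sequences = ["".join(parts) for parts in product(V1, C, V2)]
print(len(sequences))  # 12
```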
63(1978); http://dx.doi.org/10.1121/1.2016739
A “continuum” of 220 two‐formant synthetic test vowels was designed with F1 ranging from 220 to 1080 Hz and F2 from 685 to 3515 Hz. These stimuli were presented (using a Glace‐Holmes synthesizer) to five phonetically trained listeners in two distinct synthetic vowel contexts. Both context vowels were judged to be acceptable versions of the vowel /i/; however, one of the context vowels had formant frequencies appropriate for (adult) males' speech (F1 = 250 Hz, F2 = 2400 Hz) and the other had values appropriate for children's (F1 = 370 Hz, F2 = 3515 Hz). Analysis of the listeners' categorization of the test vowels indicates a systematic upward shift for all their vowel category boundaries (in both F1 and F2) as the context vowel is changed from the “male's” /i/ to the “child's” /i/. Statistically significant shifts are found even for the /a‐ɔ/ boundary, which is phonetically (and physically) remote from the context vowels. The categorization results are in accord with natural data, since the formant frequencies of all vowels are higher for children than for males. [Work supported by University of Connecticut Research Foundation.]
63(1978); http://dx.doi.org/10.1121/1.2016740
Normalization algorithms which seek to reduce the dispersion of phonetically similar vowels produced by vocal tracts of unequal size exclusively on the basis of formant frequency data ignore a number of perceptually relevant aspects of the acoustic signal such as fundamental frequency, formant bandwidth, and spectral rolloff. These so‐called secondary characteristics of the speech signal interact with certain nonlinear properties of the inner and middle ear to produce an output function not predictable from formant frequencies alone. The model proposed here uses F1, F2, F3, and F0 as input to derive an output from a set of transformations based on a set of (a) acoustic parameters (e.g., formant bandwidth, spectral rolloff) and (b) nonlinear properties associated with the inner and middle ear (e.g., critical bands, mechanical sensitivity). The operational characteristics of the model suggest that differences in vocal tract size can, in large part, be offset by appropriate modification of fundamental frequency.
63(1978); http://dx.doi.org/10.1121/1.2016741
As a continuation of our studies on the psychophysical bias in the judgment of vowel pitch (Chuang and Wang, ASA 60.1 and 62.1) and duration (Wang et al., ASA 60.1), this study was aimed at quantitative investigation of perceptual bias in the loudness dimension. The experimental design and the four vowels [i, e, u, and ɑ] were the same as in previous studies, with [ɑ] as the reference. The perceptual loudness bias differences were obtained by equalizing the syllabic speech power of the four test vowels. The mean loudness difference, based upon data gathered from 12 listeners, indicated that in reference to [ɑ], [i] was 7.1 dB, [e] was 4.5 dB, and [u] was 3.8 dB louder, respectively. These results give quantitative support to the studies of Ladefoged (1961) and Allen (1971), which showed qualitatively that with speech power normalization, the high vowels [i] and [u] were judged louder than the lower vowel [ɑ]. Furthermore, these perceptually obtained loudness differences are found to be inversely matched to the reported intensity difference of sustained vowels (Lehiste and Peterson, 1958). The negative correlation between the production intensity and the perceptual judgment of loudness is congruent with results reported in our previous pitch and duration studies. [Work supported by NSF grant BNS 76‐00017 and partially supported by NS 13274.]