Index of content:
Volume 62, Issue S1, December 1977
- PROGRAM OF THE 94TH MEETING OF THE ACOUSTICAL SOCIETY OF AMERICA
- Session A. Psychological Acoustics I: Asymmetries and Complex Signals
- Contributed Papers
62(1977); http://dx.doi.org/10.1121/1.2016065
Data were gathered from four well‐practiced subjects in a 2AFC paradigm which was subject driven, and allowed as many repeats of the stimulus pair as the subject requested prior to making his response. The stimuli consisted of the six stop‐consonants followed by the vowel /a/. These natural speech CV's have served as stimuli in numerous dichotic speech perception studies. The results indicated that when the stimuli were separated by no more than 15 msec, the subjects could be divided into two homogeneous groups. One group was able to reliably determine a parallel pair of perceptual onset functions (one for each ear) which were of moderate slope and were separated by approximately 13 msec. The other group of subjects showed a reversal of the ear asymmetry and their perceptual onset functions had a zero slope. The results will be discussed in terms of their relevance to the temporal variables involved in dichotic CV experiments. [Supported by National Institutes of Health, USPHS Grant No. NS 11647‐03.]
62(1977); http://dx.doi.org/10.1121/1.2016066
Two‐sound sequences with interpulse intervals of were presented to subjects in order to evaluate the character of judgments involved in temporal comparisons of the successive sounds. A 1000‐Hz tone and wide‐band noise were presented in all possible two‐sound combinations. The duration of the first (referent) sound was 500 msec, and the duration of the second (comparison) sound ranged from 300 to 700 msec. The subjects' task was to state whether the second sound was shorter, of the same duration, or longer than the first sound. As the difference in the durations of the two sounds increased, accuracy of judgments improved. The effect was asymmetrical; judgments were more accurate when the second sound was shorter than the first sound and less accurate when the second sound was longer, given equivalent differences in duration. Subjective impressions were that noise bursts were longer than tone presentations of the same duration, which accords with the finding that the asymmetry in accuracy of judgments was diminished by tone‐noise combinations and enhanced by noise‐tone combinations.
62(1977); http://dx.doi.org/10.1121/1.2016067
In most two‐channel selective attention studies, each earphone has been assumed to represent a separate perceptual input channel. However, at least for the detection and recognition of relatively simple acoustic signals, separate earphones do not appear to represent separate perceptual input channels. In an effort to clarify the concept of “channel,” we ran a set of monaural and binaural experiments to investigate the effects of differences in earphone, frequency, and perceived lateralized location on the adequate definition of input channel. Observers monitored increments in intensity of two simultaneously presented frequencies (2000 and 4200 Hz). In the first of two binaural conditions, the two signals were equivalent in loudness and thus, theoretically, both occupied central locations. The second of the binaural conditions spatially separated the lateralized locus of the frequencies by reducing the intensity of each in one (but not the same) earphone so that the 2000‐ and 4200‐Hz tones appeared to be lateralized to the left and right, respectively. Analogous monaural conditions were run. Observers ran monaural and binaural conditions involving signal stimuli and two stimulus presentations with selective and divided attention tasks. Preliminary results show equivalent performance across the two binaural conditions, with the binaural conditions tending to be poorer than the monaural conditions. This suggests that channel cannot be defined in terms of earphone or location of perceived image. Frequency appears to provide a more adequate definition for channel. [Research supported in part by a grant from NINCDS.]
62(1977); http://dx.doi.org/10.1121/1.2016116
A replication of the Efron‐Yund study [R. Efron and E. W. Yund, Neuropsychologia 12, 249–256 (1974)] was accomplished and findings were in close agreement with those of the original investigators: When subjects receive dichotic stimulation by steady‐state tone bursts of equal intensity but differing frequency, pitch extraction of the resulting chord is usually dominated by the frequency going to one ear resulting in an ear advantage for pitch. A series of linearly rising tone glides was constructed with center frequencies equal to the values used for pure‐tone testing, and substituted into the design so that the glides went to the nondominant ear. For most subjects, the effect is to decrease the observed ear dominance and the degree of change is predicted by the amount of frequency change of the glide stimulus. An interpretation invoking the activation of receptor cells tuned to changes in acoustic stimulation is developed.
62(1977); http://dx.doi.org/10.1121/1.2016117
In adaptation experiments, one factor which has not been reported is the possibility of differing effects depending upon whether the dominant ear or the nondominant ear received the stimulus. The present experiment utilized a monaural measurement technique which presented a 500‐Hz pure tone to the experimental ear for 7 min of adaptation exposure at 50 dB SPL. Ten subjects were adapted in the right ear and ten in the left ear. A 10 000‐Hz tone was used as a comparison tone. There was a significant difference in right‐ versus left‐ear adaptation (12.88 vs 4.20 dB) with a t of 3.8 and p less than 0.001. An experiment using binaurally presented stimuli with the adapting stimulus at 500 Hz and 60 dB SPL, while the comparison stimulus was also 10 000 Hz (as above) but presented to the opposite ear, yielded significant ear differences. The right and left ear yielded 5.0 and 6.0 dB, respectively, with t of 2.93 and p less than 0.05. Results of both studies would suggest that the ear used for adaptation does markedly affect the magnitude of loudness adaptation measured.
62(1977); http://dx.doi.org/10.1121/1.2016118
The relative salience of the pitch components of a two‐tone dichotic chord is invariant with respect to the relative intensity of the two tones over a wide range of interaural intensity differences [R. Efron and E. W. Yund, J. Acoust. Soc. Am. 59, 889–898 (1976)]. In a recently developed model for this phenomenon, the range of intensity independence is limited by the bone‐conducted energy from the more intense tone [E. W. Yund and R. Efron, J. Acoust. Soc. Am. 62, 607–617 (1977)]. The model thus predicts that a decrease in bone conduction must increase the range of intensity independence. This prediction has been confirmed by experiments using insertion earphones. The range of intensity independence with insertion earphones was 6–17 dB greater than the value obtained for the same subjects using TDH 39 earphones in MX41/AR cushions. [Work supported by the Veterans Administration.]
62(1977); http://dx.doi.org/10.1121/1.2016119
When a 1650‐Hz tone is presented to one ear while a 1750‐Hz tone of equal intensity is presented to the other ear, most subjects report that there is a significant difference in the relative salience of the pitch of these two tones. This has been called “ear dominance” in the perception of dichotic chords [R. Efron and E. W. Yund, J. Acoust. Soc. Am. 59, 889–898 (1976)]. The present experiments were designed to neutralize or even reverse a given subject's ear dominance by increasing the bandwidth of the tone presented to his dominant ear while continuing to present a pure tone in the nondominant ear. This increase in bandwidth was accomplished by modulating, in brief, abrupt steps, the frequency of the dominant ear tone. The duration of each of the brief steady‐state portions of the FM tone was random between 1.4 and 2.0 msec; its frequency was Gaussian‐distributed around the selected center frequency. To make the salience of the two pitch components of a dichotic chord equal, a larger bandwidth increase was required in subjects having a stronger ear dominance. Lowering the sound pressure level of the stimuli in both ears increased the amount of bandwidth increase required. These results can be interpreted in terms of a model of binaural pitch mixing. [Work supported by the Veterans Administration.]
62(1977); http://dx.doi.org/10.1121/1.2016120
Hemispherectomized subjects display a strong ear dominance for the pitch mixture of dichotic chords: The tone presented to the ear ipsilateral to the removed hemisphere is less salient in the dichotic sound image than is the contralateral ear tone [R. Efron, M. Dennis, and J. E. Bogen, J. Acoust. Soc. Am. 60, Suppl. No. 1, 550 (1976)]. Two possible causes for similar ear dominance seen in normal subjects are an asymmetry in the transducer properties of the two ears or an asymmetry in the way two basically symmetrical inputs are combined centrally [E. W. Yund and R. Efron, J. Acoust. Soc. Am. 62, 607–617 (1977)]. Evidence for the former derives from the finding that the dominant ear tends to have a superior frequency resolving capacity [P. L. Divenyi, R. Efron, and E. W. Yund, J. Acoust. Soc. Am. 62, 624–632 (1977)]. Evidence for the latter derives from the present experiments which show that, in hemispherectomized subjects, there is no difference between the frequency dl in the two ears. The results suggest the existence of an efferent pathway from the cortex which can influence the way pitch information from the two ears is combined (weighted). [Work supported by the Veterans Administration.]
62(1977); http://dx.doi.org/10.1121/1.2016169
The ability of three age groups to identify complex tones was measured by the method of absolute judgments. Group mean ages were nominally 20, 40, and 60 yr., with ten matched subjects per group. Complex tones from ten different musical instruments were tape recorded and spliced to yield six experimental tone types per instrument with different waveforms and durations. After practice, the test tones were presented monaurally in random order in the subject's better ear at a loudness level of approximately 60 phons. Each subject made a total of 120 judgments with the correct instrumental identification being provided after each trial. Analysis of variance showed a significant difference in the mean number of correct responses for the four factors of age, instrument, tone type, and practice. For the ten instruments, the average amount of information transmitted per group in order of increasing age was 1.68, 1.49, and 1.42 bits. It is concluded that (1) the processing of complex‐tone information is adversely affected by normal aging from 20 to 60 yr and (2) maximal identification is obtained for all ages when the entire tone (onset, steady state, and offset) is presented. [Work supported by SUNY Research Foundation Grant No. 023‐7185A.]
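As a rough illustration of the "information transmitted" measure quoted above, the sketch below computes transmitted information in bits from a stimulus-by-response confusion matrix. The abstract does not say how its figures were computed, so the use of the standard mutual-information estimate is an assumption, and the function name and the small matrices are hypothetical, not data from the study. For reference, perfect identification of ten equally likely instruments would transmit log2(10), about 3.32 bits.

```python
import math


def transmitted_information(confusions):
    """Transmitted information (bits) from a stimulus-by-response count matrix.

    Rows are stimuli, columns are responses; each cell is a judgment count.
    Assumes the matrix contains at least one judgment.
    """
    total = sum(sum(row) for row in confusions)
    row_totals = [sum(row) for row in confusions]
    col_totals = [sum(col) for col in zip(*confusions)]
    bits = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing
                p_joint = count / total
                p_stim = row_totals[i] / total
                p_resp = col_totals[j] / total
                bits += p_joint * math.log2(p_joint / (p_stim * p_resp))
    return bits


# Hypothetical two-stimulus matrices, for illustration only.
perfect = [[10, 0], [0, 10]]  # every judgment correct
chance = [[5, 5], [5, 5]]     # responses unrelated to stimuli
print(transmitted_information(perfect))
print(transmitted_information(chance))
```

The first matrix transmits 1 bit (two equally likely stimuli, always identified) and the second transmits 0 bits.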
62(1977); http://dx.doi.org/10.1121/1.2016170
A series of experiments explored the information used in perception of an auditory pattern. Subjects identified which of a set of auditory patterns was presented on each trial, when each of the pattern tones was followed by an irrelevant interference tone. The structure of the patterns, the frequency relationship between the pattern and interference tones, and the time separating each pattern tone from the following interference tone were varied. Both the nature of the disruption exerted by the interference tones, and the trend of confusions among the individual patterns, provide an index of the information which is perceived. The results suggest that only the most abstract information needed to uniquely specify a pattern will be utilized. When contour was sufficient to discriminate among patterns, neither interval nor frequency information was employed. When contour alone was insufficient, successive intervals were utilized as well. The absolute frequencies of the component tones were used only when the patterns were equated for both contour and intervals. These findings suggest that perception proceeds in terms of a higher‐order structure provided by the frequency relationships within a pattern. [Work supported by NIMH.]
62(1977); http://dx.doi.org/10.1121/1.2016171
The effect of reordering pitches comprising traditional triads into all possible tonal sequences was measured on subject perception and confusion of triad qualities. Seventeen musically trained subjects participated in the study, which was conducted on the PLATO computer system, using computer‐accessed sound source. Stimuli consisted of equal numbers of all four triad qualities, randomly presented in all possible arrangements of tones. Sequences used 512‐Hz tone as lowest pitch (with exception of two‐thirds of augmented triad tonal sequences, which were based at semitone's interval from 512 Hz). Stimuli were presented at 0.2 and 0.1 sec/pitch, with additional simultaneous triad presentation to control for S's variability in quality recognition. Previous studies of perception of tonal sequences [W. J. Dowling, Percept. Psychophys. 9(3B), 348–349 (1972), and J. B. Davis and J. Jennings, J. Acoust. Soc. Am. 61, 534–541 (1977)] lend support to theories of S use of auditory imagery in recognition tasks. Current study measured effect of order of intervals produced by tonal sequences on S accurate recognition of triad quality, with resulting implications of S's use of pre‐existent auditory pattern imagery.
62(1977); http://dx.doi.org/10.1121/1.2016172
Eight CV's (with a) were audio/video color recorded. The consonants included p, b, k, g, s, z, f, and v. Using video‐editing techniques the audio from these consonants was dubbed such that all possible pairings of each visual consonant occurred with each audio consonant. Thus 64 stimuli were constructed that consisted of 56 conflicting visual and auditory stimuli and eight nonconflicting stimuli. Subjects were seated in front of a video monitor with earphones and asked to write their response to each stimulus item. The auditory‐alone mode confirmed unequivocally correct intelligibility of each consonant. However, when conflicting visual speech reading cues were present confusions were reliable and consistent. Confusions within the fricative class (f, v, s, and z) occurred as well as confusions within the stop class (p, b, k, and g). Interclass confusions were also prominent. Especially noteworthy was the perception of consonants not present acoustically (e.g., θ, Ø, t, l). An extremely robust confusion occurred for (f) presented visually and (b) auditorily. With voicing present auditorily subjects' responses were consistently a voiceless (f). These data will be discussed with respect to distinctive feature theory and information processing.
- Session B. Speech Communication I: Speaker Identification and Communicative Disorders
62(1977); http://dx.doi.org/10.1121/1.2016223
“Voiceprinting,” appropriately defined, is the sound‐spectrographic technology aspect of speech and voice individuation. A speech sound spectrogram is at once a voice sound spectrogram, since voice is the biophysical instrument of speech. Speaker individuation, known widely today as voice identification, has been with us throughout history. Technological implications of speech/voice idiosyncrasy have been sighted, and/or explored instrumentally, for at least 100 years. Here listed is a partially annotated chronology of authors and publication dates significant for this author's 30‐yr commitment to the development of an instrumental speaker‐individuation technology: Alexander Melville Bell (Visible Speech: The Science of … Physiological Letters, etc.): 1867; Galton (“fingerprinting”): 1892 (cf. Neumueller GREW—1684); Scripture!: 1902 (653 pp.); Liddell! (The Physical Characteristics of Speech Sound): 1924–1925–1927; Paget: 1930; Scripture (“vowel track analysis”): 1933; Lacerda: 1932–1934; Steinberg!: 1934; Liddell: 1940; Gray/Kopp! (“Voice Print Identification,” 41 pp., explicit): 1944; Potter: 1945–1946; Steinberg/French!: 1946; Koenig/Dunn/Lacy: 1946; Kopp/Green!: 1946; Potter/Kopp/Green!: 1947; Joos! (Acoustic Phonetics): 1948; Truby: 1957‐1959!‐1960 (“cryprinting”); Fant: 1960; Cummins/Midlo (“Dermatoglyphics”): 1961; Truby: 1962; Kersta (“Voiceprint Identification”): 1962; Garvin/Ladefoged (“Speaker Identification … Speaker Recognition”): 1963 (variously: 1950–1970: Peterson, Stevens, Flanagan, Fant, Lehiste, Truby); Tosi et al.: 1971; Smrkovski: 1974; plus the continuing, practical, day‐by‐day, field‐laboratory investigations of Hall, Smrkovski, Chiari, Richardson, Tosi, Truby, and other Certified Voice Identification Examiners: esp., 1972–1977. And whom, after all, does the relevant “scientific community” comprise … in controversial “Voiceprint” affairs? [Supported by International Association of Voice Identification.]
62(1977); http://dx.doi.org/10.1121/1.2016224
Recordings were made of 11 members of the UCLA Phonetics Laboratory group. At the time of the experiment there were two American Black speakers in the lab group, neither of whom was recorded. Instead another Black male who was relatively unfamiliar to most of the group was recorded. The recordings were made over the lab telephone, and spectrographic analysis showed that the frequency range was 100–4000 Hz. The recordings were edited and played to ten members of the lab group, who were simply asked to identify the speakers. All the listeners except one correctly identified all the 11 members of the lab group. Three of the listeners knew the Black speaker and identified him, two said they could not recognize this speaker, and five (including two phoneticians with Ph.D.'s and two post‐M.A. phonetics students), wrongly identified him as one of the two Blacks in the lab.
62(1977); http://dx.doi.org/10.1121/1.2016225
An experiment was performed to find out how well human listeners could determine whether or not two different utterances were spoken by the same speaker. The speech was coded in three ways: high‐quality PCM (natural speech), linear prediction encoding (LPC), and ADPCM at 24 kbps. For this experiment all combinations of these three coding methods were used. A group of 16 reference speakers (customers—8 males, 8 females) along with 78 test speakers (16 reference and 62 imposters) were used in the experiment. Two test utterances were used: one for the male speakers and one for the female speakers. The 30 naive listeners who participated in the experiment were required to make same/different judgments. The results will be discussed in terms of the imposter acceptance error rates (miss rates) and customer rejection error rates (false‐alarm rates) for the different test conditions.
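The two error rates named above can be tallied from same/different judgments as in the sketch below. The data layout (one pair of booleans per trial), the function name, and the sample trials are assumptions made for illustration; the abstract reports no trial-level data and does not describe its scoring procedure.

```python
def error_rates(trials):
    """Tally error rates from same/different speaker judgments.

    trials: iterable of (same_speaker, judged_same) boolean pairs.
    A same-speaker pair is treated as a customer trial and a
    different-speaker pair as an imposter trial (an assumption).
    Assumes at least one trial of each kind is present.
    """
    trials = list(trials)
    customer = [judged for same, judged in trials if same]
    imposter = [judged for same, judged in trials if not same]
    return {
        # customers wrongly judged "different"
        "customer_rejection_rate": customer.count(False) / len(customer),
        # imposters wrongly judged "same"
        "imposter_acceptance_rate": imposter.count(True) / len(imposter),
    }


# Hypothetical judgments, for illustration only.
sample = (
    [(True, True)] * 3 + [(True, False)]        # 4 customer trials, 1 rejected
    + [(False, False)] * 4 + [(False, True)]    # 5 imposter trials, 1 accepted
)
print(error_rates(sample))
```

Computing the two rates separately for each pairing of coding methods would give one pair of rates per test condition.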
62(1977); http://dx.doi.org/10.1121/1.2016226
This research was designed to investigate the effects of vocal disguise upon speaker identification by listening. The experiment consisted of fixed‐sequence pair discriminations. The listeners were asked to decide whether the two sentences were uttered by the same or different speakers and to rate their degree of confidence. The speakers produced two sentence sets utilizing their normal voice and five disguises. One member of each pair in the task was always undisguised; the other member was either disguised or undisguised. Two listener groups were trained for the task: a group of 24 undergraduates and a group of six doctoral students and professors of Speech and Hearing Sciences. Both groups of listeners were able to discriminate speakers with a moderately high degree of accuracy (92% correct) when both members of the stimulus pair were undisguised. The inclusion of disguised speech samples in the stimulus pair significantly interfered with performance (59%–81% correct depending upon the particular disguise). These results are similar to previous results utilizing spectrographic speaker‐identification tasks [Reich et al., J. Acoust. Soc. Am. 60, 919–925 (1976)].
62(1977); http://dx.doi.org/10.1121/1.2016227
Spoken language contains information about the speaker, which can be transmitted by suprasegmental variables. We examined the effects of independently manipulating f0 and speech rate on judgments of several state variables and personality traits. Male speakers answered interview questions and were randomly assigned to one of nine cells—three pitch (low/unmanipulated/high) by three rate (slow/unmanipulated/fast). Speech material was analyzed using linear predictive coding. Pitch was scaled up or down by 20% of the original f0, whereas rate was expanded or compressed by 30%. Resynthesized utterances were rated by listeners on a number of scales. High‐pitched or slow‐talking speakers seem particularly undesirable; high‐pitched speakers were judged as less truthful, weaker, and nervous and slow talkers were judged as less truthful and passive. An utterance's content, presumably mediated by the question topic, also influenced state and trait attributions.
Speaker invariant characterizations of vowels, liquids, and glides using relative formant frequencies
62(1977); http://dx.doi.org/10.1121/1.2016280
Sperry Univac is developing a system for automatically recognizing words in conversational speech. The linguistically oriented procedure identifies sounds by their acoustic‐phonetic correlates, forms an hypothesized sequence of sound segments for each phrase, and then matches lexical entries where similar sound segments occur. Problems to be overcome include the style of speech (exhibiting extreme reduction, coarticulation, and dynamic range), the restricted frequency range of the speech signal (telephone bandwidth), and the variety of speakers (including widely divergent males and females). To accomplish speaker independence in the acoustic‐phonetic identification processes, relative (as opposed to absolute) formant frequency characterizations of vowels, liquids, and glides are being employed. While a particular sound has disparate formant frequencies when produced by different speakers, the relative relationships between the formant frequencies are nearly invariant. The nature of the relative formant frequency measures and the results achieved will be discussed.
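The claim that relative formant relationships are nearly speaker invariant can be illustrated with a small sketch. The abstract does not state which relative measures are used, so the simple ratios F2/F1 and F3/F2 below are an assumption, as are the function name, the formant values, and the uniform scale factor standing in for a speaker difference. Real speakers do not differ by an exact uniform scaling, which is why the abstract says "nearly" invariant.

```python
def formant_ratios(formants_hz):
    """Return (F2/F1, F3/F2) for a tuple of the first three formants in Hz."""
    f1, f2, f3 = formants_hz
    return (f2 / f1, f3 / f2)


# Hypothetical formant values for one vowel, for illustration only.
speaker_a = (700.0, 1200.0, 2600.0)

# A second hypothetical speaker whose formants are all 20% higher.
scale = 1.2
speaker_b = tuple(scale * f for f in speaker_a)

# Absolute frequencies differ, but the ratios agree because the
# common scale factor cancels in each ratio.
print(formant_ratios(speaker_a))
print(formant_ratios(speaker_b))
```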
62(1977); http://dx.doi.org/10.1121/1.2016281
Fifty subjects used a 20‐item rating to characterize more than 100 samples of processed speech, representing every major class of speech coding system under various transmission conditions and also numerous simple types of laboratory degradation. Ratings were made with respect to 17 perceived speech qualities, intelligibility, pleasantness, and overall acceptability. Correlations between ratings of acceptability and ratings of other subjective qualities were calculated for each subject. These correlations served as indicants of the values attached to the various qualities by the individual subject. Intercorrelations among these indicants were factor analyzed to identify the elementary dimensions of interindividual variation in preference or taste. The results indicate that individuals vary with respect to at least four orthogonal dimensions of taste, one of which is the noise‐versus‐distortion dimension previously suggested by the results of McDermott [B. J. McDermott, J. Acoust. Soc. Am. 45, 774–781 (1969)].
62(1977); http://dx.doi.org/10.1121/1.2016282
Linguapalatal contact characteristics and lip and mandible coordination of three fluent alaryngeal esophageal speakers were examined during the production of English consonants /p, t, k, s, b, d, g, z/ in CVC syllables (V = /æ/) embedded in a sentence “_____ is a word.” Pseudopalates with 96 electrodes were fabricated for each subject. Small light reflecting beads were placed on the lips and on cantilevers attached to mandibular and maxillary teeth to determine lip and mandible movements. Information on linguapalatal contact, lip and mandible positions, and voice spectrum were collected every 10 msec with the PAGIS instrumental system. Linguapalatal contact patterns were analyzed with reference to those of normal speakers in terms of place and duration of contact. The contact patterns were related to lip and mandible positions for the criterion CVC syllables. The results will be discussed with respect to known functions of speech physiology. [Supported by NIH grant NS‐11852‐02 and by NIH grant RR‐5349 awarded to the first author.]