Index of content:
Volume 28, Issue 4, July 1956
28(1956); http://dx.doi.org/10.1121/1.1908385
The numerous experiments on multichannel listening allow certain broad conclusions: that the listener has a limited capacity, that therefore much of the information presented to him is discarded, and that this discarding is made easier when the signals to be ignored share certain physical characteristics which the desired signals do not have. These conclusions suggest further lines of advance. The same speech signals may interfere with each other more when they are chosen from a larger ensemble. This implies that the information content must be considered. In addition, momentary peaks of information are handled by a curious form of immediate memory in the listener.
28(1956); http://dx.doi.org/10.1121/1.1908387
In several experiments, the restrictions imposed upon the communication process by an articulation test were relaxed by increasing the number of events in the communication sequence. A communication event is the transmission of a message from the source to the receiver followed by the return of a message (correct or incorrect) from receiver to source for confirmation. Although both source and receiver serve as talker and listener, the relation between them is asymmetric. Since the source knows the original message, he must accept or reject the message sent back to him according to some criterion.
Various types of communication events arise depending upon (1) whether or not the receiver correctly hears the message, and (2) whether or not the source confirms the message returned by the receiver. The various probabilities associated with these events are investigated as a function of speech‐to‐noise ratio. The behavior of the source is examined in terms of the relation between the two conditional probabilities associated with a correct and with an incorrect confirmation.
If each message is sent repeatedly until all are confirmed, a sequence of communication events is generated. A simple mathematical model accurately describes this process.
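The abstract does not give the model itself, so the following is only a minimal sketch of one way such a sequence could be described. It assumes that communication events are independent, that the receiver hears a message correctly with probability `p_hear`, and that the source confirms a returned message with probability `p_confirm_correct` when it is correct and `p_confirm_incorrect` when it is not. All three names and the independence assumption are hypothetical. Under them, the number of events needed to confirm one message follows a geometric distribution.

```python
def confirmation_stats(p_hear, p_confirm_correct, p_confirm_incorrect):
    """Summarize a repeat-until-confirmed sequence for one message.

    Assumes independent events (an illustrative assumption, not the paper's
    stated model). Returns a tuple of:
      - probability that a single event ends in confirmation,
      - expected number of events until confirmation,
      - probability that the confirmed message was heard correctly.
    """
    # A confirmation can follow either a correct or an incorrect reception.
    p_confirm = (p_hear * p_confirm_correct
                 + (1 - p_hear) * p_confirm_incorrect)
    if p_confirm <= 0:
        raise ValueError("the message would never be confirmed")

    # Mean of a geometric distribution with success probability p_confirm.
    expected_events = 1 / p_confirm

    # Bayes' rule: share of confirmations that follow a correct reception.
    p_correct_given_confirmed = p_hear * p_confirm_correct / p_confirm
    return p_confirm, expected_events, p_correct_given_confirmed
```

With a source that never confirms an incorrect message, for example `confirmation_stats(0.5, 1.0, 0.0)`, every confirmed message is correct and two events are needed on average.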
28(1956); http://dx.doi.org/10.1121/1.1908389
Each of four men spoke messages at predetermined times to a central operator over an independent communication channel. Each message consisted of an address, station identification, and three pieces of information. The operator identified relevant messages by the address and/or station identification and transcribed the information symbolically onto a plotting board. Message density varied from the overlapping of two or three messages to gaps of 20 seconds between single messages.
The purpose of the study was to evaluate two message storage schemes. In one scheme the messages could be delayed by 9 seconds at the option of the operator (fixed‐delay), in the other every message was stored until the operator called for it (variable‐delay). The fixed‐delay and variable‐delay message storage schemes were compared to a no‐help (no‐delay) condition, where the operator had no control over the sequence of messages.
The criteria used to evaluate the storage schemes were (1) the number of message repeats requested by the operator (and the associated time delays) and (2) the number of messages incorrectly transcribed (or not transcribed). The variable‐delay storage scheme was superior for all criteria, that is, fewer repeats were requested and fewer errors were made. The fixed‐delay scheme was difficult to use and was not always superior to the no‐help case.
28(1956); http://dx.doi.org/10.1121/1.1908391
MAYDAY and S.O.S. were compared in the presence of noise and speech to determine their relative detectability values (audibility thresholds). Results of the tests indicated that S.O.S. had the lower audibility threshold and therefore the greater detectability value.
28(1956); http://dx.doi.org/10.1121/1.1908393
To test several types of voice and code signals, communication was carried on over the conventional submarine underwater telephone between two submerged submarines. The signals included normal speech, differentiated and clipped speech, quarter‐speed speech, and multiple‐tone Morse code. It was found that normal speech was received at least as reliably as any of the more elaborate signals tried.
28(1956); http://dx.doi.org/10.1121/1.1908395
Two studies were made to assess the effects of high‐level noise (1) upon efficiency in speech reception and (2) in causing temporary hearing losses. Over 1400 observers made responses under various sound‐pressure levels for each of 13 different noise spectra. It was found that: (1) Increases in noise level always adversely affected speech reception, while changes in spectra did not always have an effect; time‐in‐noise did not progressively decrease scores. (2) Hearing threshold shifts of 4 to 5 db were found after a 2‐hr noise exposure; certain spectra caused greater threshold shifts than others.
28(1956); http://dx.doi.org/10.1121/1.1908397
A comparison of the intelligibility of UHF and VHF communications was made between “plane to tower” and “tower to plane” at fifteen control towers. A specially equipped plane was flown in a circle with a 20‐mi radius for these tests. Voice Communication Laboratory multiple‐choice word lists were read by two talkers, one in the plane and the other in the tower. These messages were recorded on tape in both places.
Five subjects listened to these recordings in the laboratory, and their responses were scored for accuracy against the original word lists. The percentage of words heard correctly from tower to plane varied from 90% to 38% with an average of 73.7% and was much higher than the corresponding percentage from plane to tower of 63% to 3% with an average of 44.4%. There was very little difference between the intelligibility of VHF and UHF. In general, those towers that had high plane‐to‐tower scores also had high tower‐to‐plane scores: the correlation among towers between hearing and being heard was fairly high.
28(1956); http://dx.doi.org/10.1121/1.1908399
Since the wave form of speech is apparently continuous and the phonemic entity is discrete, it is reasonable to expect that during the developmental stages of speech compression systems the process will gradually evolve from continuous to mixed discrete‐continuous and finally to a completely discrete type. Two plans for such systems are discussed. Plan I is an attempt to use two discrete features of speech together with continuous extractions of the formants, moments, and pitch of speech. Plan II is an attempt to identify phoneme‐like elements through a scheme of successive selection.
28(1956); http://dx.doi.org/10.1121/1.1908401
Speech‐band compression aims to reduce the frequency band width without impairing the desired informational content. Physical compression systems are of two types: spectrum‐alteration systems and vocoder (voice coding) systems. Spectrum alteration is not restricted to speech, is relatively simple to perform, but achieves only a compression factor of 3 to 5. Considerably higher ratios (10 to 20) are obtainable by vocoder‐type systems. Here, by a spectrum analysis on the transmitter side, a “pitch” frequency is extracted and a system function is derived. Both are then transmitted to a receiver where speech is synthesized. At the A. F. Cambridge Research Center, the following vocoder‐type systems are being developed and investigated: the scan vocoder with envelope scanning, the pulse vocoder in which the envelope is coded, and the formant vocoder in which the envelope is represented by three formant frequencies. By means of alterations and additions, the scan vocoder can be transformed into a pulse or a formant vocoder.
28(1956); http://dx.doi.org/10.1121/1.1908403
An electrical analog of the vocal tract is used to obtain experimental relations between certain idealized articulatory parameters and formant frequencies associated with the transitional and stop portions of vowel‐consonant syllables. The data are discussed in terms of the “locus hypothesis” proposed by Delattre, Liberman, and Cooper [J. Acoust. Soc. Am. 27, 769 (1955)] and in terms of simple resonator theory. It is concluded that the results modify the hypothesis that assigns one second‐formant locus to all vowel‐consonant transitions involving a given class of stop consonants. In particular, the second‐formant loci for transitions to velar and bilabial stop consonants appear to vary over a range of frequencies depending on the vowel, while loci for post‐dental stop consonants are relatively invariant. Characteristics of first‐ and third‐formant loci are discussed briefly.
28(1956); http://dx.doi.org/10.1121/1.1908405
Three quantities are important in evaluating a speech‐processing device: the result of applying a suitable fidelity criterion to its output, the tolerance within which the reduced signals must be represented to satisfy this criterion, and the corresponding band width. These factors establish a measure of the efficiency of the particular representation achieved by the device. This measure can be used to compare the device with systems having known characteristics. An important constraint is that the systems being compared have equal fidelity according to the criterion being used. The quantitative result of this comparison depends markedly upon the criterion selected. The criterion should be chosen with the contemplated application in mind and can be either objective or subjective in nature. Since speech‐processing devices operate on the basis of an objective criterion, it appears that as future research establishes relations between subjective impressions and physical measurements, better and perhaps even the ultimate speech‐processing devices will become realities.
28(1956); http://dx.doi.org/10.1121/1.1908407
28(1956); http://dx.doi.org/10.1121/1.1908409
28(1956); http://dx.doi.org/10.1121/1.1908412
Computations of the band widths and signal‐to‐noise ratios necessary for the transmission of continuous data on the first three formants of speech are presented. The results indicate that, on the average, band widths of 7.1, 6.7, and 5.3 cps and signal‐to‐noise ratios of 33, 24, and 20 db are sufficient for the transmission of signals specifying the frequencies of the first, second, and third formants, respectively. The band‐width and signal‐to‐noise figures are computed on the basis that the error incurred in each formant signal is less than the just discriminable difference in formant frequency at least 65% of the time. The channel capacity necessary for the transmission of the three formant signals is, therefore, of the order of 200 bits per second.
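One rough way to check the order of magnitude of these figures is to apply the Shannon formula C = B log2(1 + S/N) to each formant channel and add the results. The abstract does not say the capacity was computed this way, so the sketch below is an illustration under that assumption; the function and constant names are hypothetical.

```python
import math


def shannon_capacity(bandwidth_cps, snr_db):
    """Capacity in bits per second of a band-limited channel with Gaussian noise."""
    snr_linear = 10 ** (snr_db / 10)  # convert a power ratio from db to linear
    return bandwidth_cps * math.log2(1 + snr_linear)


# (band width in cps, signal-to-noise ratio in db) for the first, second,
# and third formant signals, as quoted in the abstract.
FORMANT_CHANNELS = [(7.1, 33), (6.7, 24), (5.3, 20)]


def total_capacity(channels=FORMANT_CHANNELS):
    """Sum of the per-channel Shannon capacities."""
    return sum(shannon_capacity(b, s) for b, s in channels)
```

Under this assumption `total_capacity()` comes to somewhat under 170 bits per second, the same order as the figure quoted in the abstract.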
28(1956); http://dx.doi.org/10.1121/1.1908414
Previous investigators have shown that speech waves can undergo any one of a number of severe forms of distortion in low ambient noise levels without serious reduction of word articulation. There are well‐known notable exceptions (e.g., center clipping). However, it is not enough to avoid these exceptional forms of distortion. In the study reported here it has been demonstrated that combinations of speech‐wave distortions, which individually are quite innocuous with regard to word articulation, can be devastating in their combined effect, even in the absence of serious noise. Four types of speech‐wave distortion were studied, individually and in combination, as follows: gross attenuation of high‐frequency components, multiple echo, random amplitude modulation, and gross irregularity of response‐frequency characteristic. Ambient noise was also a controlled environmental condition in some phases of the investigation.
28(1956); http://dx.doi.org/10.1121/1.1908416
Listeners responded to two simultaneous messages through dichotic headset circuits. Each message consisted of word groupings from the multiple‐choice intelligibility tests. The signal level of one of the messages remained constant while the level of the contralateral message was attenuated in five steps of three decibels each. The variable under study was the effect of these attenuation levels upon the reception of the two messages. The simultaneous messages were received under conditions of noise and quiet. In general, attenuating one of the simultaneous messages decreased the reception scores of the attenuated message and increased those of the unattenuated message. The effect was more pronounced in noise than in quiet.
28(1956); http://dx.doi.org/10.1121/1.1908418
Previous investigations have shown that the masking capacity of a prolonged auditory stimulus remains constant over its duration in spite of its great diminution in loudness. The experiments which follow were designed to determine the relation between the masked threshold and the duration of the masked stimulus. The results show that (1) within limits, as the duration of a pure tone is increased less noise is required to mask the tone, (2) the magnitude of this effect increases as a function of frequency of the masked tone but appears to be independent of its SPL, and (3) in general, these effects diminish somewhat with practice.
28(1956); http://dx.doi.org/10.1121/1.1908420
Temporary threshold shift after auditory fatigue was measured in 178 subjects with normal hearing prior to their placement in an environment of high‐level noise. In addition, two audiograms were made: one, prior to the noise exposure period; the second, eight weeks after its termination. Results indicated that ears showing hearing losses at 3000 and 4000 cps eight weeks after the noise exposure tended to show longer recovery times on the prenoise exposure fatigue test than ears with no change. These findings are discussed in relation to the prediction of noise susceptibility.
28(1956); http://dx.doi.org/10.1121/1.1908422
An experiment is described in which detection of the absence of signal remained high and relatively constant while detection of intermixed signals varied over the threshold region as a function of signal voltage. Statistical signal detection thresholds of observers with set to detect absence of signal were approximately 2 db lower than those of observers with set to detect signal.
28(1956); http://dx.doi.org/10.1121/1.1908424
The observer's responses to aural signals in noise are compared to the output of an electronic detection system whose constants are intended to be close to those of the human auditory detection system.
Of the four variations of an electronic detector tested, the best correlation between the detector and the observer occurred for signals of 0.3‐second duration with the following detector constants: band pass, 60 cps (single tuned RLC circuit); square‐law detector; output filter time constant, 0.15 second.
The incomplete correlation between the signal responses of the observer and the detector can be explained by assuming that the observer's threshold fluctuates randomly about a mean value with a dispersion of about 20% of the mean, or alternatively that there is internal noise with this dispersion generated inside the observer's detection system.
The observer's false alarms appear to be caused by noise fluctuations (as measured by the electronic detector) of the same average magnitude and dispersion as those calculated for a detector with the same threshold fluctuation. However, the observer's false alarm rate is about an order of magnitude lower than that calculated for the fluctuating threshold detector so it is clear that the model is deficient in some important respects.
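To illustrate how a fluctuating threshold changes the false alarm rate, the following minimal sketch treats both the detector output in noise alone and the threshold as independent Gaussian variables. The Gaussian form, the independence, and all function and parameter names are assumptions made for illustration; only the 20% dispersion comes from the abstract.

```python
import math


def normal_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def false_alarm_prob(noise_mean, noise_sd, threshold_mean, threshold_rel_sd=0.0):
    """Probability that the noise-only output exceeds the threshold.

    threshold_rel_sd is the threshold dispersion as a fraction of its mean
    (0.2 for the 20% figure). Output and threshold are assumed to be
    independent and Gaussian, so their difference is Gaussian as well.
    """
    threshold_sd = threshold_rel_sd * threshold_mean
    # Standard deviation of (output - threshold) for independent Gaussians.
    combined_sd = math.hypot(noise_sd, threshold_sd)
    return normal_tail((threshold_mean - noise_mean) / combined_sd)
```

For a threshold set above the mean noise output, for example `false_alarm_prob(1.0, 0.2, 2.0, 0.2)` compared with `false_alarm_prob(1.0, 0.2, 2.0)`, the fluctuating threshold gives the higher false alarm probability. This agrees in direction with the calculated rate exceeding the observed one.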