Index of content:
Volume 52, Issue 1A, July 1972
- PROGRAM OF THE EIGHTY‐THIRD MEETING OF THE ACOUSTICAL SOCIETY OF AMERICA
- Session A. Speech Communication I: Perception and Confusion
- Contributed Papers
52(1972); http://dx.doi.org/10.1121/1.1981743
Recent findings that the speech‐dominant hemisphere is specialized at the level of distinctive feature analysis entail a prediction that the processing of acoustic cues embodying a feature distinction in a particular language is asymmetric for “native” listeners, but not asymmetric for nonspeakers of the language if their own language does not employ the contrast. Russian‐ and English‐speaking listeners were presented dichotic sequences of syllables for identification of the consonants, i.e., the stops, categorized in Russian, but not in English, as palatalized or nonpalatalized. The results are discussed in the context of a recognition model equipped with a “filter” for feature selection.
52(1972); http://dx.doi.org/10.1121/1.1981744
Nine English consonants were presented in pairs to 10 subjects for same/different judgments. Each subject's choice reaction time (CRT) in making this judgment was taken to represent the interpoint distance of the criterion phoneme in the subjects' perceptual space. A multidimensional analysis (IND‐SCAL) showed that the magnitude of subjects' CRT for determining the sameness or differentness of a phoneme pair was governed by the distinctive features of these sounds. The features retrieved from a three‐dimensional analysis were (1) sibilant, (2) voicing, and (3) place. It was further shown that phoneme pairs distinct by zero, one, two, or three feature differences had significantly different CRTs. Pairs having a “zero” feature difference had significantly greater CRTs than pairs differing by one to three features. Similarly, pairs with a one‐feature difference had significantly greater CRTs than those with a three‐feature difference. Within one‐feature comparisons, the CRT associated with sibilant was shortest, followed by those for place and voicing. This indicated greater distinctiveness of sibilant than of place and voicing. Voicing showed the longest CRT, thus indicating minimal perceptual distinctiveness.
52(1972); http://dx.doi.org/10.1121/1.1981745
The 29 consonants of Hindi were recorded prevocalically with the vowel /ʌ/. Each consonant was included in proportion to its statistical probability in Hindi. Thus, while /p/ was included five times, /k/ was included 14 times. Ten native speakers of Hindi listened to the stimuli in each ear separately in five S/N ratio conditions. Listeners responded in an open‐choice manner. The responses were written in Devnāgri script. Tallies were made of all the errors, which were then averaged for each stimulus consonant across the 10 subjects. The analysis of the 10 matrices (five S/N ratios × two ears) by the IND‐SCAL method provided the best interpretation in five‐dimensional space. The perceptual features obtained were best described in articulatory terms. The first two dimensions were interpreted as voicing, aspiration, and sonorant, with further interpretation of sonorant as retroflection, nasality, laterality, and semivowel. A subset of 22 consonants, for which place of articulation was phonologically distinctive, was further analyzed. The analysis yielded perceptual features (with articulatory nomenclature) front/back and palatal. [This work was partially supported by a grant from NIH.]
52(1972); http://dx.doi.org/10.1121/1.1981746
This study presents some results of an experiment conducted to determine the effect on perception of deleting different numbers of 10‐msec segments from the initial and final parts of VCV syllables. Eight stop consonants /p,t,t,k,b,d,d,g/ and two affricates /ts,dz/ were combined with three pure vowels /i,a,u/ to produce 30 VCV stimulus words. An electronic gating apparatus was developed to present sequential segments of initial and final portions of these syllables. The VC and CV stimuli so obtained were used for two separate listening tests and the responses were analyzed for individual consonants as a function of time. Perceptual phoneme boundaries were found from the response curves and compared with the acoustical phoneme boundaries obtained from sonagrams. The results indicate that the transitions of initial and final vowels terminating in the consonant are of maximal importance for the recognition of intervocalic consonants. The data indicate that the plosive burst plays a more important part in the recognition of unvoiced stops than in the case of voiced stops. The plosive gap was found to play a greater role in the recognition of voiced stops. The affricates were found to utilize all the acoustic events for their recognition. The study of errors showed that for intervocalic consonants, the voicing feature was not recognized in the early part of the initial transition and in the end part of final transition.
52(1972); http://dx.doi.org/10.1121/1.1981747
Voice onset time (VOT) has proved in many languages to be a useful descriptive variable with which stop consonants may be classified in respect of the distinction voiced/unvoiced. The question arises as to whether this variable also has fundamental perceptual relevance. Experimental evidence suggests that if VOT as such is registered in perception, then it must be registered in a fashion relative to other ongoing temporal events; for example, we have shown that perception of voicing depends on over‐all syllabic rate. However, a separate effect of the rate of transition of the first formant from the onset of voicing to the steady state in the vowel is also evident. Stevens and Klatt (1971) proposed that the presence of a significant and rapid spectral change in the F1 region at the onset of voicing was the positive cue for voicedness. It is possible to go further and attempt to separate the effects of the extent, duration, and rate of the first formant transition. In such a synthesis experiment where VOTs were held constant, we obtained differences in voiced/unvoiced responses that supported Stevens and Klatt's suggestion that VOT is not the real perceptual cue, only appearing so because of the tied variables mentioned. Our results further indicate that there is little or no effect of the extent or duration of the transition, but that a critical rate of transition for perception as optimally voiced may be operating. The implications of these findings for feature detecting mechanisms in speech perception are discussed. [Work supported by J.S.R.U.—U.K.]
52(1972); http://dx.doi.org/10.1121/1.1981748
Enough is known about the acoustical cues in speech to enable highly intelligible synthesis from simple algorithms, but, in most cases, the most valid expressions of cues from the point of view of perception and the mechanism of detection, weighting, and decision are not known. Experiments that manipulate several cues and the phonetic context offer some hope of charting information flow through the human perceptual process and providing detailed block diagram models. Several examples are given. Perception of voicing in initial stops depends upon the place of articulation as well as the traditional voicing cues such as VOT—i.e., the boundary VOT value differs with place. Two separate experiments show that this effect is carried not by acoustical place information but by a perceptual decision about place, logically prior to the voicing decision. But, in another context effect, dependence of consonant place upon the adjacent vowel, the effect appears conditioned by the acoustical vowel information and not by the decision as to linguistic identity. Implications for perceptual theory and speech recognition are pointed out. [Work supported by J.S.R.U.—U.K.]
52(1972); http://dx.doi.org/10.1121/1.1981749
The suggestion has often been made that stress serves an organizing function in speech perception. The present study attempts to investigate this idea by examining the relationship between stress patterns and the frequency of occurrence of substitution errors and order errors in the perception of obstruent clusters. Eighteen disyllabic CVCCVC nonsense words were selected to serve as stimuli. Only the medial consonant cluster was varied so that all combinations of p, t, and k with s were presented under three different stress patterns: stressed first syllable, stressed second syllable, and equal stress on both syllables. Preliminary results indicate that, although the number of correct responses is the same under all conditions of stress, the ratio of ordering errors and substitution errors is different for the three stress conditions.
52(1972); http://dx.doi.org/10.1121/1.1981750
Confusion matrices were obtained for 22 patients on four sets of nonsense syllables, using a forced‐choice procedure. Each syllable set consisted of 16 consonants in combination with the vowels /i,a,u/ either in CV or VC form. Syllables were presented at a comfortable listening level approximately 40 dB above the audiologic SRT. Over‐all performance varied as a function of syllable set, vowel, and hearing loss. The confusions were analyzed by two multidimensional scaling procedures, MDSCAL and IND‐SCAL. In addition, an iterative feature analysis of transmitted information was performed, in which the feature systems of Miller and Nicely and of Chomsky and Halle were compared. [Research supported by grants from SRS and NINDS.]
52(1972); http://dx.doi.org/10.1121/1.1981751
Perceptual confusions were obtained at six speech‐to‐noise ratios ranging from −10 to +15 dB, at over‐all noise levels of 50, 65, 80, and 95 dB SPL. Stimuli were four sets of CV and VC nonsense syllables formed by combining all English consonants with the vowels /i,a,u/. Noise levels of 80 and 95 dB resulted in poorer discrimination, particularly at moderate speech‐to‐noise ratios. Performance also varied as a function of vowel, syllables with /u/ being consistently better discriminated than others. An iterative feature analysis of transmitted information revealed that in both the Miller and Nicely [J. Acoust. Soc. Amer. 27, 338–352 (1955)] and Chomsky and Halle [Sound Pattern of English (Harper & Row, New York, 1968)] feature systems, voicing accounted for the greatest relative transmitted information. [Research supported by a grant from SRS.]
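As a rough illustration of the quantity named in this abstract, the sketch below computes the relative transmitted information for a single feature from a stimulus-by-response confusion matrix. It pools consonants that share a feature value and divides the mutual information by the input entropy. It does not reproduce the iterative procedure the authors used. The function names, the four-consonant example, and its counts are hypothetical and are not data from the paper.

```python
import numpy as np

def transmitted_information(confusions):
    """Mutual information (bits) between stimulus (rows) and response (columns)."""
    counts = np.asarray(confusions, dtype=float)
    p_joint = counts / counts.sum()
    p_stim = p_joint.sum(axis=1, keepdims=True)   # row marginals
    p_resp = p_joint.sum(axis=0, keepdims=True)   # column marginals
    expected = p_stim * p_resp                    # joint under independence
    mask = p_joint > 0                            # 0 * log(0) is taken as 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / expected[mask])))

def collapse_by_feature(confusions, feature_values):
    """Pool rows and columns of consonants that share the same feature value."""
    counts = np.asarray(confusions, dtype=float)
    labels = sorted(set(feature_values))
    idx = np.array([labels.index(v) for v in feature_values])
    pooled = np.zeros((len(labels), len(labels)))
    np.add.at(pooled, (idx[:, None], idx[None, :]), counts)
    return pooled

def relative_transmitted_information(confusions, feature_values):
    """Transmitted information for one feature divided by its input entropy."""
    pooled = collapse_by_feature(confusions, feature_values)
    p_in = pooled.sum(axis=1) / pooled.sum()
    p_in = p_in[p_in > 0]
    input_entropy = -np.sum(p_in * np.log2(p_in))
    if input_entropy == 0:
        raise ValueError("feature takes a single value; ratio is undefined")
    return transmitted_information(pooled) / input_entropy

# Hypothetical counts for /p, b, t, d/ (rows: stimulus, columns: response).
example = [[30, 2, 15, 3],
           [3, 28, 2, 17],
           [14, 1, 32, 3],
           [2, 16, 4, 28]]
voicing = [0, 1, 0, 1]   # p, t voiceless; b, d voiced
place = [0, 0, 1, 1]     # p, b labial; t, d alveolar
print("voicing:", relative_transmitted_information(example, voicing))
print("place:  ", relative_transmitted_information(example, place))
```

In this made-up matrix most errors preserve voicing and change place, so the voicing ratio comes out higher than the place ratio, which is the kind of comparison the abstract reports.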
52(1972); http://dx.doi.org/10.1121/1.1981752
Continued listening to recorded repetitions of a stimulus has been found to produce perceptual illusory changes in normal listeners. This phenomenon was labeled by Warren [Brit. J. Psychol. 52, 249–258 (1961)] as the verbal transformation effect. The purpose of the present investigation was to study the consistency of subjects' reported verbal transformations in terms of the number of forms and transitions elicited, number of repetitions of the stimulus prior to the subjects' first verbal transformations, types of transformations reported, exact forms employed, and order of reported transformations. Six stimuli, representing variations in meaningfulness and phonetic complexity, were individually presented via headphones to 24 subjects in each of three listening sessions. Results indicate that consistency of subjects' transformations is not equivalent for all measures investigated. Subjects were consistent in the number of forms and transitions and types of transformations that they reported, but inconsistent in the number of repetitions of the stimulus that they required before reporting their first transformations, in the exact forms which they used, and in the order of reported transformations. Consistency as a function of the meaningfulness and phonetic complexity of the stimuli, and the implications of these findings for future research on this phenomenon, are discussed.
- Session B. Nonlinear Acoustics I: General and Air Acoustics
52(1972); http://dx.doi.org/10.1121/1.1981753
As an introduction to the material that is to be covered in the three special sessions on nonlinear acoustics, the principal analytical tools used in the field will be reviewed. These include Earnshaw‐Riemann theory, weak‐shock theory, Burgers' equation, and Westervelt's method (the Green's function solution of an inhomogeneous wave equation). The review will take the form of an historical account of activity in the field since 1930. Landmark results during this period include the solutions of Fay [(1931). J. Acoust. Soc. Amer. 3, 222–241] and Fubini [(1935). Alta Frequenza 4, 530–581], the N‐wave solutions of Landau [(1942). J. Phys. Acad. Sci. USSR 6, 229–230(A); (1945). 9, 496–500], Mendousse's solution [(1953). J. Acoust. Soc. Amer. 25, 51–54] of Burgers' equation, and Westervelt's parametric array [(1960). J. Acoust. Soc. Amer. 32, 934–935(A); (1963). 35, 535–537].
52(1972); http://dx.doi.org/10.1121/1.1981754
The application of singular perturbation techniques to problems in nonlinear acoustics is demonstrated in two examples. In the first, we show that straightforward perturbation methods, using Lighthill's equation for aerodynamic sound, give a secular series when applied to one‐dimensional simple wave flow. The nonuniform secular terms are removed by the introduction of expansions in terms of multiple scaled coordinates, and the Earnshaw solution for the shock‐free region is recovered. Such an expansion technique is also applied to solve a problem in compound flow. In the second example, the method of matched asymptotic expansions (MAE) is applied to one‐dimensional flow governed by Burgers' equation. If the excitation at the origin is time harmonic, Burgers' equation can be solved exactly, and the solution reduced to an intelligible form in the shock‐free, saw‐tooth, and saturation regions. These simplified approximate solutions are recovered by a direct application of MAE to Burgers' equation, and in work still under completion, it appears that the same process will yield a uniformly valid approximate solution of the analogous problems in two and three dimensions.
52(1972); http://dx.doi.org/10.1121/1.1981755
The theory of propagation of weakly nonlinear waves through random media is discussed. The wavefield is partitioned into mean (ensemble average) and fluctuating components, and a nonlinear equation is derived governing the evolution of the mean field. This equation contains a pseudoviscous term that takes account of the irreversible transfer of energy from the mean wave to the fluctuating field. By balancing nonlinear effects against this dissipation, the existence of steady‐state shock‐like waves is deduced for the mean field, and this is illustrated by reference to the propagation of sound through an atmosphere in which the sound speed is a random function of position and through one subject to turbulent fluctuations. The difficulty encountered in applying the theory to sonic‐boom propagation is briefly discussed. [Supported by the Bristol Engine Division of Rolls Royce (1971) Ltd.]
52(1972); http://dx.doi.org/10.1121/1.1981756
The Eulerian equations of motion for wave propagation in a relaxing fluid are transformed to the Lagrangian representation by a Von Mises transformation. A single third‐order equation is obtained, which, for certain ranges of the physical parameters (Mach number, source frequency, dispersion number, and relaxation time), has uniform, i.e., shock‐free, perturbation series solutions. The solution to second order exhibits harmonic generation and spatial beat waves due to velocity‐dispersion‐induced phase shifts. Examination of the inverse transformation yields additional information on the shock behavior of the system. The region in which the Lagrangian and the previously determined Eulerian results [R. Klinman, L. K. Chi, and L. Kraus, J. Acoust. Soc. Amer. 49, 118 (1971)] for relaxing fluids is determined. [Work supported by Office of Naval Research.]
52(1972); http://dx.doi.org/10.1121/1.1981757
A theoretical application of Blackstock's weak‐shock solution [J. Acoust. Soc. Amer. 39, 1019–1026 (1966)] for the case when a distorted sine wave is assumed as the boundary condition is presented. In Blackstock's analysis, the waveform of an initially sinusoidal wave is expressed as a Fourier series, and the harmonic components are evaluated for ranges up to and including the formation of a decaying sawtooth wave. This paper represents a first step in applying Blackstock's solution for more general boundary conditions. A boundary condition is assumed that consists of a sine wave to which we have added the second‐harmonic component of a distortion that is supposed to have taken place prior to the point of application of the boundary condition. The solution is obtained by evaluating the Fourier components of this two‐frequency wave as it distorts. [This work was sponsored by the Office of Naval Research.]
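For orientation, the two standard limits that a weak‐shock solution connects for a purely sinusoidal source can be written as below. This is the single‐frequency case only and does not reproduce the two‐frequency boundary condition treated in the abstract. The notation is an assumption made here: $p_0$ is the source amplitude, $\tau$ the retarded time, $\sigma$ the distance in units of the shock‐formation distance, and $J_n$ the Bessel function of order $n$.

```latex
% Harmonic expansion of a plane wave that is sinusoidal at the source
\frac{p(\sigma,\tau)}{p_0} = \sum_{n=1}^{\infty} B_n(\sigma)\,\sin(n\omega\tau)

% Shock-free region (Fubini solution), valid for \sigma \le 1
B_n(\sigma) = \frac{2}{n\sigma}\,J_n(n\sigma)

% Decaying sawtooth region, approached for \sigma \gtrsim 3
B_n(\sigma) \approx \frac{2}{n(1+\sigma)}
```

In this picture, adding a second harmonic at the boundary changes the starting values of the $B_n$ rather than the form of the expansion.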
Numerical Calculation of the Nearfield for Spherically Symmetric Nonlinear Acoustic Flows in an Unbounded Ideal Gas
52(1972); http://dx.doi.org/10.1121/1.1981758
An implicit finite‐difference method has been developed for calculating spherically symmetric nonlinear acoustic flows generated by a high‐intensity periodic source. This method makes it possible to match the conditions prescribed at the source to a low‐amplitude solution (in Blackstock's sense) valid at large distances from the source. Numerical computations have been made for pistonlike and sirenlike sources of three different periods and several source amplitudes. Selected graphs of the results will be presented, together with estimates of the errors associated with the method of computation. The numerical results will also be used to show that, in some instances, a low‐amplitude solution may be matched directly to a given source even though the conditions for validity of that solution are not satisfied near the source. [Work supported by Aerospace Medical Research Laboratory, Air Force Systems Command.]
52(1972); http://dx.doi.org/10.1121/1.1981759
A phenomenological model of the acoustics of thunder has been briefly explored with the aid of a computer. A conceptually simple program yields the thunder signature in time for a given lightning signature in space. A Fourier transform then yields the frequency spectrum. A few cases of real‐life lightning have been tried and plots of thunder signatures and power spectra obtained. The model assumes a lightning stroke to be a spatial distribution of point explosive sources; each one simultaneously generates a disturbance that is idealized as evolving independently of the others. For simplicity, an N wave is used as the elemental wave that evolves by nonlinear distortion from each point. The results suggest that the ensemble average of thunder power spectra, suitably normalized (e.g., total power), resembles the spectrum of the basic N‐wave building blocks. Additionally, the model, over‐simplified as it is, appears to yield pressure‐time signatures of thunder that resemble measured signatures.
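The model described above can be sketched in a few lines. This is not the authors' program. It assumes a uniform sound speed, a fixed N‐wave duration, and 1/r amplitude decay in place of the nonlinear evolution the abstract mentions. The channel geometry is a made‐up random walk, and all names and parameter values are hypothetical.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s; assumed uniform atmosphere

def n_wave(t, duration):
    """Unit N wave: +1 at t = 0, falling linearly to -1 at t = duration, else 0."""
    inside = (t >= 0.0) & (t <= duration)
    return np.where(inside, 1.0 - 2.0 * t / duration, 0.0)

def thunder_signature(sources, observer, t, duration=0.005):
    """Superpose one N wave per point source, all emitted at t = 0.

    Each contribution is delayed by r / c and scaled by 1 / r (assumed
    spherical spreading; nonlinear lengthening of the N wave is ignored).
    """
    sources = np.asarray(sources, dtype=float)
    ranges = np.linalg.norm(sources - np.asarray(observer, dtype=float), axis=1)
    pressure = np.zeros_like(t, dtype=float)
    for r in ranges:
        pressure += n_wave(t - r / SOUND_SPEED, duration) / r
    return pressure

def power_spectrum(pressure, dt):
    """One-sided power spectrum, normalized to unit total power."""
    spectrum = np.fft.rfft(pressure)
    freqs = np.fft.rfftfreq(len(pressure), d=dt)
    power = np.abs(spectrum) ** 2
    return freqs, power / power.sum()

# Hypothetical tortuous channel: a 3 km vertical stroke with random lateral wander.
rng = np.random.default_rng(0)
heights = np.linspace(0.0, 3000.0, 300)
lateral = np.cumsum(rng.normal(0.0, 5.0, size=(300, 2)), axis=0)
channel = np.column_stack([lateral, heights])

dt = 2.5e-4
t = np.arange(0.0, 12.0, dt)
p = thunder_signature(channel, observer=(2000.0, 0.0, 0.0), t=t)
freqs, power = power_spectrum(p, dt)
print("peak frequency (Hz):", freqs[np.argmax(power[1:]) + 1])
```

Averaging `power` over many random channels is one way to examine the abstract's suggestion that the ensemble‐average spectrum resembles that of a single N wave.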
Focusing of Finite‐Amplitude Cylindrical and Spherical Sound Waves in a Viscous and Heat‐Conducting Medium
52(1972); http://dx.doi.org/10.1121/1.1981760
The focusing of cylindrical and spherical pulses of finite amplitude in a medium of small viscosity and heat conductivity has been studied by dividing the region of interest into three parts: converging, interaction, and diverging regions. In the converging region, the flow field is governed by the radial Burgers equation with a small parameter multiplying the term with second derivatives. The method of matched asymptotic expansion is found applicable to this equation with an N wave as an initial condition; a composite solution obtained describes a converging N wave with increasing amplitude and wavelength. The front and rear shocks of the N wave are locally described by Taylor's shock structure. In the interaction region, no small perturbation solution exists for the shocks. However, the flow field between the front and rear shocks satisfies to first order the inviscid linear wave equation, which is solved by the Fourier transform technique. The treatment of the diverging region is similar to that of the converging region, except for trivial changes in the analysis. It is shown that the effect of entropy increase on the failure of the solution at the focus is inconsequential if certain restrictions on the initial strength and wavelength are satisfied.
52(1972); http://dx.doi.org/10.1121/1.1981761
An experimental and theoretical study of N‐wave focusing has been carried out. An N‐wave source (electric spark) was located on the principal axis of a spherical mirror so as to produce a focus (image point) beyond the center of curvature. The signals before, at, and beyond the focus were received by a wide‐band condenser microphone of our own manufacture. Linear theory in the form of the Kirchhoff integral was used to obtain predicted waveforms. Corrections for known nonlinear propagation effects were applied. The data are explained adequately by the composite theory. Diffraction from the mirror's edge plays a prominent role. Before the focus, the reflected signal, a normal N wave, is followed in time by the diffracted signal, which is an inverted N. As the focus is approached, the amplitude becomes large and the delay between the two signals tends to zero, giving rise to a U‐shaped wave, which approximates the derivative of the original N. Thus the well‐known π/2 phase shift at the focus is explained as the superposition of the two signals. Beyond the focus, the phase shift of both signals is π; in addition, the order of arrival is reversed. Nonlinear effects cause distortion of the reflected signal, particularly after the focus, where the initial shape is an inverted N. [Work supported by ONR and AFOSR.]
52(1972); http://dx.doi.org/10.1121/1.1981762
The propagation of a weak oblique shock, such as that generated by a supersonic projectile, through a two‐component diffusing gas mixture, has been investigated analytically. A linearized ray analysis has disclosed that a two‐branched caustic, with associated arête, is formed. The arête and caustic positions have been found as functions of diffusion time, initial Mach number, and molecular weight ratio.