Index of content:
Volume 94, Issue 5, November 1993
94(1993); http://dx.doi.org/10.1121/1.408231
An analytical method for the analysis of piezoelectric ceramic rectangular resonators based on the equivalent elastic theory is presented in which the coupling vibrational modes are considered. The resonance frequency equation of rectangular resonators is derived and the natural frequency spectra are calculated. Theoretical analysis shows that the longitudinal vibrational mode of a piezoelectric ceramic slender rod, the planar radial vibrational mode, and the thickness extensional mode of a piezoelectric ceramic thin plate can be obtained from the theory of this paper. Compared with numerical methods, the method presented here is very simple in computing the natural frequencies and analyzing the eigenmodes of piezoelectric ceramic rectangular resonators. The resonance frequencies obtained from the theory of this paper are in good agreement with the measured results.
94(1993); http://dx.doi.org/10.1121/1.407384
A perturbation expansion model is presented for the evaluation of the second and third harmonic distortion in moving coil transducers. The model is applied to a particular underwater sound piston transducer resulting in a favorable comparison with measurements. The model should be useful for analyzing transducer data to determine the cause of nonlinear effects. The model can also provide information regarding different nonlinear mechanisms and different driving conditions. It was found that constant current drive gives higher harmonic distortion at certain frequencies near and below resonance while constant voltage drive gives higher harmonic distortion above resonance.
94(1993); http://dx.doi.org/10.1121/1.407385
Bottlenose dolphins (Tursiops truncatus) produce individually distinctive narrow‐band “signature whistles.” These whistles may be differentiated by the structure of their frequency contours. An algorithm is presented for extracting frequency contours from whistles and comparing two such contours. This algorithm performs nonuniform time dilation to align the contours and provides a quantitative distance measure between the contours. Two recognition experiments using the algorithm on three dolphin whistles from each of five individuals classified 15 out of 15 single‐loop whistles correctly, and 14 out of 15 central loops for multiple‐loop whistles correctly.
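The abstract does not spell out the alignment procedure. As a rough illustration of nonuniform time dilation, the sketch below uses classic dynamic time warping, a standard technique swapped in here and not necessarily the authors' algorithm, to align two frequency contours and return a distance. The function name `dtw_distance` and the sample contours are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two frequency contours.

    a, b: sequences of frequency samples (e.g. Hz), possibly of different length.
    Assumption: absolute difference is used as the local cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allow a step in either contour alone (stretching) or in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical contours: the second is a time-stretched copy of the first.
contour = [5000.0, 7000.0, 9000.0]
stretched = [5000.0, 5000.0, 7000.0, 7000.0, 9000.0, 9000.0]
shifted = [6000.0, 8000.0, 10000.0]
```

Under this sketch a time-stretched copy of a contour has zero distance to the original, while a contour shifted in frequency does not, which is the kind of behavior a dilation-tolerant distance measure needs.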
94(1993); http://dx.doi.org/10.1121/1.407386
This study provides a quantitative measure of the accuracy of the auditory periphery in representing prespecified time‐frequency regions of initial and final diphones of spoken CVCs. The database comprised word pairs that span the speech space along Jakobson et al.’s binary phonemic features [Tech. Rep. No. 13, Acoustic Laboratory, MIT, Cambridge, MA (1952)]. The time‐frequency domain was divided into “tiles” by splitting the frequency range into three bands ([0,1000], [1000,2500], [2500,4000] Hz), and by marking the phonemic time landmarks of the CVC utterance. Fourteen modified versions of this database were generated by introducing well‐defined distortions into the time‐frequency tiles (or combination of tiles). The performance of eight listeners was measured for each of these versions by using a one‐interval two‐alternative forced‐choice paradigm, to minimize the role of cognition. The results demonstrate that in the first and the second frequency bands, the diphone information is far more important than the consonant information or the vowel information alone. As for the third band, most of the information of the diphone is contained in the consonantal time interval. These observations are common to both the initial and the final consonants of spoken CVCs. The study also provides a direct mapping between Jakobson et al.’s features and particular regions in the time‐frequency domain. Voicing and nasality are strongly correlated with the diphone information in the first frequency band, graveness and compactness with the diphone information in the second frequency band, and sibilation with the consonantal time interval in the third frequency band. Sustention is equally correlated with the diphone information in the second and the third frequency bands.
Since the role of cognition was neutralized to a large extent, the results may also be interpreted as a map of the phonemic distinctive features to some peripheral auditory functions that operate on the corresponding time‐frequency regions.
94(1993); http://dx.doi.org/10.1121/1.407364
This paper presents a pitch‐synchronous analysis‐by‐synthesis procedure for estimating model parameters for voiced speech. These model parameters describe the vocal‐tract shape and the time derivative of the glottal area function. The excitation waveform is derived from the glottal area function by incorporating source‐tract interaction using the current vocal‐tract input impedance. The corresponding analysis procedure for estimating the model parameters once every pitch period is outlined. A significant improvement in quality was obtained for the new pitch‐synchronous analysis/synthesis procedure relative to the fixed‐frame‐length‐based scheme used previously. It was also found that the new pitch‐synchronous articulatory analysis/synthesis scheme achieves lower rms spectral distortion values than the 2.4 kb/s Federal Standard LPC‐10E algorithm. A segment‐based procedure for estimating the vocal‐tract model parameters at a rate much lower than the current pitch is described. In this segment‐based analysis‐by‐synthesis approach, the model parameters are estimated every 50–100 ms. The parameters for the intermediate pitch periods are derived by interpolation. The segments are selected using a maximum likelihood segmentation algorithm that segments an utterance into diphonelike units. A segment‐based parameter optimization scheme could lead to a highly economical representation of the speech signal for potential applications in very low bit rate speech coding and speech storage.
The above schemes were optimized for a pilot test sentence and then evaluated using eight test sentences for a log area and the Coker articulatory model representation of the vocal tract. Nine listeners were asked to judge the quality of the synthesis in a paired‐comparison test and the results were analyzed using a nonparametric one‐tailed sign test. For the log‐area representation of the vocal tract, we found a significant degradation in speech quality for the segment‐based optimization procedure relative to the frame‐based procedure. However, for the Coker model representation, the degradation was found to be insignificant. This shows that unlike cross‐sectional areas, the movement of various articulators in the vocal tract during speech production can be described with sufficient accuracy by specifying the position of these articulators and by using an interpolation function at time intervals much longer than a pitch period.
94(1993); http://dx.doi.org/10.1121/1.407365
Simultaneous aerodynamic, acoustic, and kinematic measurements from the laryngeal and respiratory systems were made in order to study mechanisms for changing vocal intensity. Aerodynamic and acoustic measures included an approximation of open quotient, maximum flow declination rate, alternating glottal airflow, estimated tracheal pressure, sound pressure level, and fundamental frequency. Respiratory measures included lung volume, rib cage, and abdominal displacements. Adults were used as a comparison group to twenty 4‐year‐olds and twenty 8‐year‐olds. Laryngeal and respiratory results indicate that speech production differences between the children and adults are based both on size and function. For example, children’s absolute anteroposterior diameters of the rib cage are smaller than adults’, but their rib cage movement is larger and encompasses a different range during speech breathing. Since children are functionally different from adults, age‐specific speech production models need to be developed.
94(1993); http://dx.doi.org/10.1121/1.407366
The purpose of this study was to investigate the feasibility of developing an acoustic metric to assess vowel production in profoundly hearing‐impaired children. The approach taken was to develop a metric from acoustic analysis of vowel productions and then compare it with the perceptual ratings of the same productions by listeners. Speech samples were collected from three profoundly hearing‐impaired children participating in a longitudinal study that investigated the effectiveness of assistive listening devices upon speech development. The metric used the extracted fundamental and first, second, and third formant frequencies to represent the tokens as points in a three‐dimensional auditory‐perceptual space modeled after earlier work by Miller [J. Acoust. Soc. Am. 85, 2114–2134 (1989)]. Euclidean distances were determined between each point and the intended vowel, which was represented by coordinates taken from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175–184 (1952)] data for children. The data suggest that the three‐dimensional metric provides significant correlations between production and perception.
94(1993); http://dx.doi.org/10.1121/1.407367
The present paper describes the results from two experiments which explored the temporal boundary between overlapping and nonoverlapping maskers and its effects on the additivity of masking. In the first experiment, detection thresholds for a short‐duration 1000‐Hz signal were measured in the presence of two equal‐duration broadband maskers which varied in degree of temporal overlap. Following complete overlap of maskers, the temporal separation of masker onsets was systematically varied to create conditions ranging from partially overlapping simultaneous masking to combined forward and backward masking. The signal was always temporally centered between the onset of the first masker and the offset of the second masker. Nonlinear additivity of masking occurred for the majority of subjects when maskers and signal did not overlap, whereas linear additivity resulted for all subjects when the maskers and signal overlapped. In the second experiment, two separate forward maskers were used so that masker/masker overlap could be manipulated independent from masker/signal overlap. Maskers were changed gradually from temporally overlapping, or concurrent, forward maskers to sequential forward maskers. Results for all subjects showed nonlinear additivity for all combined‐masker conditions. Together, these two experiments indicated that nonlinear masking additivity is observed when the signal does not overlap temporally with the maskers. However, when maskers and signal overlap temporally (and spectrally), linear additivity is observed.
94(1993); http://dx.doi.org/10.1121/1.407368
The potential contribution of level‐dependent and level‐invariant cues for the detection of a tone added to narrow bands of noise was assessed in two experiments. In the first experiment, three masker bandwidths, 40, 120, and 360 Hz, and three center frequencies, 600, 1800, and 5400 Hz were tested. For the tone‐in‐noise detection task, the signal to be detected was a tone with a frequency equal to the center frequency of the noise masker. The level of the added tone was adjusted so as to generate d’ scores of approximately 2 in a two‐alternative, forced‐choice procedure. Then, the distributions of across‐interval changes in level were measured. The distribution of differences in level was applied to either the noise‐alone or the tone‐plus‐noise stimuli, allowing the measurement of sensitivity to changes in level for a two‐interval, forced‐choice intensity discrimination task. For one of the four observers, there was a good correspondence between the d’ values obtained in the tone‐in‐noise task and the d’ values obtained in the intensity discrimination task. For the other three observers, the discriminability of changes in intensity could not account for the detection of a tone added to noise. In order to estimate sensitivity to level‐invariant cues, the noise‐alone and tone‐plus‐noise waveforms were scaled so as to present no reliable differences in level, and observers again detected which stimulus contained the added tone.
Normalization led to d’ scores smaller than those obtained in the initial tone‐in‐noise discrimination task, but performance levels did not fall to chance. For three of the four observers, the detection of a tone added to noise appeared to depend on both level and level‐invariant cues. In a second experiment, psychometric functions were obtained for the detection of a tone added to noise, for the detection of changes in level associated with the noise‐alone and the tone‐plus‐noise stimuli, and for the detection of a tone added to noise using noise‐alone and tone‐plus‐noise waveforms with no reliable differences in level. The maskers were centered at 1800 Hz, and bandwidths of 40 and 100 Hz were tested. Individual differences in detection strategy were obtained. Two observers appeared to rely on changes in level, one observer appeared to rely on level‐invariant cues, and the remaining four observers appeared to adopt a decision strategy that integrated level and level‐invariant cues. The results suggest that (a) both level‐dependent and level‐invariant cues are available to the observer for the detection of a tone added to noise, and (b) different observers employ the cues in different ways.
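For readers who want to relate the d’ values quoted above to proportion correct, the sketch below uses the standard equal‐variance Gaussian model for a two‐alternative forced‐choice task, d’ = √2 · z(Pc). This model is an assumption added here; the abstract does not state which conversion the authors used. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def dprime_2afc(prop_correct):
    """d' for a two-alternative forced-choice task, equal-variance Gaussian model."""
    return sqrt(2.0) * _STD_NORMAL.inv_cdf(prop_correct)


def prop_correct_2afc(dprime):
    """Inverse mapping: proportion correct predicted for a given d'."""
    return _STD_NORMAL.cdf(dprime / sqrt(2.0))
```

Under this assumed model, a d’ of about 2 corresponds to roughly 92% correct, and chance performance (50% correct) corresponds to a d’ of 0.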
94(1993); http://dx.doi.org/10.1121/1.407369
Intensity discrimination of pulsed tones (also called level discrimination) was measured as a function of level in 13 listeners with sensorineural hearing impairment of primarily cochlear origin, one listener with a vestibular schwannoma, and six listeners with normal hearing. Measurements were also made in normal ears presented with masking noise spectrally shaped to produce audiograms similar to those of the cochlearly impaired listeners. For unilateral impairments, tests were made at the same frequency in the normal and impaired ears. For bilateral‐sloping impairments, tests were made at different frequencies in the same ear. The normal listeners showed results similar to other data in the literature. The listener with a vestibular schwannoma showed greatly reduced intensity resolution, except at a few levels. For listeners with recruiting sensorineural impairments, the results are discussed according to the configuration of the impairment and are compared across configurations at equal SPL, equal SL, and equal loudness level. Listeners with increasing hearing losses at frequencies above the test frequency generally showed impaired resolution, especially at high levels, and less deviation from Weber’s law than normal listeners. Listeners with decreasing hearing loss at frequencies above the test frequency showed nearly normal intensity‐resolution functions. Whereas these trends are generally present, there are also large differences among individuals. Results obtained from normal listeners who were tested in the presence of masking noise indicate that elevated thresholds and reduced dynamic range account for some, but not all, of the effects of recruiting sensorineural impairment on intensity resolution.
94(1993); http://dx.doi.org/10.1121/1.407370
Weber fractions for sound intensity were measured for 70‐, 100‐, 200‐, 1000‐, and 10 000‐Hz tone pulses at sound‐pressure levels (SPLs) ranging from just above individual listeners’ absolute thresholds to near their highest tolerable SPLs, using a two‐alternative forced‐choice adaptive staircase technique governed by a 1‐up, 3‐down rule. Results for four listeners with normal hearing and varying experience, despite individual differences in absolute values, showed Weber fractions that declined as sound pressure increased above threshold and asymptoted at intermediate SPLs. A power function with a negative exponent describes the data of the individual listeners better than a logarithmic function does. The absolute value of the exponent of the power function, which measures the curvature of the function, was largest at 70 Hz and declined with increasing frequency, similar to how exponents of power functions relating loudness judgments or simple reaction time to stimulus intensity differ with sound frequency.
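The 1‐up, 3‐down rule mentioned above can be sketched as follows. The update logic and the convergence point are the textbook properties of transformed up‐down staircases, not details taken from the paper; the function names, the step size, and the meaning of `level` (here, the size of the intensity increment) are assumptions for illustration.

```python
def staircase_target(n_down):
    """Proportion correct at which a 1-up, n-down staircase settles.

    The track is stationary when a run of n_down correct responses is as
    likely as not, i.e. p ** n_down == 0.5.
    """
    return 0.5 ** (1.0 / n_down)


def next_level(level, step, correct_run, was_correct, n_down=3):
    """Return (new_level, new_correct_run) after one trial."""
    if not was_correct:
        # 1-up: a single error makes the next trial easier.
        return level + step, 0
    correct_run += 1
    if correct_run == n_down:
        # n-down: n consecutive correct responses make the next trial harder.
        return level - step, 0
    return level, correct_run
```

With `n_down=3` the target is about 79.4% correct, so the thresholds tracked by such a procedure sit well above the 50% chance level of a two‐alternative task.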
94(1993); http://dx.doi.org/10.1121/1.407371
Three experiments examined the dynamic attributes of timbre by evaluating the role of onsets in similarity judgments. In separate experiments, subjects heard complete orchestral instrument tones, the onsets of those tones, and tones with the onsets removed (“remainders”). Ratings for complete tones corresponded to those for onsets, indicating that the salient acoustic attributes for complete tones are present at the onset. Ratings for complete tones also corresponded to those for remainders, indicating that the salient attributes for complete tones are present also in the absence of onsets. Subsequent acoustic analyses demonstrated that this pattern of similarity was due to the centroid frequencies and amplitude envelopes of the tones. The results indicate that the dynamic attributes of timbre are not only present at the onset, but also throughout, and that multiple acoustic attributes may contribute to the same perceptual dimensions.
94(1993); http://dx.doi.org/10.1121/1.407346
From theoretical considerations, function‐based modeling predicts the input–output characteristics of a neural system intended to perform a signal processing task within a sensory system. The sensory task under study here is the time‐ and level‐based localization of a high‐frequency, possibly amplitude‐modulated, sound source in the horizontal plane. The stimulus is assumed to be represented by each ear’s primarylike discharge pattern. An optimal system that extracts azimuthal angle from these discharge patterns, which represent acoustic time and level localization cues, has been derived. This system can be described as the maximization of a sum of three subsystems’ outputs. The stimulus cues employed by these systems are interaural level difference for the level‐based subsystem, the interaural onset‐time difference for the time‐based subsystem, and the interaural envelope‐phase difference for the phase‐based subsystem. The system encompassing all these cues is shown to trade off the level, time, and envelope‐phase cues depending upon the time since stimulus onset, the observation time, and the incident signal’s level. How this system might correspond to known structures in the lower auditory pathway is described.
94(1993); http://dx.doi.org/10.1121/1.407347
The ear‐canal impedance and reflection coefficient were measured in an adult group and in groups of infants of age 1, 3, 6, 12, and 24 months over the frequency range 125–10 700 Hz. The development of the external ear canal and middle ear strongly affects input impedance and reflection coefficient responses, and this development is not yet complete at age 24 months. Contributing factors include growth of the area and length of the ear canal, a resonance in the ear‐canal walls of younger infants, and a probable influence of growth of the middle‐ear cavities. The middle‐ear compliance is lower in infants than adults, and the middle‐ear resistance is higher. The power transfer into the middle ear of the infant is much less than into that of the adult. Such differences in power transfer directly influence both behavioral and physiological measurements of hearing. The difficulties of interpretation of neonatal tympanograms are shown to be a consequence of ear‐canal wall vibration. Impedance and reflectance measurements in the 2–4‐kHz range are recommended as a potentially useful clinical tool for circumventing these difficulties.
A comparison of transient‐evoked and distortion product otoacoustic emissions in normal‐hearing and hearing‐impaired subjects
94(1993); http://dx.doi.org/10.1121/1.407348
The ability of transient‐evoked otoacoustic emissions (TEOAEs) and distortion product otoacoustic emissions (DPOAEs) to distinguish normal hearing from hearing impairment was evaluated in 180 subjects. TEOAEs were analyzed into octave or one‐third octave bands for frequencies ranging from 500 to 4000 Hz. Decision theory was used to generate receiver operating characteristic (ROC) curves for each of three measurements (OAE amplitude, OAE/noise, reproducibility) for each OAE measure (octave TEOAEs, 1/3 octave TEOAEs, DPOAEs), for octave frequencies from 500 to 4000 Hz, and for seven audiometric criteria ranging from 10 to 40 dB HL. At 500 Hz, TEOAEs and DPOAEs were unable to separate normal from impaired ears. At 1000 Hz, both TEOAE measures were more accurate in identifying hearing status than DPOAEs. At 2000 Hz, all OAE measures performed equally well. At 4000 Hz, DPOAEs were better able to distinguish normal from impaired ears. Almost without exception, measurements of OAE/noise and reproducibility performed comparably and were superior to measurements of OAE amplitude, although the differences were small. TEOAEs analyzed into octave bands showed better performance than TEOAEs analyzed into 1/3 octaves. Under standard test conditions, OAE test performance appears to be limited by background noise, especially for the low frequencies.
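One common way to summarize an ROC curve such as those described above is its area, which equals the probability that a randomly chosen normal ear yields a higher score than a randomly chosen impaired ear. The sketch below computes that probability directly; it is an illustration of the general idea, not the authors' analysis, and the function name and example scores are hypothetical.

```python
def roc_area(normal_scores, impaired_scores):
    """Area under the ROC curve via pairwise comparison.

    Assumes larger scores (e.g. OAE/noise in dB) indicate normal hearing.
    Ties count as half a win.
    """
    wins = 0.0
    for x in normal_scores:
        for y in impaired_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(normal_scores) * len(impaired_scores))


# Hypothetical scores: perfectly separated groups and fully overlapping groups.
separated = roc_area([3, 4, 5], [0, 1, 2])
overlapping = roc_area([1, 2], [1, 2])
```

An area of 1.0 means the measure separates the two groups perfectly, and 0.5 means it does no better than chance, which is the situation the abstract reports at 500 Hz.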
94(1993); http://dx.doi.org/10.1121/1.407349
Spontaneous otoacoustic emissions (SOAEs) were measured in the ear canal of adult humans prior to, during, and following presentation of tonal and broadband stimuli to the contralateral ear. Tones were presented at a fixed level at ten frequencies relative to the SOAE. Broadband noise was presented at eight levels, from 6 to 76 dB SPL. Shifts in SOAE frequency and amplitude were observed for some subjects, for some tone conditions. Frequency shifts were always positive, whereas amplitude shifts were variable. No apparent pattern of tuning was seen, such that tones with a particular frequency relationship to the SOAEs induced greater changes in the SOAEs. Systematic changes in frequency and amplitude of SOAEs were observed for increasing level of broadband noise for all subjects. Results are discussed with respect to possible mechanism(s) responsible for the alterations in SOAEs: transcranial conduction, the olivocochlear system, and/or the middle‐ear reflex arc.
94(1993); http://dx.doi.org/10.1121/1.407350
The fine frequency structure of the 2f1‐f2 acoustic distortion product (ADP fine structure) was examined in ten human subjects with normal hearing. Primary frequencies (f1 and f2) were incremented in steps of 1/32 octave with an f2/f1 ratio of 1.2. The primary levels were kept equal to each other and varied from 45 to 65 dB SPL in 2.5‐dB steps. The results show that the ADP fine structure is characterized by a series of peaks and valleys across frequency, with a peak‐to‐peak frequency spacing of about 3/32 octave and a peak‐to‐valley amplitude ratio of up to 20 dB. At frequencies below 4000 Hz, as primary level increases, the sharpness of the ADP fine structure is not significantly reduced and the pattern gradually shifts to lower frequencies. At frequencies above 4000 Hz, a flattening of the pattern is sometimes observed at high levels. A consequence of the underlying process responsible for the fine structure is that ADP input/output (I/O) functions can be highly variable in shape. Dramatic shape changes can occur for ADP I/O functions obtained with primary frequency changes of as little as 1/32nd of an octave. The outward cause of I/O function variability is the behavior of the ADP fine structure with level; i.e., it remains robust at high levels and systematically shifts to lower frequencies with level. As a result, ADP peaks can shift to valleys with increasing level and vice versa. Thus, small shifts in primary frequency can result in significant changes in the shape of the ADP I/O function in humans.
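The frequency grid described above follows from simple arithmetic: with f2 = 1.2·f1, the distortion product 2f1 − f2 falls at 0.8·f1, and each 1/32‐octave step multiplies the primaries by 2^(1/32) (about 2.2%). The sketch below tabulates such a grid. The starting frequency, the choice to step f1 rather than f2, and the function name are assumptions for illustration only.

```python
def primary_grid(f1_start_hz, n_steps, ratio=1.2, steps_per_octave=32):
    """Return a list of (f1, f2, 2*f1 - f2) tuples in Hz.

    f1 is raised by 1/steps_per_octave octave per step and f2 = ratio * f1.
    """
    rows = []
    for k in range(n_steps):
        f1 = f1_start_hz * 2.0 ** (k / steps_per_octave)
        f2 = ratio * f1
        rows.append((f1, f2, 2.0 * f1 - f2))
    return rows


# Hypothetical example: one full octave of f1 values starting at 1000 Hz.
grid = primary_grid(1000.0, 33)
```

On this grid the reported peak‐to‐peak spacing of about 3/32 octave corresponds to roughly three adjacent steps, which helps explain why a single 1/32‐octave change in the primaries can move a measurement from a peak toward a valley.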
94(1993); http://dx.doi.org/10.1121/1.407351
Otoacoustic emissions were measured in 42 normal‐hearing subjects ranging from 20 to 80 years old. For each subject spontaneous, click‐evoked, tone‐burst‐evoked, stimulus frequency and distortion product emissions were measured across a wide intensity range for frequencies between 1 and 3 kHz. Although there are significant differences between age groups, the results indicate no age effect independent of hearing sensitivity on any type or parameter of otoacoustic emissions (OAE). The effect of increasing age is confounded with the effect of decreasing sensitivity such that post hoc analyses are inadequate to separate the effects of sensitivity and age on otoacoustic emissions. Even within the range of audiometrically normal hearing, OAE characteristics vary with threshold for all age groups. The conclusion is that hearing sensitivity must be included as a controlled variable in order to accurately assess intrinsic aging effects.
94(1993); http://dx.doi.org/10.1121/1.407352
The locally optimum array detector for a random signal embedded in spherically invariant noise is synthesized. In this observation model, the signal of interest is common to each array element and the noise samples are assumed to be spatially independent across the array, but temporally correlated. Moreover, the noise at each sensor is modeled as a spherically invariant random process, which is completely characterized by the univariate probability density function, the mean, and the covariance function. The performance is assessed via computer simulations and is compared with those of two suboptimum detectors. The first one is the locally optimum detector for Gaussian correlated noise; the second one is the locally optimum detector for temporally independent non‐Gaussian noise samples.
A computational model of echo processing and acoustic imaging in frequency‐modulated echolocating bats: The spectrogram correlation and transformation receiver
94(1993); http://dx.doi.org/10.1121/1.407353
The spectrogram correlation and transformation (SCAT) model of the sonar receiver in the big brown bat (Eptesicus fuscus) consists of a cochlear component for encoding the bat’s FM sonar transmissions and multiple FM echoes in a spectrogram format, followed by two parallel pathways for processing temporal and spectral information in sonar echoes to reconstruct the absolute range and fine range structure of multiple targets from echo spectrograms. The outputs of computations taking place along these parallel pathways converge to be displayed along a computed image dimension of echo delay or target range. The resulting image depicts the location of various reflecting sources in different targets along the range axis. This series of transforms is equivalent to simultaneous, parallel forward and inverse transforms on sonar echoes, yielding the impulse responses of targets by deconvolution of the spectrograms. The performance of the model accurately reproduces the images perceived by Eptesicus in a variety of behavioral experiments on two‐glint resolution in range, echo phase sensitivity, amplitude‐latency trading of range estimates, dissociation of time‐ and frequency‐domain image components, and ranging accuracy in noise.