Volume 135, Issue 6, June 2014
- jasa express letters
- general linear acoustics 
- nonlinear acoustics 
- aeroacoustics, atmospheric sound 
- underwater sound 
- ultrasonics, quantum acoustics, and physical effects of sound 
- transduction 
- structural acoustics and vibration 
- noise: its effects and control 
- acoustic signal processing 
- psychological acoustics 
- speech production 
- speech perception 
- music and musical instruments 
- animal bioacoustics 
- biomedical acoustics 
- acoustical news
- reviews of acoustical patents
Index of content:
- JASA EXPRESS LETTERS
Enhancement of speech intelligibility in reverberant rooms: Role of amplitude envelope and temporal fine structure
135(2014); http://dx.doi.org/10.1121/1.4874136
The temporal envelope and fine structure of speech make distinct contributions to the perception of speech in normal-hearing listeners, and are differentially affected by room reverberation. Previous work has demonstrated enhanced speech intelligibility in reverberant rooms when prior exposure to the room was provided. Here, the relative contributions of envelope and fine structure cues to this intelligibility enhancement were tested using an open-set speech corpus and virtual auditory space techniques to independently manipulate the speech cues within a simulated room. Intelligibility enhancement was observed only when the envelope was reverberant, indicating that the enhancement is envelope-based.
135(2014); http://dx.doi.org/10.1121/1.4874223
This work investigates the direction-of-arrival problem. A time-delay estimate (TDE) obtained from a peak of a correlation function is subject to two types of error: type I, approximation errors, and type II, errors due to spurious signals. The iterative least-squares algorithm tentatively selects spatially coherent subsets of TDEs containing no type II errors and minor contributions of type I errors (“matched-lags”). Simulations use a seven-microphone array and a gunshot signal. The evaluation methodology is rigorous, comparing empirical distribution functions of the estimation error of the algorithms through two-sample, one-sided Kolmogorov–Smirnov tests, and quantifying differences with Cohen's d. The direction-of-arrival estimate is improved, particularly at low signal-to-noise ratios.
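As a rough illustration of the starting point described above, a TDE can be read off the peak of a cross-correlation between two microphone signals. The sketch below is not the letter's iterative least-squares algorithm; the function name `cross_correlation_delay`, the `max_lag` parameter, and the toy signals are assumptions made for this example only.

```python
def cross_correlation_delay(x, y, max_lag):
    """Return the lag (in samples) at which the cross-correlation of x and y peaks.

    A positive result means y is a delayed copy of x. This is a brute-force
    sketch for short signals, not an optimized implementation.
    """
    best_lag = 0
    best_value = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                total += x[i] * y[j]
        if total > best_value:
            best_lag, best_value = lag, total
    return best_lag


# Hypothetical toy signals: y is x delayed by 3 samples.
x = [0.0] * 20
x[5], x[6] = 1.0, -0.5
y = [0.0] * 20
y[8], y[9] = 1.0, -0.5

delay = cross_correlation_delay(x, y, max_lag=10)
```

With noisy recordings the largest peak can belong to a spurious signal, which is the type II error the abstract refers to.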
Time-domain acoustic contrast control design with response differential constraint in personal audio systems
135(2014); http://dx.doi.org/10.1121/1.4874236
The acoustic contrast control (ACC) approach is applied to reproduce the focused sound in personal audio systems utilizing an array of loudspeakers. A time-domain design of ACC is developed here for broadband input signals, where a response differential term is introduced to control the frequency response. Based on experimental results in an anechoic chamber, the proposed method demonstrates the potential capability to provide excellent acoustic contrast over the continuous frequency and maintain a flat frequency response. Furthermore, compared with the previous method, fewer parameters need to be tuned in the proposed method.
135(2014); http://dx.doi.org/10.1121/1.4874235
This letter presents an improvement of the image source method for geoacoustic inversion. The algorithm is based on the Teager-Kaiser energy operator, which amplifies discontinuities in signals while attenuating smooth transitions. This property is exploited for accurate detection of arrival times and thus for location of the image sources. The effectiveness of the method is shown on both synthetic and real data. The inversion results are, overall, in good agreement with ground truth and with other inversion results, and the computation time is significantly reduced.
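For readers unfamiliar with the operator, the standard discrete-time Teager-Kaiser form is ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below shows only that textbook definition on a toy step signal; the function name `teager_kaiser` and the signal are assumptions for illustration, and the letter's detection and inversion steps are not reproduced.

```python
def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the first and
    last samples have no two-sided neighbourhood.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical toy signal: flat, then an abrupt step, then flat again.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
energy = teager_kaiser(step)  # nonzero only at the discontinuity
```

On the flat segments the operator returns zero, and it responds only at the step, which is the behaviour the abstract exploits for picking arrivals.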
135(2014); http://dx.doi.org/10.1121/1.4874355
Cracking sounds emitted by coffee beans during the roasting process were recorded and analyzed to investigate the potential of using the sounds as the basis for an automated roast monitoring technique. Three parameters were found that could be exploited. Near the end of the roasting process, sounds known as “first crack” exhibit a higher acoustic amplitude than sounds emitted later, known as “second crack.” First crack emits more low frequency energy than second crack. Finally, the rate of cracks appearing in the second crack chorus is higher than the rate in the first crack chorus.
135(2014); http://dx.doi.org/10.1121/1.4874357
This study examined whether language specific properties may lead to cross-language differences in the degree of phonetic reduction. Rates of syllabic reduction (defined here as reduction in which the number of syllables pronounced is less than expected based on canonical form) in English and Mandarin were compared. The rate of syllabic reduction was higher in Mandarin than English. Regardless of language, open syllables participated in reduction more often than closed syllables. The prevalence of open syllables was higher in Mandarin than English, and this phonotactic difference could account for Mandarin's higher rate of syllabic reduction.
135(2014); http://dx.doi.org/10.1121/1.4874224
In sonar array processing, a challenging problem is the estimation of the data covariance matrix in the presence of moving targets in the water column, since the time interval over which the data are locally stationary is limited. This work describes an eigenvector-based method for proper data segmentation into intervals that exhibit local stationarity, providing data-driven upper bounds for the number of snapshots available for computation of time-varying sample covariance matrices. Application of the test is illustrated with simulated data in a horizontal array for the detection of a quiet source in the presence of a loud interferer.
Selection of spectral compressive operator for vector Taylor series-based model adaptation in noisy environments
135(2014); http://dx.doi.org/10.1121/1.4874358
This letter investigates the impact of spectral compression on the vector Taylor series-based model adaptation algorithm. Unlike mel-frequency cepstral coefficients obtained by the logarithmic compression, the fractional power compression is used for extracting features. Since the relationship between acoustic models for clean and noisy speech depends on nonlinearity of the spectrum, it is important to select an appropriate compressive operator in the model adaptation. In this letter, the dependency of spectral nonlinearity on the speech recognition system is analyzed in various noisy environments. Experimental results confirm that the replacement of the compressive operator improves the performance of the model adaptation.
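To make the contrast between the two compressive operators concrete, the sketch below applies a logarithm and a fractional power to the same band energies. The exponent of 1/3, the energy floor, and the function names are assumptions chosen for illustration; the abstract does not state which exponent the letter uses.

```python
import math


def log_compress(energy, floor=1e-10):
    """Logarithmic compression, as used before the DCT in standard MFCCs.

    The floor (an assumed value) avoids log(0) for empty bands.
    """
    return math.log(max(energy, floor))


def power_compress(energy, exponent=1.0 / 3.0):
    """Fractional power compression; the exponent here is an assumed example."""
    return energy ** exponent


# Hypothetical filter-bank energies spanning six orders of magnitude.
band_energies = [1e-6, 1e-3, 1.0]
log_features = [log_compress(e) for e in band_energies]
power_features = [power_compress(e) for e in band_energies]
```

The logarithm stretches low-energy bands toward large negative values, while the power law keeps them near zero, so the two operators yield different nonlinear relationships between clean and noisy features.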
135(2014); http://dx.doi.org/10.1121/1.4874356
Velocity and pressure microphones composed of piezoelectric poly(γ-benzyl-α,L-glutamate) (PBLG) nanofibers were produced by adhering a single layer of PBLG film to a Mylar diaphragm. The device exhibited a sensitivity of −60 dBV/Pa in air, and both pressure and velocity response showed a broad frequency response that was primarily controlled by the stiffness of the supporting diaphragm. The pressure microphone response was ±3 dB between 200 Hz and 4 kHz when measured in a semi-anechoic chamber. Thermal stability, easy fabrication, and simple design make this single element transducer ideal for various applications including those for underwater and high temperature use.
Laser-induced acoustic point source for accurate impulse response measurements within the audible bandwidth
135(2014); http://dx.doi.org/10.1121/1.4879664
Laser induced air breakdown is proposed as a sound source for accurate impulse response measurements. Within the audible bandwidth, the source is repeatable, broadband, and omnidirectional. The applicability of the source was evaluated by measuring the impulse response of a room. The proposed source provides a more accurate temporal and spatial representation of room reflections than conventional loudspeakers due to its omnidirectionality, negligible size and short pulse duration.
The effects of reverberant self- and overlap-masking on speech recognition in cochlear implant listeners
135(2014); http://dx.doi.org/10.1121/1.4879673
Many cochlear implant (CI) listeners experience decreased speech recognition in reverberant environments [Kokkinakis et al., J. Acoust. Soc. Am. 129(5), 3221–3232 (2011)], which may be caused by a combination of self- and overlap-masking [Bolt and MacDonald, J. Acoust. Soc. Am. 21(6), 577–580 (1949)]. Determining the extent to which these effects decrease speech recognition for CI listeners may influence reverberation mitigation algorithms. This study compared speech recognition with ideal self-masking mitigation, with ideal overlap-masking mitigation, and with no mitigation. Under these conditions, mitigating either self- or overlap-masking resulted in significant improvements in speech recognition for both normal hearing subjects utilizing an acoustic model and for CI listeners using their own devices.
135(2014); http://dx.doi.org/10.1121/1.4879671
Identification of concert halls was studied to uncover whether the early or late part of the acoustic response is more salient in a hall's fingerprint. A listening test was conducted with auralizations of measured halls using full, hybrid, and truncated impulse responses convolved with anechoic symphonic music. Subjects identified halls more reliably based on differences in early responses rather than late responses, although varying the late response had more effect on acoustic parameters. The results suggest that in a typical situation with running symphonic music, the early response determines the perceptual fingerprint of a hall more than the late response.
135(2014); http://dx.doi.org/10.1121/1.4879663
This paper considers extrapolation of the vertical coherence of surface-generated oceanic ambient noise to simulate measurements made on a longer sensor array. The extrapolation method consists of projecting the noise coherence measured with a limited aperture array into the domain spanned by prolate spheroidal wave functions, which are an orthogonal basis defined by array parameters and the noise frequency. Using simulated data corresponding to selected multi-layered seabeds as ground truth, the performance of the extrapolation method is explored. Application of the technique is also demonstrated on experimental data.
135(2014); http://dx.doi.org/10.1121/1.4879672
Finite-difference time-domain simulations of sound propagation over real-life urban topography are compared with scale-model experimental measurements. A 1:100 scale model for the measurements and full-scale input geometry for the simulations are created by using digital geographic datasets. The sound pressure levels obtained by the measurements and simulations differ by a root mean square error of approximately 2 dB in the 125 and 250 Hz octave bands, and 4 dB in the 500 Hz band. Visualizations of a low-frequency sound propagation case by the measurement and simulation clearly show the wave phenomena caused by buildings and natural terrain.
135(2014); http://dx.doi.org/10.1121/1.4879674
Linear arrays steered to end-fire provide superdirective robust performance if a constraint is imposed on the white-noise gain. Filter-and-sum beamformers achieve the maximum constrained directivity by tuning their complex weights over the frequency. Delay-and-sum beamformers have simpler structures, but their weights are fixed and optimized at a given frequency. This letter investigates the constrained directivity provided over a broad band by different delay-and-sum techniques. Complex weights and analytic signals attain near-optimal broadband performance over four octaves. Oversteered arrays using real weights and signals were found to attain superdirective performance over approximately two octaves. Hearing aids and directional hydrophones are potential applications for the considered arrays.
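As background for the delay-and-sum structure mentioned above, the sketch below evaluates the normalized response of a plain, uniformly weighted line array steered to end-fire at a single frequency. It does not implement the oversteering, the white-noise-gain constraint, or the optimized weights studied in the letter; the function name, the four-element array, and the quarter-wavelength spacing are assumptions for this example.

```python
import cmath
import math


def endfire_response(theta_deg, num_elements=4, spacing_wavelengths=0.25):
    """Normalized magnitude response of a uniform line array steered to end-fire.

    theta_deg is the arrival angle measured from the array axis (0 = end-fire).
    Element n is delayed so that a plane wave from 0 degrees adds in phase.
    """
    kd = 2.0 * math.pi * spacing_wavelengths  # phase per element spacing
    theta = math.radians(theta_deg)
    phase_step = kd * (math.cos(theta) - 1.0)  # residual phase after steering
    total = sum(cmath.exp(1j * phase_step * n) for n in range(num_elements))
    return abs(total) / num_elements


on_axis = endfire_response(0.0)      # look direction
broadside = endfire_response(90.0)   # a null for this particular geometry
rear = endfire_response(180.0)       # also a null for this geometry
```

Because the delays are fixed, the pattern changes with frequency (here, with `spacing_wavelengths`), which is why the letter examines how well such fixed structures hold their directivity over a broad band.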
135(2014); http://dx.doi.org/10.1121/1.4879668
A technique for in situ measurements of the acoustic properties of a fibrous porous material is proposed in this paper. The proposed technique exploits the directivity pattern of a dipole source in its very near field. The theoretical analysis is based on the Rayleigh integral with a complex reflection included. The results are compared with those of FEM analysis and show that the flow resistivity of a porous material placed in the very near field of the dipole source has a significant influence on the sound pressure at its ring. The results provide an excellent starting point for the design of a sensor for sound absorption.
135(2014); http://dx.doi.org/10.1121/1.4879670
Sound visualizations have been an integral part of room acoustics studies for more than a century. As acoustic measurement techniques and knowledge of hearing evolve, acousticians need more intuitive ways to represent increasingly complex data. Microphone array processing now allows accurate measurement of spatio-temporal acoustic properties. However, the multidimensional data can be a challenge to display coherently. This letter details a method of mapping visual representations of acoustic reflections from a receiver position to the surfaces from which the reflections originated. The resulting animations are presented as a spatial acoustic analysis tool.
135(2014); http://dx.doi.org/10.1121/1.4879669
Does native knowledge introduce a perceptual bias against allophones that mismatch their context? In German, [x] only occurs after back vowels, while [ç] occurs elsewhere. German and English listeners heard “allophonic” ([ç-x]) and “non-allophonic” ([ç-f], [x-f]) continua after front and back vowels. Vowel affected German responses to [ç-x] and [ç-f], but not [x-f]. Vowel affected English responses to all continua. The asymmetric effect on German responses is explained as a perceptual expectation of [ç] after [y]. The effect on English responses is explained by acoustic misparsing, which causes some of the vowel's spectrum to cue a spectrally similar fricative.
135(2014); http://dx.doi.org/10.1121/1.4879667
Periodic stimuli are common in natural environments and are ecologically relevant, for example, footsteps and vocalizations. This study reports a detectability enhancement for temporally cued, periodic sequences. Target noise bursts (embedded in background noise) arriving at the time points which followed on from an introductory, periodic “cue” sequence were more easily detected (by ∼1.5 dB SNR) than identical noise bursts which randomly deviated from the cued temporal pattern. Temporal predictability and corresponding neuronal “entrainment” have been widely theorized to underlie important processes in auditory scene analysis and to confer perceptual advantage. This is the first study in the auditory domain to clearly demonstrate a perceptual enhancement of temporally predictable, near-threshold stimuli.
135(2014); http://dx.doi.org/10.1121/1.4879666
Monolithic integration of capacitive micromachined ultrasonic transducer arrays with low-noise complementary metal oxide semiconductor electronics minimizes interconnect parasitics, thus allowing the measurement of thermal-mechanical (TM) noise. This enables passive ultrasonics based on cross-correlations of diffuse TM noise to extract coherent ultrasonic waves propagating between receivers. However, synchronous recording of high-frequency TM noise puts stringent requirements on the analog-to-digital converter's sampling rate. To alleviate this restriction, high-frequency TM noise cross-correlations (12–25 MHz) were instead estimated using compressed measurements of TM noise, which could be digitized at a sampling frequency lower than the Nyquist frequency.