### Abstract

A cascade of two-pole–two-zero filter stages is a good model of the auditory periphery in two distinct ways. First, in the form of the *pole–zero filter cascade,* it acts as an auditory filter model that provides an excellent fit to data on human detection of tones in masking noise, with fewer fitting parameters than previously reported filter models such as the roex and gammachirp models. Second, when extended to the form of the *cascade of asymmetric resonators with fast-acting compression,* it serves as an efficient front-end filterbank for machine-hearing applications, including dynamic nonlinear effects such as fast wide-dynamic-range compression. In their underlying linear approximations, these filters are described by their poles and zeros, that is, by rational transfer functions, which makes them simple to implement in analog or digital domains. Other advantages in these models derive from the close connection of the filter-cascade architecture to wave propagation in the cochlea. These models also reflect the automatic-gain-control function of the auditory system and can maintain approximately constant impulse-response zero-crossing times as the level-dependent parameters change.
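
The cascade idea can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it assumes each stage is a unity-DC-gain two-pole–two-zero section, and the stage count, the 0.85 pole-frequency spacing, the damping ratios, and the zero placement at 1.4 times the pole frequency are all hypothetical values chosen only to show how the response at each tap is the product of the stages before it.

```python
import math

def stage_response(freq_hz, pole_hz, pole_damping=0.2, zero_damping=0.1, zero_ratio=1.4):
    """Complex response of one two-pole-two-zero stage (unity gain at DC).

    zero_ratio (zero frequency / pole frequency) and both damping values
    are illustrative assumptions, not fitted parameters from the paper.
    """
    s = 1j * freq_hz / pole_hz  # frequency normalized to the pole frequency
    numerator = (s / zero_ratio) ** 2 + 2 * zero_damping * (s / zero_ratio) + 1
    denominator = s ** 2 + 2 * pole_damping * s + 1
    return numerator / denominator

def tap_gains_db(freq_hz, pole_freqs_hz):
    """Gain in dB at every tap; tap k is the product of stages 0..k."""
    gains = []
    response = 1 + 0j
    for pole_hz in pole_freqs_hz:
        response *= stage_response(freq_hz, pole_hz)
        gains.append(20 * math.log10(abs(response)))
    return gains

# Hypothetical channel layout: 20 stages, pole frequencies descending from 8 kHz.
pole_freqs = [8000.0 * 0.85 ** k for k in range(20)]

gains_1k = tap_gains_db(1000.0, pole_freqs)
best_tap = max(range(len(gains_1k)), key=gains_1k.__getitem__)
print(f"1 kHz tone peaks at tap {best_tap} (pole at {pole_freqs[best_tap]:.0f} Hz)")
```

Under these assumptions, a 1 kHz tone gains a little from every stage whose pole lies above it, peaks at the tap whose pole is near 1 kHz, and is attenuated steeply by the later stages, which is the traveling-wave-like behavior that the filter-cascade architecture is meant to capture.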

I. INTRODUCTION

II. AUDITORY FILTER MODELS

A. Time-varying and nonlinear auditory filters

B. Level dependence via output-level feedback

C. Nonlinear frequency scales

III. FILTER CASCADES

A. How filter cascades work

B. Filter-cascade stages with zeros

C. The PZFC/CAR-FAC architecture

D. PZFC/CAR-FAC transfer functions

E. CAR-FAC implementation

IV. FITTING FILTERS TO MASKING DATA

A. Human notched-noise masking data

B. Nonlinear filter fitting approach

C. Fitted psychoacoustic filter shapes

D. PZFC and OZGF provide good fits with few parameters

V. IMPULSE RESPONSES AND PHYSIOLOGICAL DATA

VI. CONCLUSION

### Key Topics

- Auditory system models
- Wave propagation
- Band models
- Psychological acoustics
- Wave mechanics

## Figures

Diagram of the motion of the poles of a PZFC or CAR-FAC stage in response to a gain-control feedback signal, and the effect on the resonator gain. The positions indicated by crosses in the *s* plane plot (left) correspond to pole damping ratios (*ζ*) of 0.1, 0.2, and 0.3, while the zero’s damping ratio remains fixed at 0.1. Corresponding transfer function gains (right) of this asymmetric resonator stage do not change at low frequencies but vary by several decibels near the pole frequency. The fact that the stage gain comes back up after the dip has little effect in the transfer function of a cascade of such stages.
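
The behavior described in this caption can be reproduced with a short calculation. This is a sketch under stated assumptions: the stage is taken to be a unity-DC-gain two-pole–two-zero section, and the zero frequency of 1.4 times the pole frequency (`zero_ratio`) is a hypothetical choice, since the caption does not give it. Only the damping ratios 0.1, 0.2, and 0.3 and the fixed zero damping of 0.1 come from the caption.

```python
import math

def stage_gain_db(freq_ratio, pole_damping, zero_damping=0.1, zero_ratio=1.4):
    """Gain in dB of one asymmetric resonator stage.

    freq_ratio is frequency divided by the pole frequency.
    zero_ratio (zero frequency / pole frequency) is an assumed value.
    """
    s = 1j * freq_ratio
    numerator = (s / zero_ratio) ** 2 + 2 * zero_damping * (s / zero_ratio) + 1
    denominator = s ** 2 + 2 * pole_damping * s + 1
    return 20 * math.log10(abs(numerator / denominator))

for zeta in (0.1, 0.2, 0.3):
    low = stage_gain_db(0.01, zeta)   # far below the pole frequency
    peak = stage_gain_db(1.0, zeta)   # at the pole frequency
    dip = stage_gain_db(1.4, zeta)    # at the assumed zero frequency
    high = stage_gain_db(10.0, zeta)  # well above the zero frequency
    print(f"zeta={zeta:.1f}: low={low:+.2f} dB, peak={peak:+.2f} dB, "
          f"dip={dip:+.2f} dB, high={high:+.2f} dB")
```

With these values the low-frequency gain stays at essentially 0 dB for every damping ratio, the gain at the pole frequency falls by several decibels as the damping increases, and the gain above the dip recovers toward a constant set by the pole-to-zero frequency ratio.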

Adaptation of the overall filterbank response at each output tap. (Top) The initial response of the filterbank before adaptation. (Bottom) The response after adaptation to a human /a/ vowel of 0.6 s duration. The plots show that the adaptation affects the peak gains (the upper envelope of the filter curves shown), while the tails, behaving linearly, remain fixed.

Schematic of the CAR-FAC design. The cascaded filter stages (upper row) have variable peak gains, which are controlled by their damping ratios, set by feedback from the coupled AGC filters (lower row). The “control” signals can be fast-acting in response to an onset, but usually vary slowly. In the case of quasi-linear PZFC filter models, the control values are static but level-dependent.

The “asymmetric notched noise” masking paradigm and the associated data from human listeners were introduced with this figure, which explains the significant shifts between the filter with the best SNR and the filter with CF at the probe-tone frequency (Patterson and Nimmo-Smith, 1980). In each example, the filter with the best probe-tone-to-masking-noise ratio in its output (solid curve) is near the filter with the highest probe-tone output power (dashed curve, filter with peak at probe-tone frequency $f_0$) but shifted in the direction that reduces the noise power output (generally toward a point slightly to the right of the center of the notch).

Parallel (top), cascade (middle), and feedback (bottom) structures for level-dependent auditory filter models. The PrlGC and CasGC models originally used the upper and middle structures as a way to achieve a controllable gain near the tip while keeping a stable low-frequency tail. In the case of the PrlGC model, following an older parallel roex structure, the adder is actually adding power levels (Unoki *et al.*, 2006), not signals, so this model structure does not correspond to an actual filter.

Threshold-prediction rms errors for various filter models, versus number of fitted parameters, on the combined dataset. The fit numbers are for reference only; different filter models are identified by different symbols, as shown in the legend. For each model type, only the fit with lowest error at each number of parameters is shown; the errors are monotonically decreasing, since adding a free parameter never increases the error. The PZFC5 variants (+), such as fit 625, are the PZFC modified to have the zeros move with level, parallel with the poles, as opposed to the original PZFC () for which the zeros are fixed.

Auditory filter gain plots for the best of each of six model types. The frequency axes are on the ERB-rate scale. In each case, the curves represent filter gain when the tone detection thresholds are 30 dB (highest curves), 50 dB, and 70 dB (lowest curves). The curve spacing is related to the input–output compression: curves close together, as at 250 Hz, correspond to a response that is only slightly compressive, while curve tips 15 dB apart represent a 4:1 compressive response. The model ERBs range from approximately the nominal ERB to more than twice that.
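
The link between curve spacing and compression in this caption is simple arithmetic, sketched below. The sketch assumes that the 20 dB step between successive threshold levels (30, 50, 70 dB) can be treated as the input-level step at the filter tip; the function name and its arguments are illustrative, not taken from the paper.

```python
def compression_ratio(input_step_db, tip_gain_drop_db):
    """Input-output compression ratio implied by a drop in tip gain.

    If the input rises by input_step_db while the tip gain falls by
    tip_gain_drop_db, the output rises by only the difference.
    """
    output_step_db = input_step_db - tip_gain_drop_db
    if output_step_db <= 0:
        raise ValueError("gain drop must be smaller than the input step")
    return input_step_db / output_step_db

# Curve tips 15 dB apart over a 20 dB level step: output grows by 5 dB.
print(compression_ratio(20.0, 15.0))
# Curves lying on top of each other: no gain change, so a linear response.
print(compression_ratio(20.0, 0.0))
```

A 15 dB gain drop across a 20 dB step leaves 5 dB of output growth, which gives the 4:1 ratio quoted in the caption, while closely spaced curves give a ratio only slightly above 1:1.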

The two degenerate cases of the OZGF, the APGF (left) and the DAPGF (right), provide good fits with only 4 parameters (quadratic bandwidth, and a bandwidth-level-dependence coefficient). They differ from the better-fitting OZGFs (the ones with more parameters) in the low-frequency tails, especially in the *differentiated* case (the DAPGF, which has a zero at DC).

The impulse responses for the 1 kHz channel of two versions of the PZFC, at three tone threshold levels. The large (off-scale) curves are for the noise level that leads to 30 dB SPL tone threshold, the medium (full-scale) curves for 50 dB, and the small curves for 70 dB. The PZFC5 variant is designed to have stable zero-crossing times; the difference is apparent in the plots.

## Tables

Acronyms for the different auditory filter models discussed are tabulated here for reference; they are ordered roughly from simplest to most complex, that is, by the number of fitted parameters required.

A PZFC model with 9 filter parameters (fit 530); the channel density is fixed at 2 and not counted. The pole damping $b_2$ is computed from the CF-dependent $B_2$ as modified by the output power level (in dB) times $B_2^{1}$. In this version of the model, the zeros do not move with level.
