
To compute the likelihood of a given note combination in Eq. (3), a rate function $\mu_f(k)$ must be specified for each possible note frequency $f$ and each frequency bin $k$. Eq. (2) shows that the rate functions are expressed in terms of the underlying Poisson intensity functions, which could in principle be learned from annotated training data. Instead, we construct the rate functions $\mu_f(k)$ directly, either by estimating their form from training data or from generic modeling principles:
Nonparametric estimation from training data. In this approach, the rate functions are estimated from a large database of annotated training data. Peaks in the discrete Fourier transform (DFT) are extracted from each frame using a thresholded first-difference operation, with a frequency-dependent threshold given by a running median filter. The frequency-bin positions of these peaks are then histogrammed separately for each note to obtain that note's rate function.
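The peak-picking and histogramming steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the input is assumed to be precomputed DFT magnitude spectra (one list per frame, all frames from the same note), and the window half-width, the threshold multiplier `factor`, and the normalization to "peaks per frame per bin" are hypothetical choices.

```python
from collections import Counter
from statistics import median


def running_median(x, half_width):
    """Median of x over a sliding window of 2*half_width+1 bins, truncated at the edges."""
    n = len(x)
    return [median(x[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def detect_peaks(mag, half_width=5, factor=2.0):
    """Return bin indices of local maxima that exceed a frequency-dependent threshold.

    A peak is declared where the first difference changes from positive to
    non-positive; the threshold is `factor` times a running median of the spectrum
    (both parameters are illustrative assumptions).
    """
    thresh = running_median(mag, half_width)
    diff = [mag[i + 1] - mag[i] for i in range(len(mag) - 1)]
    peaks = []
    for k in range(1, len(mag) - 1):
        # Rising into bin k and not rising out of it means a local maximum at k.
        if diff[k - 1] > 0 and diff[k] <= 0 and mag[k] > factor * thresh[k]:
            peaks.append(k)
    return peaks


def estimate_rate_function(frames, n_bins, **kw):
    """Histogram peak positions over all frames of one note, normalized to peaks per frame."""
    counts = Counter()
    for mag in frames:
        counts.update(detect_peaks(mag, **kw))
    return [counts[k] / len(frames) for k in range(n_bins)]
```

For example, two 20-bin frames that both have a peak at bin 5, with only the first also peaking at bin 12, give an estimated rate of 1.0 at bin 5 and 0.5 at bin 12. Using a running median rather than a fixed threshold lets the detector adapt to the spectral envelope, which typically falls off with frequency.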
Generic model. This approach may be expected to generalize better across a range of instruments. Here a Gaussian mixture model is proposed for the rate function, with mixture components expected to be centered close to the harmonic frequencies $hf$, where $h$ is the harmonic number. For a note of fundamental frequency $f$, the rate function then takes the general form
$$\mu_f(k) = \sum_{h=1}^{H} \frac{\beta_{f,h}}{\sqrt{2\pi\sigma_{f,h}^2}} \exp\left[-\frac{(f_k - hf)^2}{2\sigma_{f,h}^2}\right],$$
where $f_k$ is the center frequency of bin $k$, $\sigma_{f,h}^2$ is the variance of the $h$th component, and $\beta_{f,h}$ are positive mixture weights (which need not sum to unity). The mixture weights and variances are constrained such that $\beta_{f,h} = A e^{-\kappa h}$ and $\sigma_{f,h}^2 = \sigma^2 h^2$, where $A$, $\kappa$, and $\sigma$ are parameters to be specified or fitted to the peak data. See Fig. 2 for a realization of the rate function of a note under this model.
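As a sketch of the generic model, the function below evaluates the Gaussian-mixture rate function at a set of bin center frequencies, assuming the constraints take the forms $\beta_{f,h} = A e^{-\kappa h}$ and $\sigma_{f,h}^2 = \sigma^2 h^2$. The parameter names `kappa` and `sigma` for the decay rate and width scale, the number of harmonics `H`, and all default values are hypothetical choices for illustration, not fitted values from this work.

```python
import math


def generic_rate_function(f0, bin_freqs, H=10, A=1.0, kappa=0.5, sigma=5.0):
    """Gaussian-mixture rate function mu_f(k) for a note with fundamental f0 (Hz).

    Assumed constraints (hypothetical parameter names):
      weight   beta_h    = A * exp(-kappa * h)   (decays with harmonic number)
      variance sigma_h^2 = (sigma * h)^2        (width grows linearly with h)
    """
    rates = []
    for fk in bin_freqs:
        mu = 0.0
        for h in range(1, H + 1):
            beta = A * math.exp(-kappa * h)
            var = (sigma * h) ** 2
            # Gaussian component centered on the h-th harmonic h*f0.
            mu += beta / math.sqrt(2 * math.pi * var) * math.exp(-(fk - h * f0) ** 2 / (2 * var))
        rates.append(mu)
    return rates
```

For instance, with 10 Hz bins and a 220 Hz note, `generic_rate_function(220.0, [i * 10.0 for i in range(200)])` peaks at the bin centered on 220 Hz, with lower and broader bumps around 440 Hz, 660 Hz, and so on. Tying the weights and variances to $h$ through only a few shared parameters keeps the model small enough to specify by hand.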
Figure 2.

We also investigated estimating the parameters of this model from labeled training data. In practice, however, this scheme generally performed worse than the model suggested above.
The clutter intensity is modeled as uniform over the frequency range of the peak detector.