
Suppose that M simultaneous pitches are present in a frame of audio, with fundamental frequencies $F=\{f_1,f_2,\ldots,f_M\}$, and that this frame exhibits a number of peaks $P=\{p_1,p_2,\ldots\}$ in the frequency domain. The kth element $p_k$ of $P$ is the frequency in Hz of a single peak in the spectrum. Each note $f_m$ is expected to contribute peaks corresponding to its own harmonic structure. Any noise in the signal or inadequacies in the peak detection method may generate additional “clutter” peaks (see Fig. 1).
Figure 1.
Our fundamental assumption is that the peaks generated in this fashion are realizations of a nonhomogeneous Poisson point process [8] with intensity function $\rho(p|F)$. The number of detected peaks $N_{\mathcal{A}}$ in an interval $\mathcal{A}$ of the frequency domain is a Poisson random variable,
$$p(N_{\mathcal{A}} = n) = \frac{e^{-\mu_{\mathcal{A}}}\,\mu_{\mathcal{A}}^{n}}{n!},$$
where the expected number of peaks in the interval $\mathcal{A}$ is
$$\mu_{\mathcal{A}} = \int_{\mathcal{A}} \rho(p|F)\,dp.$$
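The two expressions above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names, the midpoint-rule integration, and the constant intensity of 0.01 peaks per Hz are all assumptions chosen only for illustration.

```python
import math

def expected_count(intensity, lo, hi, steps=1000):
    """Approximate mu_A, the integral of the intensity over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(intensity(lo + (i + 0.5) * width) for i in range(steps)) * width

def poisson_pmf(n, mu):
    """Probability of exactly n peaks when the expected count is mu."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

# Hypothetical constant intensity of 0.01 peaks per Hz over a 200 Hz interval.
mu_A = expected_count(lambda p: 0.01, 400.0, 600.0)

# Probabilities of observing 0, 1, ..., 5 peaks in that interval.
probs = [poisson_pmf(n, mu_A) for n in range(6)]
```

With a constant intensity the integral reduces to intensity times interval length, so `mu_A` is approximately 2 here; a frequency-dependent intensity only changes the function passed to `expected_count`.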
All peaks generated by the pitches have been combined into one set; technically, we have taken a union of the individual point processes for each pitch. The union of a number of independent Poisson processes is also a Poisson process, with intensity function given by the sum of the individual intensity functions. Therefore, we can decompose the intensity function into individual note and clutter components, i.e.,
$$\rho(p|F) = \rho_C(p) + \sum_{m=1}^{M} \rho(p|f_m), \tag{1}$$
where $\rho_C(p)$ is the predefined intensity function of the clutter process, and $\rho(p|f_m)$ is the intensity function of an individual note $f_m$.
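As an illustration of this decomposition, the sketch below sums a clutter intensity and per-note intensities. The text does not specify the shape of the note intensity at this point, so the Gaussian bumps at the harmonics, the 1/h weighting, and every parameter value are hypothetical placeholders.

```python
import math

def note_intensity(p, f0, n_harmonics=10, width_hz=5.0, strength=1.0):
    """Hypothetical rho(p | f0): Gaussian bumps centred on the harmonics h * f0."""
    norm = width_hz * math.sqrt(2.0 * math.pi)
    total = 0.0
    for h in range(1, n_harmonics + 1):
        z = (p - h * f0) / width_hz
        # Assumed 1/h decay of the expected peak count with harmonic number.
        total += (strength / h) * math.exp(-0.5 * z * z) / norm
    return total

def clutter_intensity(p, rate_per_hz=1e-3):
    """Hypothetical rho_C(p): clutter spread uniformly over frequency."""
    return rate_per_hz

def total_intensity(p, pitches):
    """Superposition: clutter intensity plus the sum of the note intensities."""
    return clutter_intensity(p) + sum(note_intensity(p, f) for f in pitches)

# Two simultaneous notes (hypothetical fundamentals, in Hz).
pitches = [220.0, 330.0]
values = {p: total_intensity(p, pitches) for p in (220.0, 275.0, 330.0, 660.0)}
```

Because the intensities simply add, evaluating a hypothesis with more notes costs one extra term per note rather than a new set of peak-to-harmonic assignments.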
It should be noted at this stage that this type of model is subtly different from those usually employed in peak modeling, which assume that each harmonic of each note in the spectrum is uniquely associated with at most one detected peak [6,9]. That assumption can lead to a combinatorial explosion of data association terms when many notes are superimposed. Here, however, each harmonic may generate any number of detected peaks, in accordance with the intensity function $\rho(p|f_m)$. This leads to substantial simplifications in computation. It also models the fact that individual harmonics often produce “split” peaks, where several peaks are detected rather than just one. In a tracking setting, a similar principle has recently been applied to simplify the classical data association problem [10,11,12].
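To make the "any number of detected peaks" property concrete, the sketch below draws Poisson-distributed peak counts for the neighbourhood of a single harmonic. The expected count of 0.9 is a made-up value, and the sampler is Knuth's multiplication method, used here only because it needs nothing beyond the standard library.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw a Poisson(mu) count with Knuth's multiplication method (fine for small mu)."""
    threshold = math.exp(-mu)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(0)

# Hypothetical expected number of detected peaks around one harmonic.
mu_harmonic = 0.9
draws = [sample_poisson(mu_harmonic, rng) for _ in range(10000)]

# Fraction of frames in which the harmonic is missed entirely (0 peaks)
# and in which it appears as a "split" peak (2 or more peaks).
fraction_missed = sum(d == 0 for d in draws) / len(draws)
fraction_split = sum(d >= 2 for d in draws) / len(draws)
```

Missed harmonics and split peaks both arise from the same count distribution, so neither needs a separate association variable.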
Peak detections will usually be made over a discrete set of frequency bins. Let $N(k)$ be the number of peaks occurring in the frequency interval $(k\Delta,(k+1)\Delta]$, where $\Delta$ is the frequency analysis bin size (easily made variable with frequency in multiscale approaches if required). Then, under the nonhomogeneous Poisson point process assumption, the probability of $N(k)=n$ peaks occurring is given by
$$p(N(k) = n) = \frac{e^{-\mu(k)}\,\mu(k)^{n}}{n!},$$
where $\mu(k)$ is defined as the expected number of peaks occurring within the kth bin. Using Eq. (1) we have
$$\mu(k) = \mu_C(k) + \sum_{m=1}^{M} \mu_{f_m}(k),$$
where $\mu_C(k)$ and $\mu_{f_m}(k)$ are defined as the integrals of the intensity functions $\rho_C(p)$ and $\rho(p|f_m)$ within bin k, respectively. We term these components the rate functions within bin k, since they specify the expected number of peaks contributed by the clutter process and by each note in bin k.
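The per-bin rate functions can be sketched as follows, assuming a fixed bin size and a simple midpoint-rule integration. The function names, the 10 Hz bin size, and the box-shaped note intensity are hypothetical and serve only to keep the arithmetic easy to follow.

```python
def bin_rates(intensity, bin_width, n_bins, steps=50):
    """Rate per bin: integral of the intensity over (k*bin_width, (k+1)*bin_width]."""
    h = bin_width / steps
    rates = []
    for k in range(n_bins):
        lo = k * bin_width
        rates.append(sum(intensity(lo + (i + 0.5) * h) for i in range(steps)) * h)
    return rates

def combine_rates(clutter_rates, note_rates):
    """Total rate per bin: clutter rate plus the sum of the per-note rates."""
    return [c + sum(note[k] for note in note_rates)
            for k, c in enumerate(clutter_rates)]

bin_width = 10.0  # hypothetical bin size in Hz
n_bins = 4

# Hypothetical intensities: flat clutter, and a note whose peaks fall in 15-25 Hz.
clutter = bin_rates(lambda p: 0.002, bin_width, n_bins)
note = bin_rates(lambda p: 0.05 if 15.0 <= p < 25.0 else 0.0, bin_width, n_bins)

mu = combine_rates(clutter, [note])
```

In this toy setup the note contributes an expected 0.25 peaks to each of bins 1 and 2 and nothing elsewhere, on top of 0.02 clutter peaks per bin.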
We assume that a single detection is made in a particular bin if one or more peaks are present on the continuous frequency scale. The probability of a peak being detected in bin k is then
$$p(N(k) \geq 1) = 1 - e^{-\mu(k)},$$
and the probability of no peak being detected is
$$p(N(k) = 0) = e^{-\mu(k)}.$$
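A short sketch of these two probabilities, for a few hypothetical rate values. Using `math.expm1` instead of `1 - math.exp(-mu)` is a numerical choice made here, not something taken from the text; it avoids cancellation when the rate is very small, as it is in clutter-only bins.

```python
import math

def detection_probability(mu):
    """Probability of at least one peak in a bin with rate mu: 1 - exp(-mu)."""
    return -math.expm1(-mu)

def miss_probability(mu):
    """Probability of no peak in a bin with rate mu: exp(-mu)."""
    return math.exp(-mu)

# Hypothetical rates: a clutter-only bin, a weak harmonic, a strong harmonic.
example_rates = [0.02, 0.27, 1.5]
table = [(mu, detection_probability(mu), miss_probability(mu))
         for mu in example_rates]
```

For small rates the detection probability is close to the rate itself, so a clutter-only bin with rate 0.02 is detected in roughly 2% of frames.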
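Now, suppose that we set the observation $y_k=1$ if a peak is detected in bin k, and $y_k=0$ otherwise, for $k=0,\ldots,K-1$. Because the counts of a Poisson process in disjoint bins are independent, for detected peak data $Y=[y_0,y_1,\ldots,y_{K-1}]^T$ we have
$$p(Y|F) = \prod_{k=0}^{K-1} \left[1 - e^{-\mu(k)}\right]^{y_k} \left[e^{-\mu(k)}\right]^{1-y_k}.$$
Having obtained a likelihood function, it is now in principle possible to perform inference on the number of notes and their pitches.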
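As a rough illustration of how this likelihood separates pitch hypotheses, the sketch below evaluates its logarithm for a binary detection vector under two candidate rate vectors. The rate vectors are invented numbers standing in for two pitch hypotheses, and the code assumes every bin with a detection has a strictly positive rate (which a nonzero clutter rate guarantees).

```python
import math

def log_likelihood(y, mu):
    """log p(Y|F) for binary detections y and per-bin rates mu (one rate per bin)."""
    total = 0.0
    for yk, mk in zip(y, mu):
        if yk:
            # Detected bin contributes log(1 - exp(-mu)); requires mu > 0.
            total += math.log(-math.expm1(-mk))
        else:
            # Empty bin contributes log(exp(-mu)) = -mu.
            total += -mk
    return total

# Observed detections in four bins.
y = [0, 1, 0, 1]

# Hypothetical rates under two hypotheses: one that places harmonics in the
# bins where peaks were detected, and one that places them in the empty bins.
mu_match = [0.02, 0.9, 0.02, 0.9]
mu_mismatch = [0.9, 0.02, 0.9, 0.02]

ll_match = log_likelihood(y, mu_match)
ll_mismatch = log_likelihood(y, mu_mismatch)
```

The matching hypothesis scores higher because it is penalized neither for predicted peaks that are absent nor for detections that only clutter could explain. A search over candidate pitch sets would compare such scores, although the text here does not describe how that search is carried out.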