The Journal of the Acoustical Society of America, Vol. 121, No. 4, pp. EL168–EL175, April 2007
©2007 Acoustical Society of America. All rights reserved.


Poisson point processes

Suppose that $M$ simultaneous pitches are present in a frame of audio, with fundamental frequencies $F=\{f_1,f_2,\ldots,f_M\}$, and that this frame exhibits a set of peaks $P=\{p_1,p_2,\ldots\}$ in the frequency domain. The $k$th element $p_k$ of $P$ is the frequency in Hz of a single peak in the spectrum. Each note $f_m$ is expected to contribute peaks corresponding to its own harmonic structure. Noise in the signal or inadequacies in the peak detection method may generate additional spurious or “clutter” peaks (see Fig. 1).

Figure 1.

Our fundamental assumption is that the peaks generated in this fashion are realizations from a nonhomogeneous Poisson point process<sup>8</sup> with intensity function $\rho(p \mid F)$. The number of detected peaks $N_{\mathcal{A}}$ in an interval $\mathcal{A}$ of the frequency domain is a Poisson random variable

$$p(N_{\mathcal{A}} = n) = \frac{e^{-\mu_{\mathcal{A}}}\,\mu_{\mathcal{A}}^{n}}{n!},$$

where the expected number of peaks in interval $\mathcal{A}$ is

$$\mu_{\mathcal{A}} = \int_{\mathcal{A}} \rho(p \mid F)\,dp.$$
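
As a minimal sketch of the two formulas above, the following computes the expected count in an interval by numerical integration and then evaluates the Poisson probabilities. The linear intensity `rho`, the interval of 0 to 1000 Hz, and the function names are assumptions chosen for illustration; none of them come from the paper.

```python
import math

def expected_count(rho, lo, hi, steps=1000):
    """Approximate mu_A, the integral of rho(p) over [lo, hi], by the midpoint rule."""
    width = (hi - lo) / steps
    return sum(rho(lo + (i + 0.5) * width) for i in range(steps)) * width

def poisson_pmf(n, mu):
    """p(N_A = n) = exp(-mu) * mu**n / n! for a Poisson count with mean mu."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

# Hypothetical intensity in peaks per Hz, rising linearly with frequency.
def rho(p):
    return 0.001 + 2e-6 * p

mu_A = expected_count(rho, 0.0, 1000.0)   # expected number of peaks in [0, 1000] Hz
probs = [poisson_pmf(n, mu_A) for n in range(6)]
for n, pr in enumerate(probs):
    print(f"p(N_A = {n}) = {pr:.4f}")
```

For this particular intensity the integral can be checked by hand: 0.001 × 1000 + 2e-6 × 1000² / 2 = 2 expected peaks.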

All peaks generated by the pitches have been combined into one set—we have technically taken a union of the individual point processes for each pitch. The union process of a number of independent Poisson processes is also a Poisson process, with intensity function given by the sum of the individual intensity functions. Therefore, we can decompose the intensity function into individual note and clutter components, i.e.

$$\rho(p \mid F) = \rho(p \mid f_1,f_2,\ldots,f_M) = \rho_C(p) + \sum_{m=1}^{M} \rho(p \mid f_m),$$

where $\rho_C(p)$ is the predefined intensity function of the clutter process, and $\rho(p \mid f_m)$ is the intensity function of an individual note $f_m$.
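
The decomposition can be illustrated with a short sketch. The shapes used here are assumptions made only for the example: a flat clutter intensity and, for each note, a Gaussian bump at each of its first few harmonics. The text above does not specify the form of either intensity, so `note_intensity`, `clutter_intensity`, and their parameters are hypothetical.

```python
import math

def note_intensity(p, f, harmonics=5, weight=1.0, width=5.0):
    """Hypothetical rho(p | f): a Gaussian bump of area `weight` at each harmonic h*f."""
    norm = weight / (width * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((p - h * f) / width) ** 2)
               for h in range(1, harmonics + 1))

def clutter_intensity(p, level=0.001):
    """Hypothetical rho_C(p): a flat clutter rate in peaks per Hz."""
    return level

def total_intensity(p, pitches):
    """rho(p | F) = rho_C(p) + sum over m of rho(p | f_m)."""
    return clutter_intensity(p) + sum(note_intensity(p, f) for f in pitches)

pitches = [220.0, 330.0]
# 660 Hz is the 3rd harmonic of 220 Hz and the 2nd harmonic of 330 Hz,
# so both notes add to the intensity there.
print(total_intensity(660.0, pitches))
print(total_intensity(660.0, [220.0]))
```

Because the intensities simply add, overlapping harmonics from different notes need no special handling in this model.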

It should be noted that this type of model is subtly different from those usually employed in peak modeling, which assume that each harmonic of each note in the spectrum is uniquely associated with at most one detected peak.<sup>6,9</sup> That assumption can lead to a combinatorial explosion of data association terms when many notes are superimposed. Here, however, each harmonic may generate any number of detected peaks, in accordance with the intensity function $\rho(p \mid f_m)$. This leads to substantial simplifications in computation. It also models the fact that individual harmonics often produce “split” peaks, where several peaks are detected rather than just one. In a tracking setting, a similar principle has recently been applied to simplify the classical data association problem.<sup>10,11,12</sup>

Peak detections will usually be made over a discrete set of frequency bins. Let $N(k)$ be the number of peaks occurring in the frequency interval $(k\Delta,(k+1)\Delta]$, where $\Delta$ is the frequency analysis bin size (easily made variable with frequency in multiscale approaches if required). Then, under the nonhomogeneous Poisson point process assumption, the probability of $n$ peaks occurring in bin $k$ is given by

$$P(N(k) = n \mid f_1,f_2,\ldots,f_M) = \frac{e^{-\mu(k)}\,\mu(k)^{n}}{n!},$$

where $\mu(k)$ is defined as the expected number of peaks occurring within the $k$th bin. Using Eq. (1) we have

$$\mu(k) = \mu_C(k) + \sum_{m=1}^{M} \mu_{f_m}(k),$$

where $\mu_C(k)$ and $\mu_{f_m}(k)$ are defined as the integrals of the intensity functions $\rho_C(p)$ and $\rho(p \mid f_m)$ within bin $k$, respectively. We term these components the rate functions within bin $k$, since they specify the expected number of peaks contributed by the clutter process and by each note in bin $k$.
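
The per-bin rates can be sketched numerically by integrating each intensity over every bin. The box-shaped note intensities, the flat clutter level, the bin width of 10 Hz, and the helper name `bin_rates` are assumptions for illustration only. The sketch also shows that integrating the summed intensity gives the same result as summing the component rates.

```python
def bin_rates(rho, num_bins, delta, steps=50):
    """Return [mu(0), ..., mu(num_bins - 1)], where mu(k) approximates the
    integral of rho over the bin (k*delta, (k+1)*delta] by the midpoint rule."""
    width = delta / steps
    rates = []
    for k in range(num_bins):
        samples = (rho(k * delta + (i + 0.5) * width) for i in range(steps))
        rates.append(sum(samples) * width)
    return rates

# Hypothetical intensities in peaks per Hz; the shapes are illustrative only.
def rho_clutter(p):
    return 0.001                                  # flat clutter

def rho_note_a(p):
    return 0.05 if 95.0 < p <= 105.0 else 0.0     # bump near 100 Hz

def rho_note_b(p):
    return 0.05 if 145.0 < p <= 155.0 else 0.0    # bump near 150 Hz

def rho_total(p):
    return rho_clutter(p) + rho_note_a(p) + rho_note_b(p)

DELTA, K = 10.0, 20                               # assumed bin width (Hz) and bin count
mu_c = bin_rates(rho_clutter, K, DELTA)
mu_a = bin_rates(rho_note_a, K, DELTA)
mu_b = bin_rates(rho_note_b, K, DELTA)
mu = bin_rates(rho_total, K, DELTA)               # equals mu_c + mu_a + mu_b bin by bin
```

Note that the bump near 100 Hz straddles the boundary between bins 9 and 10, so its expected count is shared between them.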

We assume that a single detection is made in a particular bin if one or more peaks are present on the continuous frequency scale. The probability of a peak being detected in bin $k$ is then

$$P(N(k) \geq 1) = 1 - e^{-\mu(k)},$$

and the probability of no peak being detected is

$$P(N(k) = 0) = e^{-\mu(k)}.$$
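
A small sketch of these two probabilities follows. The rate values are illustrative and the function names are hypothetical. Using `math.expm1` is an implementation choice, not something the text prescribes; it avoids loss of precision when the rate is very small.

```python
import math

def detection_probability(mu_k):
    """P(N(k) >= 1) = 1 - exp(-mu(k)).

    -math.expm1(-mu_k) equals 1 - exp(-mu_k) but keeps precision when mu_k is small.
    """
    return -math.expm1(-mu_k)

def miss_probability(mu_k):
    """P(N(k) = 0) = exp(-mu(k))."""
    return math.exp(-mu_k)

for mu_k in (0.01, 0.5, 3.0):   # illustrative rates, not taken from the paper
    print(mu_k, detection_probability(mu_k), miss_probability(mu_k))
```

For small rates the detection probability is close to the rate itself, while for large rates it saturates toward one, so extra expected peaks in an already busy bin change the observation model very little.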

Now, suppose that if a peak is detected at bin $k$ we set observation $y_k=1$, and $y_k=0$ otherwise, for $k=0,\ldots,K-1$. Because the peak counts in disjoint bins of a Poisson process are independent, for detected peak data $Y=[y_0,y_1,\ldots,y_{K-1}]^{T}$ we have

$$P(Y \mid F) = \prod_{k=0}^{K-1} \left[ y_k \left(1 - e^{-\mu(k)}\right) + (1 - y_k)\, e^{-\mu(k)} \right].$$
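
In practice a product over many bins is usually evaluated in the log domain. The following is a minimal sketch of that computation, assuming the bin rates are already available as a list; the function name, the example detections, and the example rates are hypothetical.

```python
import math

def log_likelihood(y, mu):
    """log P(Y | F) for binary detections y[k] and bin rates mu[k] >= 0."""
    total = 0.0
    for y_k, mu_k in zip(y, mu):
        if y_k:
            if mu_k <= 0.0:
                return float("-inf")               # a detection where none is possible
            total += math.log(-math.expm1(-mu_k))  # log(1 - exp(-mu(k)))
        else:
            total -= mu_k                          # log(exp(-mu(k)))
    return total

y = [0, 1, 0, 1]                     # detections in bins 1 and 3
mu_good = [0.01, 2.0, 0.01, 2.0]     # rates that predict peaks where they were seen
mu_bad = [2.0, 0.01, 2.0, 0.01]      # rates that predict peaks in the empty bins

print(log_likelihood(y, mu_good))
print(log_likelihood(y, mu_bad))
```

Rates that place expected peaks in the bins where detections occurred score a higher log-likelihood than rates that place them in the empty bins.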

Having obtained a likelihood function, it is now possible in principle to perform inference on the number of notes and their pitches.
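
One simple way to picture such inference is an exhaustive maximum-likelihood search over small sets of candidate pitches. The sketch below is only an illustration under strong assumptions and is not the procedure used in the paper: the toy rate function `note_rates` (a fixed rate in the bin holding each harmonic), the clutter rate, the bin layout, and the candidate list are all hypothetical.

```python
import math
from itertools import combinations

DELTA = 10.0     # assumed bin width in Hz
K = 100          # assumed number of bins, covering (0, 1000] Hz
CLUTTER = 0.01   # assumed clutter rate per bin

def bin_index(p):
    """Index k of the bin (k*DELTA, (k+1)*DELTA] that contains frequency p > 0."""
    return math.ceil(p / DELTA) - 1

def note_rates(f, harmonics=4, rate=3.0):
    """Toy mu_f(k): `rate` expected peaks in the bin holding each harmonic of f."""
    mu = [0.0] * K
    for h in range(1, harmonics + 1):
        k = bin_index(h * f)
        if k < K:
            mu[k] += rate
    return mu

def log_likelihood(y, pitches):
    """log P(Y | F) with mu(k) = clutter rate + sum of the note rates."""
    mu = [CLUTTER] * K
    for f in pitches:
        mu = [a + b for a, b in zip(mu, note_rates(f))]
    return sum(math.log(-math.expm1(-m)) if y_k else -m for y_k, m in zip(y, mu))

def best_pitch_set(y, candidates, max_notes=2):
    """Exhaustive search over all candidate sets with at most max_notes pitches."""
    sets = [s for r in range(max_notes + 1) for s in combinations(candidates, r)]
    return max(sets, key=lambda s: log_likelihood(y, s))

# Synthetic detections: one peak in every harmonic bin of 110 Hz and 165 Hz.
y = [0] * K
for f in (110.0, 165.0):
    for h in range(1, 5):
        y[bin_index(h * f)] = 1

estimate = best_pitch_set(y, candidates=[110.0, 165.0, 220.0])
print(estimate)
```

The search cost grows combinatorially with the number of candidates and notes, so this brute-force form is useful only as a picture of what the likelihood makes possible, not as a practical estimator.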

