
For this proof-of-principle implementation we perform a frame-by-frame maximum likelihood (ML) transcription of audio extracts. Prior to analysis, peaks are extracted as for the training data in the nonparametric approach in Sec. III. In addition, a second-differencing technique was found to detect some peaks that are otherwise obscured by nearby peaks of larger magnitude; for example, it detects the peak at 400 Hz in Fig. 1, which lies close to the peak at 350 Hz.
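The text does not give the details of the second-differencing step, so the following is only a minimal sketch under one plausible reading: a partial that sits on the flank of a stronger neighbour may not be a local maximum of the magnitude spectrum, but it still appears as a local maximum of the negated second difference (the curvature). The function names, the relative threshold `rel_threshold`, and the synthetic two-Gaussian spectrum are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices of strict local maxima of a 1-D sequence."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] > values[i + 1]]


def second_difference_peaks(mag: Sequence[float],
                            rel_threshold: float = 0.05) -> List[int]:
    """Indices in `mag` where the negated second difference peaks.

    `rel_threshold` (an assumed tuning parameter) discards curvature
    maxima smaller than this fraction of the largest curvature.
    """
    if len(mag) < 3:
        return []
    # curvature[j] is centred on mag[j + 1]
    curvature = [2.0 * mag[i] - mag[i - 1] - mag[i + 1]
                 for i in range(1, len(mag) - 1)]
    floor = rel_threshold * max(curvature)
    return [j + 1 for j in local_maxima(curvature) if curvature[j] > floor]


# Synthetic spectrum (assumed for illustration): a strong peak at 350 Hz
# and a weaker one at 400 Hz that only shows up as a shoulder.
freqs = [5.0 * k for k in range(160)]  # 0 Hz to 795 Hz in 5 Hz bins
spectrum = [math.exp(-(f - 350.0) ** 2 / 800.0)
            + 0.3 * math.exp(-(f - 400.0) ** 2 / 800.0) for f in freqs]

plain = [freqs[i] for i in local_maxima(spectrum)]
shoulder = [freqs[i] for i in second_difference_peaks(spectrum)]
print("local maxima of magnitude:", plain)
print("second-difference peaks:  ", shoulder)
```

In this toy case, simple local-maximum picking reports only the stronger peak, while the curvature test also flags a bin near 400 Hz. On real spectra the second difference amplifies noise, so some smoothing or a higher threshold would likely be needed.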
An exhaustive search over all possible note combinations to find the ML solution is computationally infeasible for long extracts. Instead, we iterate to find a local maximum. An effective approach was a greedy search algorithm that, at each iteration, added the note giving the greatest increase in likelihood. We verify this greedy search with a sampling procedure that takes a subset Q of m notes from the set K = {f1, f2, ..., fM} of notes found, and checks that the ML solution of (m+1) notes given Q is still a subset of K. We found that the greedy solution consistently passes this verification, and suggest this is due to the robust behavior of the Poisson likelihood function (3) over the search space, which renders each note in the mixture reasonably independent of the others. This suggestion requires further verification in more detailed studies. See Fig. 3 for an illustration.
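The greedy search and the sampling check can be sketched as follows. This is a hedged illustration rather than the implementation used here: the log-likelihood is passed in as a callable, and `toy_log_lik` is a deliberately simple stand-in, not the Poisson likelihood (3). Representing notes as integer MIDI numbers, the stopping rule (stop when no note improves the likelihood), and the names `greedy_ml_notes` and `verify_by_sampling` are all assumptions.

```python
import random
from typing import Callable, FrozenSet, Iterable, List, Set

LogLik = Callable[[FrozenSet[int]], float]


def greedy_ml_notes(candidates: Iterable[int], log_lik: LogLik) -> Set[int]:
    """Add, one at a time, the note giving the largest likelihood gain."""
    pool: List[int] = list(candidates)
    chosen: Set[int] = set()
    best = log_lik(frozenset())
    while True:
        scores = {n: log_lik(frozenset(chosen | {n}))
                  for n in pool if n not in chosen}
        if not scores:
            break
        note = max(scores, key=scores.get)
        if scores[note] <= best:  # assumed stopping rule: no improvement
            break
        chosen.add(note)
        best = scores[note]
    return chosen


def verify_by_sampling(found: Set[int], candidates: Iterable[int],
                       log_lik: LogLik, m: int,
                       trials: int = 20, seed: int = 0) -> bool:
    """Fix a random subset Q of m found notes, find the best (m+1)-th note
    by exhaustive search, and check that it also lies in the found set K."""
    if not 0 <= m < len(found):
        raise ValueError("m must satisfy 0 <= m < len(found)")
    rng = random.Random(seed)
    pool = list(candidates)
    ordered = sorted(found)
    for _ in range(trials):
        q = set(rng.sample(ordered, m))
        rest = [n for n in pool if n not in q]
        extra = max(rest, key=lambda n: log_lik(frozenset(q | {n})))
        if extra not in found:
            return False
    return True


# Toy stand-in likelihood (NOT the Poisson likelihood of the text):
# rewards notes of a hidden chord and penalises all others.
TRUE_NOTES = frozenset({60, 64, 67})


def toy_log_lik(notes: FrozenSet[int]) -> float:
    return len(notes & TRUE_NOTES) - 2.0 * len(notes - TRUE_NOTES)


found = greedy_ml_notes(range(55, 72), toy_log_lik)
passed = verify_by_sampling(found, range(55, 72), toy_log_lik, m=2)
print(sorted(found), passed)
```

With M candidate notes, each greedy iteration costs at most M likelihood evaluations, compared with 2^M for an exhaustive search over all combinations. The toy likelihood treats notes as fully independent, so the check passes trivially; with a real likelihood the check is informative only to the extent that notes interact.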
Figure 3.