The Journal of the Acoustical Society of America, Vol. 121, No. 4, pp. EL168–EL175, April 2007
©2007 Acoustical Society of America. All rights reserved.


Maximum-likelihood transcription

For this proof-of-principle implementation, we perform a frame-by-frame maximum-likelihood (ML) transcription of audio extracts. Before analysis, peaks are extracted in the same way as for the training data in the nonparametric approach of Sec. III. In addition, a second-differencing technique was found to detect some peaks that are otherwise obscured by nearby peaks of larger magnitude. For example, it detects the peak at 400 Hz in Fig. 1, which lies close to the peak at 350 Hz.
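
The second-differencing technique is not specified in detail here, so the following is only a minimal sketch of one plausible reading, assuming peaks are picked on a sampled magnitude spectrum. The function names and the synthetic test spectrum are hypothetical. A baseline picker keeps strict local maxima of the magnitude; the second-difference picker instead keeps points where the curve is concave-down and the negated discrete second difference has a local maximum, which can expose a weaker component sitting on the flank of a stronger one.

```python
import numpy as np


def find_peaks_simple(mag):
    """Return indices where mag is a strict local maximum (baseline picker)."""
    m = np.asarray(mag, dtype=float)
    is_peak = (m[1:-1] > m[:-2]) & (m[1:-1] > m[2:])
    return np.flatnonzero(is_peak) + 1


def find_peaks_second_difference(mag):
    """Return indices where the curvature is negative and -d2 (the negated
    discrete second difference) has a strict local maximum.

    A weaker component on the flank of a stronger one may produce no local
    maximum in mag itself, yet it still bends the curve downward, which
    appears as a separate dip in the second difference.
    """
    m = np.asarray(mag, dtype=float)
    d2 = np.zeros_like(m)
    d2[1:-1] = m[:-2] - 2.0 * m[1:-1] + m[2:]  # discrete second difference
    c = -d2  # large where the spectrum is sharply concave-down
    is_peak = (c[1:-1] > c[:-2]) & (c[1:-1] > c[2:]) & (d2[1:-1] < 0)
    return np.flatnonzero(is_peak) + 1


if __name__ == "__main__":
    # Hypothetical test spectrum: a strong component at bin 35 and a weaker
    # one at bin 40 that is NOT a local maximum of the magnitude.
    bins = np.arange(100)
    mag = 10 * np.exp(-((bins - 35) ** 2) / 8) + 4 * np.exp(-((bins - 40) ** 2) / 8)
    print(find_peaks_simple(mag))
    print(find_peaks_second_difference(mag))
```

On this synthetic spectrum the simple picker reports only bin 35, whereas the second-difference picker also reports bin 41, one bin away from the weaker component. The located position is pulled by the neighboring peak, so a practical detector would likely refine it before use.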

An exhaustive search over all possible note combinations to find the ML solution is computationally infeasible for long extracts, so instead we iterate to a local maximum. An effective approach was a greedy search that, at each iteration, adds the note giving the greatest increase in likelihood. We verify this greedy search with a sampling procedure: we take a subset $Q$ of $m$ notes from the set $K = \{f_1, f_2, \ldots, f_M\}$ of notes found, and check that the ML solution of $m+1$ notes given $Q$ is still a subset of $K$. The greedy solution consistently passed this verification. We suggest that this is due to the robust behavior of the Poisson likelihood function, Eq. (3), over the search space, which renders each note in the mixture reasonably independent of the others; this suggestion requires further verification in more detailed studies. See Fig. 3 for an illustration.
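
The greedy search and its sampling check can be sketched as below. This is not the paper's likelihood: Eq. (3) and the estimated rate functions are not reproduced here. Instead, we assume that detected peaks are binned into frequency bins whose counts are independent Poisson variables, with rate equal to a background rate plus the sum of per-note rate templates. The names `templates`, `background`, `max_notes`, the helper functions, and the stopping rule (stop when no remaining note increases the likelihood) are all hypothetical choices made for illustration.

```python
import random

import numpy as np


def poisson_loglik(counts, rate):
    """Poisson log-likelihood of binned peak counts. The log(y!) term is
    dropped because it does not depend on which notes are chosen."""
    return float(np.sum(counts * np.log(rate) - rate))


def loglik_of(notes, counts, templates, background):
    # Assumed mixture model: rate per bin = background + sum of note templates.
    rate = background + sum((templates[n] for n in notes), np.zeros_like(background))
    return poisson_loglik(counts, rate)


def greedy_transcribe(counts, templates, background, max_notes=6):
    """Repeatedly add the note that most increases the log-likelihood;
    stop when no remaining note improves it (assumed stopping rule)."""
    chosen = []
    best = loglik_of(chosen, counts, templates, background)
    while len(chosen) < max_notes:
        candidates = [n for n in templates if n not in chosen]
        if not candidates:
            break
        scores = {n: loglik_of(chosen + [n], counts, templates, background) for n in candidates}
        note = max(scores, key=scores.get)
        if scores[note] <= best:
            break
        chosen.append(note)
        best = scores[note]
    return chosen


def verify_greedy(K, counts, templates, background, trials=20, seed=0):
    """Sampling check: for random subsets Q of K with m < |K| notes, the best
    (m+1)-th note given Q should again belong to K."""
    if not K:
        raise ValueError("K must contain at least one note")
    rng = random.Random(seed)
    for _ in range(trials):
        m = rng.randrange(len(K))  # 0 <= m < len(K)
        Q = rng.sample(list(K), m)
        others = [n for n in templates if n not in Q]
        best_next = max(others, key=lambda n: loglik_of(Q + [n], counts, templates, background))
        if best_next not in K:
            return False
    return True


if __name__ == "__main__":
    # Toy example with hypothetical templates over six frequency bins.
    templates = {
        "A": np.array([2.0, 0, 0, 2.0, 0, 0]),
        "B": np.array([0, 2.0, 0, 0, 2.0, 0]),
        "C": np.array([0, 0, 2.0, 0, 0, 2.0]),
        "D": np.array([2.0, 2.0, 0, 0, 0, 0]),
    }
    background = np.full(6, 0.1)
    counts = np.array([3.0, 0, 2.0, 3.0, 0, 2.0])
    K = greedy_transcribe(counts, templates, background)
    print(K, verify_greedy(K, counts, templates, background))
```

In this toy case the greedy search selects note A and then note C, and stops because adding B or D lowers the likelihood. The sampling check then confirms that, for every subset of those two notes, the best additional note stays inside the set.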

Figure 3.

