The Journal of the Acoustical Society of America, Vol. 121, No. 4, pp. EL168–EL175, April 2007
©2007 Acoustical Society of America. All rights reserved.


Results

We demonstrate the two models on polyphonic classical piano music with up to four notes sounding simultaneously. Frames were grouped into single “chord” entities using a time-frequency-based segmentation procedure,13 with some manual intervention to correct gross errors, and the peaks from each group of frames were analyzed together.

For cases where the number of simultaneous note pitches M is unknown, we estimate M using the Akaike information criterion (AIC).14 The AIC is calculated as follows:

$$\mathrm{AIC} = 2M' - 2 \ln p(Y \mid \hat{\mu}_{M'}),$$

where $\hat{\mu}_{M'}$ is the rate function corresponding to the maximum-likelihood estimate of $M'$ simultaneous notes. We then choose M to be the value of $M'$ for which the AIC is a minimum.
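The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the log-likelihood values are hypothetical, and the maximized log-likelihood for each candidate number of notes is assumed to have been computed already by the maximum-likelihood transcription step.

```python
from typing import Mapping


def aic(num_notes: int, max_log_likelihood: float) -> float:
    """AIC = 2 M' - 2 ln p(Y | mu_hat_M') for a candidate note count M'."""
    return 2 * num_notes - 2 * max_log_likelihood


def select_num_notes(max_log_likelihoods: Mapping[int, float]) -> int:
    """Return the candidate M' whose AIC is smallest.

    `max_log_likelihoods` maps each candidate M' to the maximized
    log-likelihood ln p(Y | mu_hat_M') (assumed to be precomputed).
    """
    if not max_log_likelihoods:
        raise ValueError("at least one candidate note count is required")
    return min(
        max_log_likelihoods,
        key=lambda m: aic(m, max_log_likelihoods[m]),
    )


# Hypothetical log-likelihoods for M' = 1..4 (illustrative values only).
example = {1: -120.0, 2: -95.0, 3: -93.5, 4: -93.2}
chosen = select_num_notes(example)
print(chosen)
```

With these made-up values, the likelihood gain from a fourth note is too small to offset the penalty of 2 per additional note, so the rule settles on three notes.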

The performance metric is (N − M − E)/N, where N is the correct number of notes from the ground truth, M is the number of notes missed from the ground truth (not to be confused with the number of simultaneous pitches above), and E is the number of error notes not present in the ground truth.
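As a worked illustration of the metric, the sketch below assumes it is read as (N − M − E)/N and that each chord is represented as a set of pitches. The function name and the use of MIDI note numbers are assumptions made for this example, not part of the original method.

```python
from typing import Set


def transcription_score(reference: Set[int], transcribed: Set[int]) -> float:
    """Compute (N - M - E) / N for one set of reference notes.

    N: number of notes in the ground truth,
    M: ground-truth notes missing from the transcription,
    E: transcribed notes that are absent from the ground truth.
    """
    n = len(reference)
    if n == 0:
        raise ValueError("the ground truth must contain at least one note")
    missed = len(reference - transcribed)
    errors = len(transcribed - reference)
    return (n - missed - errors) / n


# Hypothetical four-note chord, given as MIDI note numbers.
truth = {60, 64, 67, 72}
estimate = {60, 64, 67, 70}  # misses 72 and reports a spurious 70
score = transcription_score(truth, estimate)
print(score)  # 0.5
```

Under this reading a perfect transcription scores 1, and the score can become negative when missed and spurious notes together outnumber the ground-truth notes.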

Table I presents our results on the tested extracts (see Fig. 4). Figure 5 demonstrates a transcription of the “Moonlight” extract. Results for all methods and extracts are very promising. The nonparametric trained model performs better than the generic model, but the training method used is not practical for many music transcription applications.

Figure 4. Figure 5.

