A neurally inspired musical instrument classification system based upon the sound onset
DOI: 10.1121/1.4707535

Figures

FIG. 1.

Schematic of the auditory model used to form tone descriptors for Strategy A. AN spike generation is shown for one channel (of 15) and four sensitivity levels (of 20); onset neurons/depressing synapses are shown for one sensitivity level (of 20).
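
As a rough illustration of the multi-level AN spike coding in Fig. 1, the sketch below thresholds 15 channels of filterbank envelope at 20 sensitivity levels and emits a spike at each upward crossing. The filter front end, the level spacing, and the crossing rule are assumptions made for illustration, not the paper's spiking auditory model.

```python
import numpy as np

def an_spikes(band_envelopes, n_levels=20):
    """Crude AN-style spike coding: one binary spike train per
    (channel, sensitivity level) pair.

    band_envelopes : array (n_channels, n_samples) of non-negative
        filterbank envelopes (15 channels in the model of Fig. 1).
    Returns an array (n_channels, n_levels, n_samples) of 0/1 spikes.
    """
    n_ch, n_samp = band_envelopes.shape
    hi = float(band_envelopes.max())
    # Assumed: logarithmically spaced thresholds stand in for the
    # 20 "sensitivity levels" of the model.
    thresholds = np.geomspace(hi * 1e-3, hi, n_levels)

    spikes = np.zeros((n_ch, n_levels, n_samp), dtype=np.uint8)
    for k, th in enumerate(thresholds):
        above = band_envelopes >= th
        # Fire only on upward crossings, so each level marks the moment
        # the envelope first rises past it.
        rising = above & ~np.roll(above, 1, axis=1)
        rising[:, 0] = above[:, 0]
        spikes[:, k, :] = rising
    return spikes
```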

FIG. 2.

Example raw sound signal, AN-coded spikes, and onset spikes, clustered near the start of the signal, for an isolated trombone tone at a single sensitivity level (level 15 is also shown for the onset spikes). The onset spikes over multiple sensitivity levels are coded into a single 15-channel time-series signal called the onset fingerprint (see Fig. 3 and Sec. ???).
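
The onset fingerprint of Fig. 2 can be pictured as, for each of the 15 channels, a graded time series built from how many sensitivity levels produced an onset spike at each instant. The per-sample fraction used below is an assumed coding; the paper normalises intensity to the lowest sensitivity level used (see Fig. 3).

```python
import numpy as np

def onset_fingerprint(onset_spikes):
    """Collapse onset spikes over sensitivity levels into a single
    15-channel time-series signal (the onset fingerprint).

    onset_spikes : array (n_channels, n_levels, n_samples) of 0/1
        onset spikes, e.g. 15 channels x 20 sensitivity levels.
    Returns an array (n_channels, n_samples) whose value at each sample
    is the fraction of sensitivity levels emitting an onset spike there.
    """
    n_levels = onset_spikes.shape[1]
    return onset_spikes.sum(axis=1) / float(n_levels)
```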

FIG. 3.

Example onset fingerprint signals for brass (trombone, 64 ms duration) and bowed string (violin, 200 ms duration) classes. Signal intensity is normalized to the lowest sensitivity level used for the AN spike coding (Sec. ???).

FIG. 4.

Schematic of the structure of the echo state network (Ref. 53) used as a classifier for Strategy A. An input layer of 15 nodes (one per onset fingerprint filterbank channel) connects into a large, interconnected and untrained reservoir layer. Only connections from the reservoir layer to the output layer, which has one node per instrument class, are trained (dashed).
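
A minimal leaky-integrator echo state network with the structure of Fig. 4 (fixed random input and reservoir weights, only the readout trained) is sketched below; the update form follows standard ESN practice (Ref. 53), and the parameter defaults are placeholders rather than the optimal values reported in Table III.

```python
import numpy as np

class ESN:
    """Minimal leaky-integrator echo state network: random, fixed
    input and reservoir weights; only the output weights are trained."""

    def __init__(self, n_in=15, n_res=1000, spectral_radius=0.9,
                 leak=0.3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale so the reservoir has the requested spectral radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W, self.leak, self.n_res = W, leak, n_res

    def run(self, u):
        """u: (n_steps, n_in) input signal -> (n_steps, n_res) states."""
        x = np.zeros(self.n_res)
        states = np.empty((len(u), self.n_res))
        for t, u_t in enumerate(u):
            pre = np.tanh(self.W_in @ u_t + self.W @ x)
            x = (1.0 - self.leak) * x + self.leak * pre   # leaky update
            states[t] = x
        return states
```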

FIG. 5.

Flowchart showing the principal steps involved in training and testing an echo state network with onset fingerprints as input signals. The upper half shows the formation of the train/test input/output signals from the individual tones in the dataset (each tone produces an onset fingerprint). The lower half shows the network training and testing routine.
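
The lower half of Fig. 5 amounts to running every training fingerprint through the reservoir, fitting a linear readout on one-hot class targets (ridge regression is assumed here), and classifying each test tone by its time-averaged output. A sketch, reusing the hypothetical ESN class above:

```python
import numpy as np

def train_readout(esn, fingerprints, labels, n_classes, ridge=1e-6):
    """Fit reservoir-to-output weights on one-hot teacher signals."""
    X, Y = [], []
    for fp, lab in zip(fingerprints, labels):      # fp: (n_steps, 15)
        states = esn.run(fp)
        target = np.zeros((len(fp), n_classes))
        target[:, lab] = 1.0                       # one-hot, held for the tone
        X.append(states)
        Y.append(target)
    X, Y = np.vstack(X), np.vstack(Y)
    # Ridge-regression readout: W_out = (X^T X + rI)^-1 X^T Y
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    return W_out

def classify(esn, W_out, fingerprint):
    """Predicted class = argmax of the output averaged over the tone."""
    outputs = esn.run(fingerprint) @ W_out
    return int(np.argmax(outputs.mean(axis=0)))
```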

FIG. 6.

Flowchart showing the calculations used to form the 15-element MFCC descriptor vectors required for the two different versions of Strategy B.
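
One common way to obtain a 15-element MFCC descriptor of the kind Fig. 6 describes is sketched below with librosa; the use of librosa, the averaging of per-frame coefficients, and the onset_only switch standing in for the B-1/B-2 split are all assumptions, not the paper's exact calculation.

```python
import librosa
import numpy as np

def mfcc_descriptor(path, n_mfcc=15, onset_only=False):
    """15-element MFCC vector for one tone: per-frame MFCCs averaged
    over either the whole tone or (hypothetically) just its onset."""
    y, sr = librosa.load(path, sr=None, mono=True)
    if onset_only:
        # Placeholder for a version restricted to the onset segment;
        # the paper's exact B-1/B-2 distinction may differ.
        y = y[: int(0.1 * sr)]
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (15, n_frames)
    return mfcc.mean(axis=1)
```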

FIG. 7.

Plot of the mean correct classification rate against spectral radius of the reservoir layer for multiple reservoir neuron leakage values and a reservoir size of 1000 units (Strategy A). Test data are solid, train data are dashed. Data are the mean of 10 repetitions with the same network parameters. The optimal test data configuration is listed in Table III.
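
The sweep behind Fig. 7 is in essence a grid search over spectral radius and leakage with 10 random re-initializations per point; a sketch, where evaluate is a hypothetical hook that trains and tests one network and returns its correct classification rate:

```python
import numpy as np

def sweep(evaluate, spectral_radii, leak_rates, n_repeats=10):
    """Grid search over (leakage, spectral radius).

    evaluate : callable(spectral_radius, leak, seed) -> correct
        classification rate for one trained-and-tested network
        (a hypothetical hook, not code from the paper).
    Returns the best (leak, spectral_radius) pair and the full grid of
    mean rates over n_repeats random re-initializations.
    """
    results = {}
    for leak in leak_rates:
        for rho in spectral_radii:
            rates = [evaluate(spectral_radius=rho, leak=leak, seed=r)
                     for r in range(n_repeats)]
            results[(leak, rho)] = float(np.mean(rates))  # mean of 10 runs
    best = max(results, key=results.get)
    return best, results
```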

FIG. 8.

Normalized optimal confusion matrix for Strategy A based on 10 trials with different initial network and data randomizations. All data from the McGill dataset. Standard deviation over 10 trials shown in brackets.
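
A trial-averaged, row-normalized confusion matrix of the kind shown in Figs. 8-12 (mean with standard deviation in brackets) can be assembled as sketched below, assuming integer class labels:

```python
import numpy as np

def mean_confusion(trial_pairs, n_classes):
    """trial_pairs: list of (y_true, y_pred) integer arrays, one per trial.
    Returns (mean, std) of the row-normalized confusion matrices."""
    mats = []
    for y_true, y_pred in trial_pairs:
        cm = np.zeros((n_classes, n_classes))
        for t, p in zip(y_true, y_pred):
            cm[t, p] += 1
        # Normalize each true-class row (guard against empty rows).
        cm /= np.maximum(cm.sum(axis=1, keepdims=True), 1)
        mats.append(cm)
    mats = np.stack(mats)                 # (n_trials, n_classes, n_classes)
    return mats.mean(axis=0), mats.std(axis=0)
```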

FIG. 9.

Normalized optimal confusion matrix for Strategy B-1 (trials and data randomizations as for Fig. 8).

FIG. 10.

Normalized optimal confusion matrix for Strategy B-2 (trials and data randomizations as for Fig. 8).

FIG. 11.

Normalized optimal confusion matrix for Strategy A based on training with all 2085 McGill sounds from the main task outlined in Sec. II, and testing with 1000 new and unseen sounds from the University of Iowa collection. Figure shows the mean (standard deviation in brackets) of ten repetitions with different initial network randomizations.

FIG. 12.

Normalized optimal confusion matrix for Strategy B-1 based on the same data split as Fig. 11. MLP network parameters were the same as for Figs. 9 and 10. Figure shows the mean (standard deviation in brackets) of 10 repetitions with different initial network randomizations.

FIG. 13.

Plot of the mean correct classification rate against spectral radius of the reservoir layer for multiple reservoir neuron leakage values and a reservoir size of 1000 units (Strategy A). Train data exclusively from the McGill dataset (dashed), test data exclusively from the Iowa dataset (solid). Data are the mean of 10 repetitions with the same network parameters.

Tables

TABLE I.

Summary of instrument classes used in the classification task. There were 2085 tones in total (417 per class). The mean onset duration interval as detected by the auditory model used by Strategy A (see Sec. III) is shown.

TABLE II.

Summary of parameter values and variables used in the spiking auditory model and perceptual onset detector used for Strategy A.

TABLE III.

Summary of echo state network parameter ranges investigated for Strategy A. The optimal configuration is based on the mean of ten repetitions (see Sec. V A). Parameter explanations in Sec. ???.

TABLE IV.

Summary of tone descriptor and classification methods used by Strategies A and B (details in Secs. III and IV).

TABLE V.

Summary of the classification performance of all strategies and train/test data combinations. Standard deviations over 10 trial repetitions with the same network configuration, but different initial randomizations, are shown in brackets.
