Schematic of the auditory model used to form tone descriptors for Strategy A. AN spike generation is shown for one channel (of 15) and four sensitivity levels (of 20); onset neurons and depressing synapses are shown for one sensitivity level (of 20).
Example raw sound signal, AN-coded spikes and onset spikes, clustered near the start of the signal, for an isolated trombone tone at sensitivity level (15 also shown for onset spikes). The onset spikes over multiple sensitivity levels are coded into a single 15-channel time-series signal called the onset fingerprint (see Fig. 3 and Sec. ???).
Example onset fingerprint signals for brass (trombone, 64 ms duration) and bowed string (violin, 200 ms duration) classes. Signal intensity is normalized to the lowest sensitivity level used for the AN spike coding (Sec. ???).
Schematic of the structure of the echo state network (Ref. 53) used as a classifier for Strategy A. An input layer of 15 nodes (one per onset fingerprint filterbank channel) connects into a large, interconnected and untrained reservoir layer. Only connections from the reservoir layer to the output layer, which has one node per instrument class, are trained (dashed).
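The structure described in this caption can be illustrated with a minimal sketch, assuming a leaky tanh reservoir and a ridge-regression readout. Neither the update rule nor the readout solver is stated in the captions, so both are assumptions. The function names, the weight initialisation, the reduced reservoir size of 100 units, and the random stand-in input are hypothetical choices made for illustration only. The 15 inputs and the five output classes (2085 tones at 417 per class) follow the captions.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius, rng):
    """Draw random input and reservoir weights; these are never trained."""
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    # Rescale so the largest absolute eigenvalue equals the requested spectral radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(u, w_in, w, leak):
    """Drive the reservoir with u of shape (T, n_in); return states of shape (T, n_res)."""
    x = np.zeros(w.shape[0])
    states = np.empty((u.shape[0], w.shape[0]))
    for t, u_t in enumerate(u):
        # Leaky-integrator update (assumed form, not taken from the source).
        x = (1.0 - leak) * x + leak * np.tanh(w_in @ u_t + w @ x)
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    """Fit the only trained weights (reservoir to output) by ridge regression."""
    a = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(a, states.T @ targets)  # shape (n_res, n_classes)

rng = np.random.default_rng(0)
w_in, w = make_reservoir(n_in=15, n_res=100, spectral_radius=0.9, rng=rng)

u = rng.random((200, 15))      # random stand-in for a 15-channel onset fingerprint
y = np.zeros((200, 5))         # one output node per instrument class
y[:, 2] = 1.0                  # hypothetical target: class index 2

states = run_reservoir(u, w_in, w, leak=0.3)
w_out = train_readout(states, y)

# Classify by averaging the readout over time and taking the largest output.
predicted_class = int(np.argmax((states @ w_out).mean(axis=0)))
```

Averaging the readout over the duration of the signal before taking the maximum is one plausible decision rule; the captions do not say how the output nodes are converted into a class label.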
Flowchart showing the principal steps involved in training and testing an echo state network with onset fingerprints as input signals. The upper half shows the formation of the train/test input/output signals from the individual tones in the dataset (each tone produces an onset fingerprint). The lower half shows the network training and testing routine.
Flowchart showing the calculations used to form the 15-element MFCC descriptor vectors required for the two different versions of Strategy B.
Plot of the mean correct classification rate against spectral radius of the reservoir layer for multiple reservoir neuron leakage values and a reservoir size of 1000 units (Strategy A). Test data are solid, train data are dashed. Data are the mean of 10 repetitions with the same network parameters. The optimal test data configuration is listed in Table III.
Normalized optimal confusion matrix for Strategy A based on 10 trials with different initial network and data randomizations. All data from the McGill dataset. Standard deviation over 10 trials shown in brackets.
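A confusion matrix summarised as a mean with a standard deviation over repeated trials can be computed along the following lines. This is a sketch under stated assumptions: each row (true class) is normalized to sum to one, the standard deviation is the population form, and the labels are made-up toy data for two classes and two trials. The captions do not specify the normalization axis or the form of the standard deviation, and the function name is hypothetical.

```python
import numpy as np

def normalized_confusion(true_labels, predicted_labels, n_classes):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of class i labelled j."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Guard against empty rows so that a class with no test tones yields zeros.
    return counts / np.where(row_sums == 0, 1, row_sums)

# Toy labels for two trials (illustrative values, not results from the paper).
trial_true = [np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1])]
trial_pred = [np.array([0, 1, 1, 1]), np.array([0, 0, 1, 1])]

matrices = np.stack([
    normalized_confusion(t, p, n_classes=2)
    for t, p in zip(trial_true, trial_pred)
])
mean_cm = matrices.mean(axis=0)   # value reported in each cell
std_cm = matrices.std(axis=0)     # value reported in brackets
```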
Normalized optimal confusion matrix for Strategy B-1 (trials and data randomizations as for Fig. 8).
Normalized optimal confusion matrix for Strategy B-2 (trials and data randomizations as for Fig. 8).
Normalized optimal confusion matrix for Strategy A based on training with all 2085 McGill sounds from the main task outlined in Sec. II, and testing with 1000 new and unseen sounds from the University of Iowa collection. Figure shows the mean (standard deviation in brackets) of ten repetitions with different initial network randomizations.
Normalized optimal confusion matrix for Strategy B-1 based on the same data split as Fig. 11. MLP network parameters were the same as for Figs. 9 and 10. Figure shows the mean (standard deviation in brackets) of 10 repetitions with different initial network randomizations.
Plot of the mean correct classification rate against spectral radius of the reservoir layer for multiple reservoir neuron leakage values and a reservoir size of 1000 units (Strategy A). Train data exclusively from the McGill dataset (dashed), test data exclusively from the Iowa dataset (solid). Data are the mean of 10 repetitions with the same network parameters.
Summary of instrument classes used in the classification task. There were 2085 tones in total (417 per class). The mean onset duration interval as detected by the auditory model used by Strategy A (see Sec. III) is shown.
Summary of parameter values and variables used in the spiking auditory model and perceptual onset detector used for Strategy A.
Summary of echo state network parameter ranges investigated for Strategy A. The optimal configuration is based on the mean of ten repetitions (see Sec. V A). Parameter explanations in Sec. ???.
Summary of tone descriptor and classification methods used by Strategies A and B (details in Secs. III and IV).
Summary of the classification performance of all strategies and train/test data combinations. Standard deviations over 10 trial repetitions with the same network configuration, but different initial randomizations, are shown in brackets.