and William Tecumseh Fitch


### Abstract

The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species which are more closely related phylogenetically are predicted to be more similar than those of distant species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model, using mean values for dominant frequency, coefficient of variation of root-mean square energy, and spectral flux, correctly classified advertisement calls with regard to genus with an accuracy above 70%. Similar accuracy rates were obtained using these parameters with a support vector machine model, a *K*-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, thus supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
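As a minimal illustration of the feature-based approach described in the abstract, a *K*-nearest neighbor classifier over three normalized acoustic parameters (dominant frequency, CVA, and spectral flux) can be sketched in pure Python. The feature values below are hypothetical placeholders chosen for illustration; the paper's actual measurements are not reproduced here.

```python
import math
from collections import Counter

def normalize(rows):
    """Min-max normalize each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote of its k nearest neighbors
    (Euclidean distance, as used for the K-nn model in the paper)."""
    dists = sorted((math.dist(x, query), y)
                   for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (DF in kHz, CVA, SF) triples -- illustrative only.
features = [
    [1.4, 0.30, 0.10],  # Bufo-like: low dominant frequency
    [1.5, 0.28, 0.12],
    [3.8, 0.10, 0.40],  # Hyla-like: high dominant frequency
    [4.0, 0.12, 0.42],
    [2.2, 0.50, 0.25],  # Leptodactylus-like: strong amplitude modulation
    [2.3, 0.48, 0.27],
]
labels = ["Bufo", "Bufo", "Hyla", "Hyla",
          "Leptodactylus", "Leptodactylus"]

norm = normalize(features + [[3.9, 0.11, 0.41]])
train_x, query = norm[:-1], norm[-1]
print(knn_predict(train_x, labels, query, k=3))  # -> Hyla
```

Normalization matters here because the three parameters live on very different scales; without it, dominant frequency would dominate the Euclidean distance.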

This research was funded by an ERC Advanced Grant SOMACCA to W.T.F. The authors thank Walter Hödl, Markus Boeckle, Manuela Marin, and two anonymous reviewers for useful comments and suggestions.

I. INTRODUCTION

II. METHODS

A. Sound recordings

B. Acoustic analysis

C. Parameters of classification models

1. Forward-stepwise regression model

2. K-nearest neighbors

3. Support-vector machines

4. Gaussian models and estimation of the Kullback-Leibler divergence

D. Classification accuracy and reliability

1. Evaluation of the classification accuracy

2. Reliability analysis

III. RESULTS

A. Selection of acoustic parameters

B. Comparison of the classifiers' performance

C. Reliability analysis on out-of-sample recordings

IV. DISCUSSION

V. CONCLUSIONS

### Key Topics

- Agroacoustics
- Acoustic modeling
- Acoustic analysis
- Probability density functions
- Auditory system models

## Figures

Typical spectrograms of anuran vocalizations from the *Bufo*, *Hyla*, *Leptodactylus*, and *Rana* genera. (a) *Bufo americanus*, (b) *Bufo bufo*, (c) *Bufo japonicus*, (d) *Hyla chrysoscelis*, (e) *Hyla japonica*, (f) *Hyla minuta*, (g) *Leptodactylus bufonius*, (h) *Leptodactylus fuscus*, (i) *Leptodactylus pentadactylus*, (j) *Rana arvalis*, (k) *Rana boylii*, (l) *Rana tagoi*. Dynamic range: 55 dB.


Mean values of the dominant frequency (DF), coefficient of variation of the root-mean-square amplitude (CVA), and spectral flux (SF) for the 142 anuran species included in the sample.


## Tables

Classification accuracy of the multinomial logistic regression and SVM models. [Note: TP: true positives. FP: false positives. Percentages of true positives and false positives represent the means and standard error of the mean of 100 iterations of stratified 10-fold cross validation. The baseline accuracy rate was 25% (uniform prior distribution). The multinomial logistic regression was conducted using the mean values of DF, CVA, and SF for each recording, whereas the SVM algorithm used the normalized mean values of DF, CVA, and SF.]


Classification accuracy of *K*-nn, MGD, and GMM used on DF, CVA, and SF. [Note: TP: true positives. FP: false positives. Percentages of true positives and false positives represent the means and standard error of the mean of 100 iterations of stratified 10-fold cross validation. The baseline accuracy rate was 28.2% (the majority class, *Hyla*). The *K*-nn algorithm used the Euclidean distances of the normalized mean values of DF, CVA, and SF. The Gaussian mixture for the GMM model used three components.]
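The accuracy figures in these tables are means over 100 iterations of stratified 10-fold cross validation. A stratified split keeps each genus represented in every fold in roughly its overall proportion; a minimal sketch of building such folds (with generic labels standing in for the recordings) is:

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Split item indices into n_folds folds, preserving the
    per-class proportions of `labels` in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class's shuffled items round-robin across folds.
        for i, idx in enumerate(members):
            folds[i % n_folds].append(idx)
    return folds

# Illustrative class sizes, not the paper's actual counts.
labels = ["Bufo"] * 30 + ["Hyla"] * 40 + ["Leptodactylus"] * 30
folds = stratified_folds(labels, n_folds=10)
# Each fold now holds ~10% of every class; training on the other
# nine folds and testing on the held-out fold, then repeating the
# whole procedure with fresh shuffles, yields the reported means
# and standard errors.
```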


Classification accuracy of MGD and GMM used on MFCCs. [Note: TP: true positives. FP: false positives. Percentages of true positives and false positives represent the means and standard error of the mean of 100 iterations of stratified 10-fold cross validation. The baseline accuracy rate was 28.2% (the majority class, *Hyla*). The Gaussian mixture for the GMM model used nine components.]

Classification accuracy of MGD and GMM used on MFCCs. [Note: TP: true positives. FP: false positives. Percentages of true positives and false positives represent the means and standard error of the mean of 100 iterations of stratified 10-fold cross validation. The baseline accuracy rate was 28.2% (the majority class, *Hyla*). The Gaussian mixture for the GMM model used nine components.]

Confusion matrix obtained with the *K*-nn algorithm. (Note: Results are for 100 iterations of stratified 10-fold cross validation.)
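The true-positive and false-positive percentages in the accuracy tables can be derived from a confusion matrix like this one. The sketch below uses a hypothetical 4-genus matrix (the counts are illustrative, not the paper's results) to show the computation:

```python
# Rows: actual genus, columns: predicted genus -- hypothetical counts.
genera = ["Bufo", "Hyla", "Leptodactylus", "Rana"]
conf = [
    [28,  3,  2,  2],   # actual Bufo
    [ 2, 32,  3,  3],   # actual Hyla
    [ 3,  2, 22,  3],   # actual Leptodactylus
    [ 2,  4,  3, 26],   # actual Rana
]

def rates(conf):
    """Per-class true-positive rate (recall) and false-positive rate."""
    n = len(conf)
    total = sum(map(sum, conf))
    out = {}
    for c in range(n):
        tp = conf[c][c]                            # correct predictions
        actual_c = sum(conf[c])                    # all items of class c
        pred_c = sum(conf[r][c] for r in range(n)) # all predicted as c
        fp = pred_c - tp                           # wrongly predicted as c
        out[genera[c]] = (tp / actual_c, fp / (total - actual_c))
    return out

for g, (tpr, fpr) in rates(conf).items():
    print(f"{g}: TP = {tpr:.1%}, FP = {fpr:.1%}")
```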


Classification accuracy on 52 out-of-sample recordings. (Note: Three parameters: low-level acoustic parameters identified by forward-stepwise multinomial logistic regression. TP: true positives. Nn-ranking: median nearest-neighbor ranking of in-sample recordings for the species corresponding to the out-of-sample recordings, with the percentage of rankings in the ten nearest neighbors in parentheses. ***: *p* < 0.001; **: *p* < 0.01; *: *p* < 0.05; n.s.: not significant.)

