Cepstral coefficients and hidden Markov models reveal idiosyncratic voice characteristics in red deer (Cervus elaphus) stags
Narrow-band spectrogram of a bout of common roars. The common roar typically includes three phases, A, B, and C. In phase A, the formants fall while the fundamental frequency increases. During phase B the formants are more stationary. Phase C is shorter, with rising formants and a decreasing fundamental frequency.
Narrow-band spectrogram of a bout of harsh roars. Compared to common roars, harsh roars are louder, atonal, and characterized by little frequency or energy modulation.
Narrow-band spectrogram of a chase bark series. Chase barks are short vocalizations that are emitted in series.
Narrow-band spectrogram of a single bark. Single barks are typically longer than chase barks.
Automatic detection of vocalization and silence phases in a bout of common roars. (a) Segmentation: the “forward-backward divergence” algorithm fragments the signal into stationary segments of variable size. (b) Energy thresholding: segments are classified as silence or vocalization using the relative energy of each segment. Consecutive silence segments are merged into silence phases and consecutive vocalization segments are merged into vocalization phases.
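The energy-thresholding and merging steps in panel (b) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the forward-backward divergence segmentation of panel (a) is not reproduced, so segment boundaries are supplied by hand, and "relative energy" is assumed here to mean each segment's mean power in dB relative to the most energetic segment, compared against a hypothetical threshold (`rel_threshold_db`).

```python
import numpy as np

def label_segments(signal, boundaries, rel_threshold_db=-30.0):
    """Label each segment 'voc' or 'sil' from its energy relative to the loudest segment.

    boundaries: sorted sample indices [0, b1, ..., len(signal)] delimiting the segments
    (assumed to come from a prior segmentation step). rel_threshold_db is hypothetical.
    """
    energies = np.array([np.mean(signal[a:b] ** 2)
                         for a, b in zip(boundaries[:-1], boundaries[1:])])
    ref = max(energies.max(), 1e-20)                  # guard against an all-zero signal
    rel_db = 10.0 * np.log10(energies / ref + 1e-12)  # relative energy in dB
    return ["voc" if e >= rel_threshold_db else "sil" for e in rel_db]

def merge_phases(boundaries, labels):
    """Merge consecutive segments sharing a label into phases (start, end, label)."""
    phases = []
    for (start, end), lab in zip(zip(boundaries[:-1], boundaries[1:]), labels):
        if phases and phases[-1][2] == lab:
            phases[-1] = (phases[-1][0], end, lab)    # extend the current phase
        else:
            phases.append((start, end, lab))          # open a new phase
    return phases

# Toy example: low-level noise, a 2000-sample tone standing in for a roar, noise again.
rng = np.random.default_rng(0)
t = np.arange(2000)
signal = np.concatenate([
    0.001 * rng.standard_normal(1000),
    np.sin(2 * np.pi * t / 100),
    0.001 * rng.standard_normal(1000),
])
boundaries = [0, 500, 1000, 2000, 3000, 3500, 4000]   # hypothetical segment boundaries
labels = label_segments(signal, boundaries)
phases = merge_phases(boundaries, labels)
```

Six variable-size segments collapse into three phases: silence, vocalization, silence.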
Homomorphic analysis performed on a 512-sample window of a red deer stag common roar (sampling rate: ). Panel A represents the sound wave in the time domain; the signal is periodic with a period . Panel B represents the spectrum (fast Fourier transform) of this window, with the fundamental frequency (F0) and its harmonic series (the first six harmonics H1–H6 are labeled). Panel C shows the cepstrum . The cepstrum is calculated by taking the inverse Fourier transform of the logarithm of the energy spectrum of the signal. The contribution of the glottal source is represented by impulses spaced by samples (corresponding to the pitch period), while the contribution of the filter is represented by the low-quefrency part of the cepstrum. Finally, panel D shows the frequency spectrum obtained by applying a Fourier transform to the first eight coefficients of the cepstrum, illustrating the smoothing effect of the deconvolution process.
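The cepstral analysis in this caption can be sketched with NumPy. The sketch assumes a synthetic harmonic signal rather than a real roar, and the sampling rate and F0 below are hypothetical values chosen so that the window holds exactly four pitch periods. It computes the cepstrum as the inverse FFT of the log energy spectrum, reads the pitch period from the high-quefrency peak (the source), and rebuilds a smoothed spectrum from the first eight coefficients (the filter), as in panel D.

```python
import numpy as np

FS = 16000   # hypothetical sampling rate in Hz
N = 512      # window length in samples, as in the caption
F0 = 125.0   # hypothetical fundamental frequency in Hz: period = FS / F0 = 128 samples

# Synthetic harmonic signal standing in for a roar window: 19 harmonics with 1/k amplitudes.
# The window holds exactly 4 pitch periods, so no taper is applied here; real recordings
# would normally be windowed (e.g. with np.hamming) first.
n = np.arange(N)
frame = sum(np.sin(2 * np.pi * k * F0 * n / FS) / k for k in range(1, 20))

# Cepstrum: inverse FFT of the logarithm of the energy spectrum (small constant avoids log(0)).
power = np.abs(np.fft.fft(frame)) ** 2
cepstrum = np.real(np.fft.ifft(np.log(power + 1e-12)))

# Source contribution: a peak at the pitch period, searched from 2 ms up to N/2 samples.
lo, hi = int(0.002 * FS), N // 2
period = lo + int(np.argmax(cepstrum[lo:hi]))
f0_estimate = FS / period

# Filter contribution: keep only the first 8 coefficients (plus their mirror images so the
# result stays real) and transform back, giving a smoothed log energy spectrum.
n_coef = 8
lifter = np.zeros(N)
lifter[:n_coef] = 1.0
lifter[-(n_coef - 1):] = 1.0
smoothed_log_power = np.real(np.fft.fft(cepstrum * lifter))
features = cepstrum[:n_coef]   # an 8-dimensional cepstral vector for this window
```

Keeping only low-quefrency coefficients discards the harmonic fine structure and retains the spectral envelope, which is why such a short vector can summarize the vocal-tract filter.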
(a) The model of the roar bout is a succession of silences and vocalizations . The silence model is shared by all individuals. (b) In contrast, each individual has its own roar model: a three-state hidden Markov model in which each state emits a vector of eight cepstral coefficients according to a Gaussian mixture probability distribution. Each state is assumed to correspond to one of the three phases that characterize the roar (see Fig. 1).
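Scoring a roar against per-individual models of this kind can be sketched with the forward algorithm. This is a hedged illustration under several assumptions not stated in the caption: a left-to-right topology that starts in the first state and ends in the last, diagonal covariances, two mixture components per state, and hand-set hypothetical parameters (in practice they would be trained). The silence model of panel (a) is omitted. A bout is assigned to the individual whose model gives the highest log-likelihood.

```python
import numpy as np

def gmm_logpdf(x, weights, means, variances):
    """Log density of frames x (T, D) under a diagonal Gaussian mixture (M components)."""
    diff = x[:, None, :] - means[None, :, :]                                   # (T, M, D)
    log_comp = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    return np.logaddexp.reduce(log_comp + np.log(weights), axis=1)             # (T,)

def hmm_loglik(x, log_trans, states):
    """Forward algorithm (log domain): start in state 0, end in the last state."""
    log_b = np.stack([gmm_logpdf(x, *s) for s in states], axis=1)             # (T, S)
    log_alpha = np.full(len(states), -np.inf)
    log_alpha[0] = log_b[0, 0]
    for t in range(1, len(x)):
        # log_trans[i, j] is the log probability of moving from state i to state j
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_trans, axis=0) + log_b[t]
    return log_alpha[-1]

def make_model(offset, dim=8):
    """Hypothetical 3-state model: state s has 2 components centred near s + offset."""
    with np.errstate(divide="ignore"):   # log(0) = -inf marks forbidden transitions
        log_trans = np.log(np.array([[0.9, 0.1, 0.0],
                                     [0.0, 0.9, 0.1],
                                     [0.0, 0.0, 1.0]]))
    states = []
    for s in range(3):
        centre = s + offset
        means = np.stack([np.full(dim, centre - 0.1), np.full(dim, centre + 0.1)])
        states.append((np.array([0.5, 0.5]), means, np.ones((2, dim))))
    return log_trans, states

def classify(x, models):
    """Return the name of the model with the highest log-likelihood for bout x."""
    scores = {name: hmm_loglik(x, *m) for name, m in models.items()}
    return max(scores, key=scores.get)

models = {"stag_A": make_model(0.0), "stag_B": make_model(0.5)}
rng = np.random.default_rng(1)
# Toy bouts: 30 frames of 8 coefficients, 10 frames per phase, around each stag's state means.
bout_a = np.concatenate([s + 0.0 + 0.3 * rng.standard_normal((10, 8)) for s in range(3)])
bout_b = np.concatenate([s + 0.5 + 0.3 * rng.standard_normal((10, 8)) for s in range(3)])
pred_a = classify(bout_a, models)
pred_b = classify(bout_b, models)
```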
Distribution of stags’ recordings across the period of vocal activity. Each cell represents the number of recorded bouts of common roars. Day 1 is the first day when the stag is heard to vocalize. Bold figures indicate the vocalizations used in the training set of the “temporal” classification test.
Confusion matrix from the hidden Markov model validation classification computed on the cepstral coefficients from 654 roaring bouts from seven red deer stags. 93.4% of tested bouts are correctly classified.
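The overall percentages quoted in these captions correspond to the proportion of bouts on the diagonal of the confusion matrix, and per-class rates (as given for chase barks, barks, and harsh roars) to each row's diagonal share. A minimal sketch, assuming rows are true individuals and columns are predicted individuals; the labels below are toy values, not the study's data.

```python
import numpy as np

def confusion_matrix(true_ids, predicted_ids, labels):
    """Rows: true individual; columns: individual assigned by the classifier."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(true_ids, predicted_ids):
        cm[index[t], index[p]] += 1
    return cm

def percent_correct(cm):
    """Overall percentage of correctly classified bouts (diagonal over total)."""
    return 100.0 * np.trace(cm) / cm.sum()

def per_class_percent(cm):
    """Percentage correctly classified within each true class (row)."""
    return 100.0 * np.diag(cm) / cm.sum(axis=1)

# Toy example with three hypothetical stags.
true_ids = ["s1", "s1", "s2", "s2", "s3"]
predicted_ids = ["s1", "s2", "s2", "s2", "s3"]
cm = confusion_matrix(true_ids, predicted_ids, ["s1", "s2", "s3"])
overall = percent_correct(cm)     # 4 of 5 bouts on the diagonal
per_class = per_class_percent(cm)
```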
Confusion matrix from the hidden Markov model classification computed on the cepstral coefficients from 654 roaring bouts from seven red deer stags. The model is trained with two-thirds of the available bouts randomly selected within each individual, and the remaining third are tested as additional cases. 84.9% of tested bouts are correctly classified.
Confusion matrix from the hidden Markov model classification computed on the cepstral coefficients from 654 roaring bouts from seven red deer stags. The model is trained with the bouts uttered on the first days of vocal activity , and the bouts uttered during the rest of the period of vocal activity are tested as additional cases. 58.1% of tested bouts are correctly classified.
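The two training/testing protocols described in the last two captions can be sketched as simple splits over a list of bouts. The sketch assumes each bout is recorded as a hypothetical `(stag_id, day, bout_id)` tuple, with `day` counted from each stag's first day of vocal activity as in the distribution table; the cut-off `last_training_day` is a hypothetical parameter, since the caption does not reproduce the exact days used.

```python
import random
from collections import defaultdict

def random_split(bouts, train_fraction=2 / 3, seed=0):
    """Randomly assign train_fraction of each individual's bouts to training, the rest to testing."""
    by_stag = defaultdict(list)
    for bout in bouts:
        by_stag[bout[0]].append(bout)
    rng = random.Random(seed)
    train, test = [], []
    for items in by_stag.values():
        items = items[:]                      # copy before shuffling
        rng.shuffle(items)
        k = round(train_fraction * len(items))
        train += items[:k]
        test += items[k:]
    return train, test

def temporal_split(bouts, last_training_day):
    """Train on bouts from days 1..last_training_day; test on all later bouts."""
    train = [b for b in bouts if b[1] <= last_training_day]
    test = [b for b in bouts if b[1] > last_training_day]
    return train, test

# Toy data: two stags, six days of activity, three bouts per day.
bouts = [(stag, day, i) for stag in ("s1", "s2") for day in range(1, 7) for i in range(3)]
train, test = random_split(bouts)
t_train, t_test = temporal_split(bouts, last_training_day=2)
```

Splitting within each individual keeps every stag represented in both sets, while the temporal split tests whether models trained early in the season still recognize the same stag later, which is the harder task reflected in the lower correct-classification rate.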
Classification of chase barks (cb), barks (ba), and harsh roars (hr) from six stags, using hidden Markov models trained with the cepstral coefficients from 625 common roars from seven red deer stags. 63.4% of tested vocalizations are correctly classified. Chase barks: 84.6%, ; barks: 55.5%, ; harsh roars: 60%, .