Cepstral coefficients and hidden Markov models reveal idiosyncratic voice characteristics in red deer (Cervus elaphus) stags
10.1121/1.2358006
http://aip.metastore.ingenta.com/content/asa/journal/jasa/120/6/10.1121/1.2358006

Figures

FIG. 1.

Narrow band spectrogram of a bout of common roars. The common roar typically includes three phases, A, B, and C. In phase A, the formants fall while the fundamental frequency increases. During phase B the formants are more stationary. Phase C is shorter, with rising formants and a decreasing fundamental frequency.

FIG. 2.

Narrow band spectrogram of a bout of harsh roars. Compared to common roars, harsh roars are louder, atonal, and characterized by little frequency or energy modulation.

FIG. 3.

Narrow band spectrogram of a chase bark series. Chase barks are short vocalizations that are emitted in series.

FIG. 4.

Narrow band spectrogram of a single bark. Single barks are typically longer than chase barks.

FIG. 5.

Automatic detection of vocalization and silence phases in a bout of common roars. (a) Segmentation: the “forward-backward divergence” algorithm fragments the signal into stationary segments of variable size. (b) Energy thresholding: segments are classified as silence or vocalization using the relative energy of each segment. Consecutive silence segments are merged into silence phases and consecutive vocalization segments are merged into vocalization phases.
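The energy-thresholding and merging step described in panel (b) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the stationary segments have already been produced by a segmentation step (the forward-backward divergence algorithm itself is not implemented here), and the function names and the -30 dB relative threshold are hypothetical choices.

```python
import math


def label_segments(signal, boundaries, rel_threshold_db=-30.0):
    """Label each (start, end) segment as 'vocalization' or 'silence'.

    Energy is measured relative to the most energetic segment.
    The -30 dB default is an assumed value, not taken from the paper.
    """
    energies = [
        sum(x * x for x in signal[start:end]) / max(end - start, 1)
        for start, end in boundaries
    ]
    peak = max(energies, default=0.0)
    labels = []
    for energy in energies:
        if peak > 0.0 and energy > 0.0:
            rel_db = 10.0 * math.log10(energy / peak)
        else:
            rel_db = -math.inf  # an all-zero segment is treated as silence
        labels.append("vocalization" if rel_db >= rel_threshold_db else "silence")
    return labels


def merge_phases(boundaries, labels):
    """Merge consecutive segments that share a label into one phase."""
    phases = []
    for (start, end), label in zip(boundaries, labels):
        if phases and phases[-1][0] == label:
            # Extend the current phase to the end of this segment.
            phases[-1] = (label, phases[-1][1], end)
        else:
            phases.append((label, start, end))
    return phases
```

Given a signal and its segment boundaries, `merge_phases(boundaries, label_segments(signal, boundaries))` returns a list of `(label, start, end)` tuples that alternate between silence and vocalization phases.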

FIG. 6.

Homomorphic analysis performed on a 512-sample window of a red deer stag common roar. Panel A represents the sound wave in the time domain; the signal is periodic. Panel B represents the spectrum (fast Fourier transform) of this sample, with the fundamental frequency (F0) and its harmonic series (the first six harmonics H1–H6 are labeled). Panel C shows the cepstrum, which is calculated by taking the inverse Fourier transform of the logarithm of the energy spectrum of the signal. The contribution of the glottal source is represented by impulses spaced by the number of samples corresponding to the pitch period, while the contribution of the filter is represented by the lower part of the cepstrum. Finally, panel D shows the frequency spectrum obtained by applying a Fourier transform to the first eight coefficients of the cepstrum, illustrating the smoothing effect of the deconvolution process.
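The cepstrum computation and the smoothing shown in panel D can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: it uses a naive discrete Fourier transform from the standard library to stay dependency-free (in practice an FFT routine would be used), the function names and the `floor` parameter are hypothetical, and the symmetric counterparts of the low-order coefficients are kept so that the smoothed log spectrum comes out real.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Inverse Fourier transform of the log energy spectrum of a frame."""
    spectrum = dft(frame)
    # The floor avoids log(0) for empty frequency bins (assumed safeguard).
    log_energy = [math.log(max(abs(c) ** 2, floor)) for c in spectrum]
    return [c.real for c in dft(log_energy, inverse=True)]


def smoothed_log_spectrum(cepstrum, n_coeffs=8):
    """Fourier transform of the first n_coeffs cepstral coefficients.

    High-quefrency terms (including the pitch impulses) are zeroed out,
    which leaves a smooth spectral envelope.
    """
    n = len(cepstrum)
    liftered = [0.0] * n
    for q in range(n_coeffs):
        liftered[q] = cepstrum[q]
        if q > 0:
            liftered[n - q] = cepstrum[n - q]  # symmetric counterpart
    return [c.real for c in dft(liftered)]
```

For a 512-sample frame, `real_cepstrum(frame)[:8]` would give the eight low-order coefficients that describe the filter, while the pitch period appears as a peak at higher quefrency.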

FIG. 7.

(a) The model of the roar bout is a succession of silences and vocalizations. The silence model is independent of the considered individuals. (b) In contrast, each individual has its own roar model, a hidden Markov model of three states, where each state emits a vector of eight cepstral coefficients according to a Gaussian mixture probability distribution. Each state is assumed to correspond to one of the three phases that characterize the roar (see Fig. 1).
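A per-individual roar model of this kind can be scored with the forward algorithm, and a roar attributed to the stag whose model gives the highest likelihood. The sketch below is a simplified illustration, not the authors' implementation: it assumes a left-to-right topology, replaces the Gaussian mixture with a single diagonal-covariance Gaussian per state, and all names (`left_to_right_model`, `forward_loglik`, `classify`) and parameter values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (single component)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def left_to_right_model(means, stay=0.8, var=1.0):
    """Build a left-to-right HMM: each state either repeats or advances."""
    n = len(means)
    log_a = [[-math.inf] * n for _ in range(n)]
    for s in range(n):
        if s + 1 < n:
            log_a[s][s] = math.log(stay)
            log_a[s][s + 1] = math.log(1.0 - stay)
        else:
            log_a[s][s] = 0.0  # final state is absorbing
    return {
        "log_pi": [0.0] + [-math.inf] * (n - 1),  # always start in state 0
        "log_a": log_a,
        "means": means,
        "vars": [[var] * len(m) for m in means],
    }


def forward_loglik(frames, model):
    """Log-likelihood of a sequence of cepstral vectors under one model."""
    n = len(model["log_pi"])
    emit = lambda x, s: log_gauss_diag(x, model["means"][s], model["vars"][s])
    alpha = [model["log_pi"][s] + emit(frames[0], s) for s in range(n)]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[r] + model["log_a"][r][s] for r in range(n)]) + emit(x, s)
            for s in range(n)
        ]
    return logsumexp(alpha)


def classify(frames, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_loglik(frames, models[name]))
```

Here `models` would map each stag's name to a three-state model over eight-dimensional cepstral vectors; training the model parameters (for example with Baum-Welch) is outside the scope of this sketch.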

Tables

TABLE I.

Distribution of stags’ recordings across the period of vocal activity. Each cell represents the number of recorded bouts of common roars. Day 1 is the first day when the stag is heard to vocalize. Bold figures indicate the vocalizations used in the training set of the “temporal” classification test.

TABLE II.

Confusion matrix from the hidden Markov model validation classification computed on the cepstral coefficients from 654 roaring bouts from seven red deer stags. 93.4% of tested bouts are correctly classified.

TABLE III.

Confusion matrix from the hidden Markov model classification computed on the cepstral coefficients from 654 roaring bouts from seven red deer stags. The model is trained with two-thirds of the available bouts randomly selected within each individual, and the remaining third are tested as additional cases. 84.9% of tested bouts are correctly classified.
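The evaluation protocol in this caption (a random two-thirds of each individual's bouts for training, the rest for testing, summarized as a confusion matrix) can be sketched as follows. This is a generic illustration with hypothetical function names and a fixed random seed chosen for reproducibility; it does not reproduce the paper's data or its reported percentages.

```python
import random
from collections import defaultdict


def stratified_split(bouts, train_fraction=2 / 3, seed=0):
    """Split (stag, features) pairs so each stag keeps the same train share."""
    rng = random.Random(seed)
    by_stag = defaultdict(list)
    for stag, features in bouts:
        by_stag[stag].append(features)
    train, test = [], []
    for stag, items in by_stag.items():
        items = items[:]  # copy so the caller's data is not reordered
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train += [(stag, f) for f in items[:cut]]
        test += [(stag, f) for f in items[cut:]]
    return train, test


def confusion_matrix(true_labels, predicted_labels):
    """Nested dict: matrix[true][predicted] = number of tested bouts."""
    matrix = {}
    for true, predicted in zip(true_labels, predicted_labels):
        row = matrix.setdefault(true, {})
        row[predicted] = row.get(predicted, 0) + 1
    return matrix


def accuracy(matrix):
    """Fraction of tested bouts on the diagonal of the confusion matrix."""
    correct = sum(row.get(true, 0) for true, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0
```

The percentage of correctly classified bouts quoted in a caption like this one corresponds to `100 * accuracy(matrix)` computed on the held-out third.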

TABLE IV.

Confusion matrix from the hidden Markov model classification computed on the cepstral coefficients from 654 roaring bouts from seven red deer stags. The model is trained with the bouts uttered on the first days of vocal activity, and the bouts uttered during the rest of the period of vocal activity are tested as additional cases. 58.1% of tested bouts are correctly classified.

TABLE V.

Classification of chase barks (cb), barks (ba), and harsh roars (hr) from six stags, using hidden Markov models trained with the cepstral coefficients from 625 common roars from seven red deer stags. Overall, 63.4% of vocalizations are correctly classified (chase barks: 84.6%; barks: 55.5%; harsh roars: 60%).

2006-12-01
2014-04-20