What marks the beat of speech?

FIG. 1.

Possible shapes for the convolution kernel, . More possible shapes can be obtained by reversing the time (delay) axis or the vertical axis.

FIG. 2.

The phase for a typical utterance plotted against time. The algorithm missed a prediction near , then behaved well for the rest of the utterance (dashed line). The solid line shows , wrapped into the range from to to simulate the behavior of Eq. (10).

FIG. 3.

Values of , averaged over the corpus. The lower left measurement is the baseline, where acoustic data are shuffled with respect to ticks. Other conditions correspond to the six cases of Sec. III B, in order: running the analysis only on , and enhanced by other acoustical properties .

FIG. 4.

(Color online) Convolution kernels, , that are optimal for bootstrap samples of the data. The maxima of the curves are aligned at . (These kernels maximize , and have , , and zero, corresponding to .)

FIG. 5.

(Color online) Optimal bootstrap samples of the convolution kernel, , for the analysis. The maxima of the curves are aligned at . (These kernels maximize , and allow , , and to be nonzero.)

FIG. 6.

Theoretical loudness contours based on this work for prominences that are (top) on adjacent syllables, (middle) separated by one syllable, and (bottom) separated by two syllables. The dashed line is a loudness reference.

FIG. 7.

Phase histograms (, relative to the phase of ) for three different subjects. The subjects shown have (reading from top to bottom, in the center) the largest, median, and smallest value of . The subfigure shows values of for each subject, with error bars on the average. (The horizontal axis in the subfigure has no meaning; it just separates subjects.)

FIG. 8.

Phase histogram for a typical speaker (outline) and the histogram of relative to the average phase of each utterance (filled). The peak of the dashed histogram shows the typical phase relationship between metronome ticks and the algorithm’s predictions for that subject. The width of the histograms shows timing inconsistencies between the subject’s speech and the metronome.

FIG. 9.

The phase relationship between the metronome ticks and peaks of (loudness, loosely speaking). Each utterance (i.e., ten repetitions of one text by one speaker) is represented by a dot at the (complex) value of , with the real part on the horizontal axis and the imaginary part on the vertical axis. The distance from the origin is thus proportional to for that utterance. Dots near the origin represent utterances that did not have a consistent phase relationship between loudness and the metronome; dots on the unit circle would have a perfectly consistent phase relationship. The angle of the point, when viewed from the origin, matches the phase of , and dots just to the right of the origin come from utterances where the peak in is aligned with the metronome ticks. Dashed circles represent the average of each subject’s utterances.
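
The geometry described in this caption can be illustrated with a minimal sketch. The exact quantity plotted in the figure is not recoverable from this page, so the code assumes a standard circular-statistics construction: the average of unit phasors exp(i·phase) over the ticks of one utterance. The function name `mean_phasor` and the sample phase values are hypothetical and are not taken from the article.

```python
import cmath
import math


def mean_phasor(phases):
    """Average the unit phasors exp(i * phase) for phases given in radians.

    Assumption: this is the usual mean resultant vector from circular
    statistics, used here only to illustrate the caption's geometry.
    """
    if not phases:
        raise ValueError("need at least one phase")
    return sum(cmath.exp(1j * p) for p in phases) / len(phases)


# Hypothetical utterance whose phase is the same at every tick:
# the dot lands on the unit circle, at the angle of that phase.
consistent = mean_phasor([0.1, 0.1, 0.1, 0.1])

# Hypothetical utterance whose phases are spread evenly around the circle:
# the phasors cancel and the dot lands near the origin.
scattered = mean_phasor([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])

print(abs(consistent), cmath.phase(consistent))
print(abs(scattered))
```

Under this assumption, the magnitude of the result plays the role of the distance from the origin in the figure, and its argument plays the role of the angle seen from the origin. A phase near zero places the dot just to the right of the origin, which the caption associates with peaks aligned to the metronome ticks.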

FIG. 10.

Sample audio data along with the spectral slope, . This shows one repetition of “We always do.”



Parameters that yield the largest . The analysis operates on only . The right column shows the distribution of values that were tested in the optimization procedure (90 000 samples), and the center column shows the distribution of optimal values that were found (3400 bootstrap corpora).

