On the robustness of overall F0-only modifications to the perception of emotions in speech
J. Acoust. Soc. Am. 123(6), DOI: 10.1121/1.2909562

Figures

FIG. 1.

(Color online) F0 contours of all 16 recorded utterances. The emotion labels denote happy, angry, neutral, and sad; spk and sent denote speaker and sentence, respectively.

FIG. 2.

(Color online) Stylization example for the happy utterance of speaker 1, sentence 1. Circles: original F0 contour; squares: 2-semitone stylization; triangles: 10-semitone stylization; dots: 40-semitone stylization.
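The stylization levels in Fig. 2 are given in semitones, where a value f in Hz corresponds to 12·log2(f / f_ref) semitones above a reference f_ref. The caption does not name the stylization algorithm, so the following is only a minimal sketch under an assumed method: a greedy piecewise-linear approximation that keeps a turning point whenever dropping it would leave some F0 sample more than the threshold (in semitones) away from the straight line between kept points. The 100 Hz reference and all function names are hypothetical.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an F0 value in Hz to semitones relative to ref_hz (12 per octave).

    Assumption: the 100 Hz default reference is illustrative only.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)

def _segment_fits(times, st, a, b, tol):
    """True if every point strictly between a and b lies within tol semitones
    of the straight line joining points a and b."""
    t0, t1 = times[a], times[b]
    for k in range(a + 1, b):
        frac = (times[k] - t0) / (t1 - t0)
        line = st[a] + frac * (st[b] - st[a])
        if abs(st[k] - line) > tol:
            return False
    return True

def stylize(times, f0_hz, threshold_st):
    """Hypothetical greedy piecewise-linear stylization in the semitone domain.

    Returns the indices of the kept turning points; the first and last
    points are always kept. A larger threshold keeps fewer points.
    """
    st = [hz_to_semitones(f) for f in f0_hz]
    n = len(st)
    if n <= 2:
        return list(range(n))
    kept = [0]
    anchor, end = 0, 2
    while end < n:
        if _segment_fits(times, st, anchor, end, threshold_st):
            end += 1                  # current segment still fits: extend it
        else:
            anchor = end - 1          # last end point that still fitted
            kept.append(anchor)
            end = anchor + 2
    kept.append(n - 1)
    return kept
```

Under this assumed scheme, raising the threshold from 2 to 40 semitones discards progressively more turning points, so coarser stylizations retain less of the contour's detail.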

FIG. 3.

(Color online) The Gaussian emotional regions for each emotion, speaker, and sentence. x axis: F0 mean (Hz); y axis: F0 range (Hz).
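Fig. 3 places each emotional region in the plane of F0 mean versus F0 range. The caption does not define how a Gaussian emotional region is computed; a minimal sketch, assuming the region is the 95% probability ellipse of a bivariate Gaussian fitted to the (F0 mean, F0 range) points of the utterances attributed to one emotion, is shown below. The 95% coverage, the function names, and the sample points are illustrative assumptions.

```python
import math

def fit_gaussian(points):
    """Sample mean and covariance (sxx, sxy, syy) of (F0 mean, F0 range) points in Hz."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def in_region(point, mean, cov, coverage=0.95):
    """True if point lies inside the coverage ellipse of the fitted Gaussian.

    For 2 degrees of freedom the chi-square quantile has the closed form
    -2 ln(1 - coverage). Assumption: 95% coverage defines the region.
    """
    return mahalanobis_sq(point, mean, cov) <= -2.0 * math.log(1.0 - coverage)
```

For example, after `mean, cov = fit_gaussian(points)`, calling `in_region((220, 60), mean, cov)` tests whether an utterance with a 220 Hz mean and a 60 Hz range falls inside the fitted region.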

FIG. 4.

(Color online) Emotional regions of the angry emotion under different speech quality requirements. The areas of the emotional regions decrease as the quality requirements increase. Five quality conditions are displayed: (1) no restriction (same as Fig. 3) and (2)–(5) four progressively higher quality requirements. Small circles show the resynthesized utterances. x axis: F0 mean (Hz); y axis: F0 range (Hz).
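Fig. 4 shows the angry region shrinking as the speech quality requirement rises. One way to reproduce that kind of comparison, assuming that a quality requirement means discarding resynthesized utterances rated below a threshold and refitting the Gaussian region, is to compare ellipse areas. The ellipse {x : (x − μ)ᵀ S⁻¹ (x − μ) ≤ c} has area π·c·√(det S). The thresholds, the 1–5 rating scale used for filtering, and the helper names are assumptions for illustration.

```python
import math

def region_area(points, coverage=0.95):
    """Area (Hz^2) of the coverage ellipse of a bivariate Gaussian fitted to points.

    Area = pi * c * sqrt(det S), with c the 2-dof chi-square quantile.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    c = -2.0 * math.log(1.0 - coverage)
    # max() guards against a tiny negative determinant from rounding
    return math.pi * c * math.sqrt(max(0.0, sxx * syy - sxy * sxy))

def areas_by_quality(utterances, thresholds, coverage=0.95):
    """Hypothetical helper: region area for each minimum-quality threshold.

    utterances: (f0_mean_hz, f0_range_hz, quality) triples, quality on a 1-5 scale.
    Fewer than 3 surviving points give no usable ellipse (area reported as 0.0).
    """
    areas = {}
    for q in thresholds:
        kept = [(m, r) for m, r, quality in utterances if quality >= q]
        areas[q] = region_area(kept, coverage) if len(kept) >= 3 else 0.0
    return areas
```

The sketch only measures the areas; it does not force them to shrink, so any decrease with stricter thresholds comes from the data themselves.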

FIG. 5.

(Color online) Euclidean emotional regions estimated from the F0 contours of the original utterances. x axis: F0 mean (Hz); y axis: F0 range (Hz).
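Fig. 5 derives Euclidean emotional regions from the F0 contours of the original recordings. A minimal sketch, assuming F0 mean is averaged over voiced frames, F0 range is maximum minus minimum, and a point belongs to the emotion whose centroid is nearest in plain Euclidean distance (which partitions the plane into nearest-centroid cells), could look like this. The voicing convention (0 Hz marks an unvoiced frame) and all names are hypothetical.

```python
import math

def f0_stats(f0_hz):
    """(F0 mean, F0 range) in Hz over voiced frames.

    Assumption: 0 marks an unvoiced frame; range = max - min.
    """
    voiced = [f for f in f0_hz if f > 0]
    return sum(voiced) / len(voiced), max(voiced) - min(voiced)

def emotion_centroids(originals):
    """originals: dict emotion -> list of F0 contours of the original utterances."""
    centroids = {}
    for emotion, contours in originals.items():
        stats = [f0_stats(c) for c in contours]
        centroids[emotion] = (
            sum(m for m, _ in stats) / len(stats),
            sum(r for _, r in stats) / len(stats),
        )
    return centroids

def nearest_emotion(point, centroids):
    """Assign a (F0 mean, F0 range) point to the emotion with the closest centroid."""
    return min(centroids, key=lambda e: math.dist(point, centroids[e]))
```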

FIG. 6.

(Color online) The perceived Gaussian emotional regions and the estimated Euclidean emotional regions. x axis: F0 mean (Hz); y axis: F0 range (Hz).

FIG. 7.

(Color online) Panels (a)–(d): differences between the emotion recognition percentages of the original and modified utterances. Happy = open circle, angry = filled circle, sad = open square, neutral = filled square, other = filled triangle. Panels (e) and (f): differences between the average speech qualities (5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad) of the original and modified utterances. Speaker 1 = open circle, speaker 2 = filled square.
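Fig. 7 plots two kinds of difference between original and modified utterances. The sketch below computes both from raw listener responses; it assumes the difference is taken as modified minus original, and the response lists, category names, and function names are illustrative assumptions.

```python
from collections import Counter

# Response categories shown in Fig. 7, panels (a)-(d)
CATEGORIES = ("happy", "angry", "sad", "neutral", "other")

def recognition_percentages(labels):
    """Percentage of listener responses falling in each response category."""
    counts = Counter(labels)
    return {c: 100.0 * counts[c] / len(labels) for c in CATEGORIES}

def percentage_differences(original_labels, modified_labels):
    """Modified minus original recognition percentage, per category (assumed sign)."""
    po = recognition_percentages(original_labels)
    pm = recognition_percentages(modified_labels)
    return {c: pm[c] - po[c] for c in CATEGORIES}

def mean_quality_difference(original_scores, modified_scores):
    """Modified minus original mean rating on the 1 (bad) to 5 (excellent) scale."""
    return (sum(modified_scores) / len(modified_scores)
            - sum(original_scores) / len(original_scores))
```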

FIG. 8.

(Color online) Relation between the average quality, similarity, and percentage parameters. The quality is normalized: 1 = excellent, 0.8 = good, 0.6 = fair, 0.4 = poor, 0.2 = bad. Squares denote similarity, circles denote percentage, and a third marker denotes quality.
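The normalization in Fig. 8 maps the five quality labels linearly onto 0.2 to 1.0, which is the same as dividing the 1–5 rating by 5. A short sketch of that mapping follows; treating the result as directly comparable with similarity and percentage values expressed on a 0–1 scale is an assumption, and the names are hypothetical.

```python
# Rating scale from the Fig. 7 and Fig. 8 captions
QUALITY_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}

def normalize_quality(score):
    """Map a 1-5 quality rating onto (0, 1]: 5 -> 1.0, 4 -> 0.8, ..., 1 -> 0.2."""
    if score not in QUALITY_LABELS:
        raise ValueError(f"quality score must be an integer from 1 to 5, got {score!r}")
    return score / 5

def normalized_average_quality(scores):
    """Average normalized quality (assumed comparable with 0-1 similarity values)."""
    return sum(normalize_quality(s) for s in scores) / len(scores)
```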

Tables

TABLE I.

Summary of the performed F0 contour modifications. Mean and range values are in Hz; stylization values are in semitones.
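Table I lists modifications of the F0 mean and range in Hz. The caption does not specify the transformation; a minimal sketch, assuming a linear remapping that shifts the contour to a target mean and scales its excursions to a target range while preserving its shape, is given below. Treating 0 Hz as unvoiced and the function name are assumptions.

```python
def modify_mean_range(f0_hz, new_mean_hz, new_range_hz):
    """Hypothetical linear remap of voiced F0 values to a target mean and range.

    Each voiced value f becomes (f - old_mean) * (new_range / old_range) + new_mean,
    which keeps the contour shape; unvoiced frames (0) stay 0.
    """
    voiced = [f for f in f0_hz if f > 0]
    old_mean = sum(voiced) / len(voiced)
    old_range = max(voiced) - min(voiced)
    scale = new_range_hz / old_range
    out = [(f - old_mean) * scale + new_mean_hz if f > 0 else 0.0 for f in f0_hz]
    # A large range around a low mean can push voiced frames to <= 0 Hz.
    if any(o <= 0 for f, o in zip(f0_hz, out) if f > 0):
        raise ValueError("target mean/range would produce non-positive F0")
    return out
```

Because the deviations from the old mean average to zero, the remapped voiced frames have exactly the target mean, and their max minus min equals the target range.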

TABLE II.

Cochran’s Q statistics calculated for the emotion selection dependent variable. Significant results are in italics.
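Table II uses Cochran's Q, a test for differences among k related binary outcomes. With binary entries for block i and condition j, column totals C_j, row totals R_i, and grand total N, Q = (k − 1)(k·ΣC_j² − N²) / (k·N − ΣR_i²), which is approximately chi-square with k − 1 degrees of freedom under the null hypothesis. The sketch below assumes each row is one listener (or utterance) and each column a condition, with 1 meaning the target emotion was selected; this coding and the example data are illustrative.

```python
def cochrans_q(table):
    """Cochran's Q for a blocks x conditions table of 0/1 outcomes.

    Rows are blocks (assumed: listeners), columns are the k related conditions.
    """
    k = len(table[0])
    if k < 2:
        raise ValueError("need at least two conditions")
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    denominator = k * n_total - sum(r * r for r in row_totals)
    if denominator == 0:
        # every row is all 0s or all 1s: no within-block variation to test
        raise ValueError("Q is undefined when no block varies across conditions")
    return numerator / denominator
```

A p-value would then come from the chi-square distribution with k − 1 degrees of freedom, for example via a statistics library.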

TABLE III.

Repeated-measures ANOVA statistics calculated for the quality dependent variable. The reported values are F values from Greenhouse–Geisser-corrected tests.
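The Greenhouse–Geisser correction scales the degrees of freedom of a repeated-measures F test by an estimate ε of the departure from sphericity. For k levels with covariance matrix S, the usual estimate is ε = (tr S*)² / ((k − 1)·tr(S*²)), where S* is S double-centered; it lies between 1/(k − 1) and 1, with 1 meaning sphericity holds. A stdlib-only sketch follows; the data layout (rows = subjects, columns = conditions) and the names are assumptions.

```python
def sample_covariance(data):
    """k x k sample covariance of the columns of data (rows = subjects)."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(k)] for i in range(k)]

def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from the k x k covariance matrix of the levels."""
    k = len(cov)
    row_means = [sum(r) / k for r in cov]
    grand = sum(row_means) / k
    # Double-centre S: subtract row and column means, add back the grand mean.
    dc = [[cov[i][j] - row_means[i] - row_means[j] + grand for j in range(k)]
          for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    sum_sq = sum(x * x for row in dc for x in row)  # equals tr(S*^2) since S* is symmetric
    return trace ** 2 / ((k - 1) * sum_sq)
```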

TABLE IV.

Repeated-measures ANOVA statistics calculated for the quality dependent variable. The reported values are F values from Greenhouse–Geisser-corrected tests. Significant results are shown in italics.
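The Greenhouse–Geisser correction does not change the F value itself, only its degrees of freedom. As a simplified illustration, assuming a single within-subject factor (the tables may involve more factors), the one-way repeated-measures F and the corrected degrees of freedom can be computed as follows. The function names are hypothetical, and the epsilon argument would come from a sphericity estimate such as Greenhouse–Geisser's.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures F (rows = subjects, columns = k conditions).

    Returns (F, df_condition, df_error) before any sphericity correction.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(r) for r in data) / (n * k)
    cond_means = [sum(r[j] for r in data) / n for j in range(k)]
    subj_means = [sum(r) / k for r in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for r in data for x in r)
    ss_error = ss_total - ss_cond - ss_subj  # condition x subject residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    if ss_error <= 0:
        raise ValueError("no residual variance; F is undefined")
    return (ss_cond / df_cond) / (ss_error / df_error), df_cond, df_error

def gg_corrected_df(df_cond, df_error, epsilon):
    """Greenhouse-Geisser: multiply both degrees of freedom by epsilon."""
    return epsilon * df_cond, epsilon * df_error
```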

Publication date: 2008-06-01