


The goals of the Emotiongram Project were (i) to find a minimal, effective feature set for emotion detection, (ii) to allow visualization of emotional content at various time scales and with a reasonable degree of resolution on a 2D emotion space, and (iii) to begin to formulate the basis for an exploratory approach to emotion detection, visualization, and tracking.

The proposed system is shown below. As input, an audio file is windowed in time at several window lengths, and emotionally relevant features are then extracted, scaled, and combined. These "signals" correspond to coordinates on an emotion space and are thus mapped to different colors corresponding to different emotions. The output represents the calculated emotional quality of the input at various time resolutions.
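The multi-resolution windowing step can be sketched as follows. This is a minimal illustration rather than the project's own code: the function names, the non-overlapping hop, and the window lengths in the usage note are all assumptions.

```python
def frame_signal(samples, window_len, hop=None):
    """Split a sequence into fixed-length windows (non-overlapping by default)."""
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    hop = hop or window_len
    last_start = len(samples) - window_len
    # A signal shorter than one window yields no frames at all.
    return [samples[i:i + window_len] for i in range(0, last_start + 1, hop)]


def multi_resolution_frames(samples, window_lengths):
    """Window the same signal once per requested window length (in samples)."""
    return {w: frame_signal(samples, w) for w in window_lengths}
```

For example, `multi_resolution_frames(samples, [1024, 4096, 16384])` would give three sets of frames, one per time resolution, from which features can then be extracted independently.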

 

 

The emotion space chosen is Robert Thayer's model (shown below). The axes of this 2-D emotion space are stress (x-axis) and energy (y-axis), and the four corners of this space describe the following emotions: contentment, exuberance, anxiousness, and depression.
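As a rough illustration of how a point in this space relates to the four corner emotions, the sketch below labels a (stress, energy) pair by quadrant. The coordinate range of -1 to +1 and the placement of each emotion in its corner (low stress and low energy for contentment, and so on) are assumptions based on the usual presentation of Thayer's model; the text above does not spell them out.

```python
def thayer_quadrant(stress, energy):
    """Name the corner emotion nearest to a point in the 2D emotion space.

    Both coordinates are assumed to lie in [-1, 1], with positive values
    meaning high stress / high energy (an assumed convention).
    """
    if energy >= 0:
        return "anxiousness" if stress >= 0 else "exuberance"
    return "depression" if stress >= 0 else "contentment"
```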

 

 

These emotions are then mapped to four colors--green, yellow, red, and blue--corresponding to Meghen Miles' color-emotion associations. Selecting RGB values appropriately over the emotion space enables a mapping of all intermediate points to colors as shown below.
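One way to obtain colors for intermediate points is bilinear interpolation between the four corner colors, sketched below. The specific RGB triples (pure green, yellow, red, and blue), the pairing of colors with emotions in the order listed above, the -1 to +1 coordinate range, and the choice of bilinear interpolation itself are assumptions; the RGB values actually used in the project are not given here.

```python
# Corner colors as (R, G, B). Keys are (stress, energy), -1 = low, +1 = high.
CORNER_RGB = {
    (-1, -1): (0, 255, 0),    # contentment -> green  (assumed placement)
    (-1, +1): (255, 255, 0),  # exuberance  -> yellow
    (+1, +1): (255, 0, 0),    # anxiousness -> red
    (+1, -1): (0, 0, 255),    # depression  -> blue
}


def emotion_to_rgb(stress, energy):
    """Bilinearly interpolate the corner colors at a point in [-1, 1] x [-1, 1]."""
    # Clip to the emotion space, then rescale each axis to [0, 1].
    s = (min(1.0, max(-1.0, stress)) + 1.0) / 2.0
    e = (min(1.0, max(-1.0, energy)) + 1.0) / 2.0
    weights = {
        (-1, -1): (1 - s) * (1 - e),
        (-1, +1): (1 - s) * e,
        (+1, +1): s * e,
        (+1, -1): s * (1 - e),
    }
    return tuple(
        sum(weights[corner] * CORNER_RGB[corner][ch] for corner in CORNER_RGB)
        for ch in range(3)
    )
```

The result is a tuple of floats in the range 0 to 255; rounding to integers is left to the caller.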

 

 

Signal loudness is chosen as the most appropriate measure of emotional energy. Approximate psychoacoustic loudness is calculated after filtering with the inverse of an average ISO 226 equal-loudness contour. The feature extraction procedure is as follows: window, filter, calculate RMS energy, and use a moving average filter to generate smoothed curves.
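The window, RMS, and smoothing steps can be sketched as below. The equal-loudness filtering step is deliberately omitted, since the contour data is not reproduced here, so this computes plain RMS level rather than approximate psychoacoustic loudness. The function names, the assumption that samples are floats in [-1, 1] with 0 dBFS corresponding to an RMS of 1.0, the -120 dB floor for silence, and the centered moving average are all illustrative choices.

```python
import math


def rms_dbfs(frame, floor_db=-120.0):
    """RMS level of one window in dBFS (0 dBFS = RMS of 1.0, an assumed convention)."""
    if not frame:
        return floor_db
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else floor_db


def moving_average(values, width):
    """Centered moving average; the window shrinks near the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def loudness_curve(samples, window_len, smooth_width):
    """Window the signal, take the RMS level per window, then smooth the curve."""
    starts = range(0, len(samples) - window_len + 1, window_len)
    levels = [rms_dbfs(samples[i:i + window_len]) for i in starts]
    return moving_average(levels, smooth_width)
```

In a fuller version, each window would be passed through the equal-loudness weighting filter before the RMS calculation.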

For the stress dimension, musical mode is chosen. Each .wav file is converted to MIDI with the “AmazingMIDI” software, and a measure of mode is obtained by correlating the (windowed) key distribution with the 24 Krumhansl-Kessler key profiles using the “MIDI Toolbox” (in MATLAB). The resulting mode value is signed: its magnitude (<= 1) is the correlation coefficient, which serves as a measure of “confidence”, and its sign is positive for major mode and negative for minor. The feature extraction procedure for mode is as follows: window, calculate mode, and repeat this process for other window lengths (no moving average filtering here).
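The profile-correlation step can be illustrated with a small stand-alone sketch. This is not the MATLAB toolbox code used in the project: it assumes the input is a 12-bin pitch-class distribution (C = 0) for one window, uses the commonly cited Krumhansl-Kessler profile values, and picks the best of the 24 correlations. The function names are hypothetical.

```python
import math

# Commonly cited Krumhansl-Kessler profiles, indexed from the tonic upward.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def rotate(profile, tonic):
    """Transpose a tonic-relative profile so that it starts on pitch class `tonic`."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]


def signed_mode(pc_dist):
    """Return (mode, tonic): mode is +|r| for major, -|r| for minor."""
    best_r, best_sign, best_tonic = -2.0, 1, 0
    for sign, profile in ((+1, KK_MAJOR), (-1, KK_MINOR)):
        for tonic in range(12):
            r = pearson(pc_dist, rotate(profile, tonic))
            if r > best_r:
                best_r, best_sign, best_tonic = r, sign, tonic
    return best_sign * abs(best_r), best_tonic
```

A window whose pitch-class distribution matches a major profile closely gives a value near +1, a close minor match gives a value near -1, and an ambiguous window gives a value near 0.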

The features need to be scaled in order to map them onto the emotion space. For loudness, the boundaries were determined to be -40 dBFS to -20 dBFS by experimentation. The boundaries for calculated mode are -1 (highest possible correlation with minor key) to +1 (highest possible correlation with major key).
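The loudness scaling can be sketched as a clipped linear map. The -40 dBFS and -20 dBFS boundaries come from the text above; mapping them onto the same -1 to +1 range as the mode value, and clipping values outside the boundaries, are assumptions made for illustration.

```python
def scale_loudness(level_db, lo_db=-40.0, hi_db=-20.0):
    """Map a loudness level in dBFS linearly onto [-1, 1], clipping at the boundaries."""
    t = (level_db - lo_db) / (hi_db - lo_db)
    t = min(1.0, max(0.0, t))  # clip to the chosen boundaries
    return 2.0 * t - 1.0
```

Since the calculated mode already lies between -1 and +1, it needs no further scaling under this convention.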

     