Clustering of Expressive Music Performances

Haojun Wang

ISE 575 b Computational Modeling of Expressive Performance

Spring 2006

Does music have expressive patterns? If we have several performances of the same piece of music, can we find a set of clusters representing different emotions? This project investigates patterns in expressive music performance using data mining techniques.

The idea is to treat music performances as time series, so that a program can find clusters of loudness patterns in music segments that are correlated with expression. Specifically, there are four steps: data pre-processing, segmentation, feature extraction, and clustering.

 

Data Sets and Pre-Processing

The MIDI data set was provided by Jie Liu. The expressive performances were generated using the ESP driving interface based on two phrase grouping options created by Elaine Chew. The two interpretations, typeset in Sibelius by Dennis Leung, are shown in Figures 1 and 2. This project uses 8 ESP-generated performances of the naive grouping and 9 performances of the nuanced grouping, and extracts their loudness information using MIDIToolBox in Matlab.

Figure 1: Naive grouping of Brahms' Hungarian Dance No. 3*
 

Figure 2: Nuanced grouping of Brahms' Hungarian Dance No. 3*
(*figures courtesy of the MuCoaCo research group)

Segmentation

A raw data set cannot be used directly in data mining. It needs to be segmented so that each segment carries some expressive information that can be easily extracted in the following stage. I followed the segmentation algorithm proposed by Tenney & Polansky (1980); the code is available in MIDIToolBox.

References:
Tenney, J. & Polansky, L. (1980). Temporal gestalt perception in music. Journal of Music Theory, 24(2), 205–41.
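
The segmentation idea can be illustrated with a much simplified sketch in Python. It is not the MIDIToolBox code and not the full Tenney & Polansky algorithm; it only applies their basic rule that a new group starts where the distance between successive notes is a local maximum. The function name segment_boundaries, the weights w_time and w_pitch, and the choice of distance (inter-onset interval plus absolute pitch interval) are assumptions made for illustration.

```python
def segment_boundaries(onsets, pitches, w_time=1.0, w_pitch=1.0):
    """Return the indices of the notes that start a new segment.

    onsets  -- note onset times (e.g. in beats), in ascending order
    pitches -- MIDI pitch numbers, one per onset
    The weights are illustrative assumptions, not values from the paper.
    """
    if len(onsets) != len(pitches):
        raise ValueError("onsets and pitches must have the same length")
    # d[i] is the distance from note i to note i + 1.
    d = [
        w_time * (onsets[i + 1] - onsets[i])
        + w_pitch * abs(pitches[i + 1] - pitches[i])
        for i in range(len(onsets) - 1)
    ]
    starts = [0] if onsets else []
    for i in range(1, len(d) - 1):
        # A distance larger than both of its neighbours marks a boundary.
        if d[i] > d[i - 1] and d[i] > d[i + 1]:
            starts.append(i + 1)  # note i + 1 begins a new segment
    return starts


# Hypothetical eight-note melody with a gap and a leap after the fourth note.
example_onsets = [0, 1, 2, 3, 6, 7, 8, 9]
example_pitches = [60, 62, 64, 65, 72, 74, 76, 77]
print(segment_boundaries(example_onsets, example_pitches))
```

Given the boundary indices, the loudness values of each segment can then be sliced out of the performance and passed on to feature extraction.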

Feature Extraction

The next step is feature extraction, which acts as an indexing mechanism to support efficient retrieval and matching of music segments. Feature extraction also enables dimensionality reduction and noise reduction. Currently I use the Discrete Fourier Transform (DFT) in this step: the loudness values of each music segment are transformed via the DFT, and each frequency component is represented by its magnitude and phase shift.
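
As a minimal sketch of this step (in Python, not the Java DFT code used in the project), the function below turns the loudness values of one segment into a fixed-length feature vector of magnitudes and phases. The name dft_features, the number of retained coefficients n_coeffs, and the normalization by segment length are assumptions for illustration.

```python
import cmath
import math


def dft_features(loudness, n_coeffs=4):
    """Return [mag_0, phase_0, mag_1, phase_1, ...] for one segment.

    loudness -- loudness values of the segment (e.g. MIDI velocities)
    n_coeffs -- number of low-frequency DFT bins to keep (an assumption)
    """
    n = len(loudness)
    if n == 0:
        raise ValueError("segment must contain at least one loudness value")
    features = []
    for k in range(n_coeffs):
        if k < n:
            # Direct evaluation of DFT bin k, scaled by the segment length
            # so that segments of different lengths stay comparable.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(loudness)
            ) / n
        else:
            coeff = 0j  # short segment: pad the missing bins with zeros
        features.append(abs(coeff))
        features.append(cmath.phase(coeff))
    return features
```

Keeping only the first few bins is what gives the dimensionality and noise reduction: slow loudness contours are retained, while note-to-note fluctuations in the higher bins are discarded.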

Clustering and Result Analysis

The final step is clustering. K-means is used to find clusters in the DFT results. Both the naive and the nuanced performances can be described as collections of clusters. For the naive performances, 8 clusters are found in one pass; for the nuanced performances, 7 clusters are found in one pass, which suggests that the clusters are fairly easy to discover. On the other hand, as the visualizations of the clustering results in Figures 3 and 4 show, the difference in feature values between clusters is not considerable, especially for the first two feature values, where every segment has the same value. This might be due to the limitations of our current set of performances. Besides, as can be seen, the difference between the naive and nuanced sets is not large either, which indicates that the two grouping methods might not produce a perceivable difference in expressive performance.
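
For readers unfamiliar with K-means, the following is a small sketch of the standard Lloyd iteration in Python. It is not the clustering code used to produce Figures 3 and 4; the function name kmeans, the random initialization from the data points, and the parameters n_iter and seed are assumptions for illustration.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster equal-length feature vectors; return (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-dimensional feature vectors forming two obvious groups.
example = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centroids = kmeans(example, k=2)
print(labels, centroids)
```

In this project's setting, each point would be the DFT feature vector of one music segment, and the number of clusters k is chosen by the user.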

Figure 3: Clustering output of naive grouping

 

Figure 4: Clustering output of nuanced grouping

Conclusion

This project illustrates that data mining techniques can be used to find expressive patterns in music pieces. It is a first step on this topic. Several improvements can be made, as follows:

It would also be interesting to investigate how certain emotions map onto the frequency domain. For example, one could use a set of blue music as the input and find the densest cluster.

Links:

MIDI Tool Box

DFT Code (in Java)

K-Mean Clustering Tutorial

TreeView (Clustering Visualization)

ESP (Expression Synthesis via a Driving Interface)
