Grammar Induction for Musical Melodies
University of Southern California, Spring 2007, in ISE575/EE675/CSCI575/PSYCH675, by Reid Swanson (2007)
Musical Data

Data Model

There are many possible mappings for translating this approach to the musical domain. One that seemed natural was to treat each musical object in a melody, in this case a note or pitch, as a word and each musical phrase as a sentence. In this way one could build up a structural representation of phrases and perhaps derive or find some semantic meaning across similar structures. To investigate this approach I used the 48 fugues from Bach's Well-Tempered Clavier as my corpus. Unfortunately this technique is limited to input strings of constrained length, each representing a sentence. The success of the approach also depends on the amount of training data, which is another reason each piece needs to be broken up. Unfortunately there is no easy method for determining what duration of music corresponds roughly to a sentence in English.

Segmentation

To extract reasonable sequences I developed two algorithms to segment the data. My first approach is derived from the split-and-merge algorithm common in image processing. This technique involves two steps. First it takes an input sequence (initially the entire piece) and determines whether the sequence is sufficiently self-similar. If it is, that segment is returned. If not, the sequence is split into two segments and the process is applied recursively to each. Once the algorithm is done splitting, it scans the segment list from left to right, testing whether two neighbors are sufficiently similar. If they are, they are merged and the merged segment is compared again to its neighbors. I use a similarity metric based on the standard deviation of pitch values in relation to some threshold.

The second algorithm I developed is a set of heuristics based primarily on confidence intervals. I start by scanning the piece from left to right, building up a potential subsequence. A new note is added if its pitch value is within the confidence interval of the current subsequence.
If it is not, the subsequence is returned and the new note begins a new subsequence. Sometimes, however, this does not produce a segment boundary for a long period of time, so I also segment on two other conditions. If the length of the subsequence reaches a certain threshold, I test whether the new note is a rest, and if it is, the subsequence is returned. Alternatively, if it is not a rest, I test whether the new note's duration is within the duration confidence interval. This approach seemed to work better in practice and is used for my evaluations. After applying this technique, the 48 fugues yielded approximately 6300 segments.

Normalization

As mentioned earlier, data sparsity is a big problem with this approach. Although there are fewer pitch values or pitch-duration combinations than words of English, this is still a significant obstacle for this data. Instead of finding a counterpart to part-of-speech tags through clustering techniques, I normalized each pitch value to the first pitch in the segment. For example, if the first pitch was C4 and the second note was D4, the representation would be '0 2'. This seems reasonable because we are less concerned with the actual pitch values than with the relative motion between them.
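The normalization step can be illustrated with a minimal sketch. It assumes pitches are stored as MIDI note numbers (C4 = 60, D4 = 62), which the text does not specify, and it ignores rests; the function names are hypothetical.

```python
# Assumption: pitches are MIDI note numbers, so a difference of 1 is one semitone.
def normalize_segment(pitches):
    """Express every pitch as a semitone offset from the segment's first pitch."""
    if not pitches:
        return []
    first = pitches[0]
    return [p - first for p in pitches]


def to_sentence(pitches):
    """Join the normalized offsets into a space-separated 'sentence' of tokens."""
    return " ".join(str(offset) for offset in normalize_segment(pitches))


# C4 followed by D4, as in the example in the text.
print(to_sentence([60, 62]))
```

Because every segment starts at 0, two phrases with the same contour in different keys map to the same token string, which is what reduces the sparsity.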
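The confidence-interval heuristic from the Segmentation section might look roughly like the sketch below. Several details are assumptions, because the text does not give them: notes are (pitch, duration) pairs with pitch None for a rest, the interval is the mean plus or minus z standard deviations with a minimum half-width, a duration outside its interval triggers a boundary, and a rest that triggers a boundary starts the next subsequence. All names and default values are hypothetical.

```python
from statistics import mean, pstdev


def interval(values, z, min_half_width):
    """Assumed interval: mean +/- max(z * std, min_half_width)."""
    centre = mean(values)
    half = max(z * pstdev(values), min_half_width)
    return centre - half, centre + half


def segment(notes, z=1.96, max_len=16, min_pitch_half=2.0, min_dur_half=0.25):
    """Scan left to right, closing the current subsequence when a rule fires."""
    segments, current = [], []
    for pitch, duration in notes:
        boundary = False
        pitches = [p for p, _ in current if p is not None]
        if current:
            # Rule 1: the new pitch falls outside the pitch interval.
            if pitch is not None and pitches:
                low, high = interval(pitches, z, min_pitch_half)
                boundary = not (low <= pitch <= high)
            # Rules 2 and 3 apply only once the subsequence is long enough.
            if not boundary and len(current) >= max_len:
                if pitch is None:
                    boundary = True  # a rest closes a long subsequence
                else:
                    durations = [d for _, d in current]
                    low, high = interval(durations, z, min_dur_half)
                    # Assumption: an unusual duration marks a boundary.
                    boundary = not (low <= duration <= high)
        if boundary:
            segments.append(current)
            current = []
        current.append((pitch, duration))
    if current:
        segments.append(current)
    return segments
```

The minimum half-width keeps the interval from collapsing to a single value when the subsequence so far contains only one pitch or only repeated pitches.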
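For comparison, the split-and-merge segmentation described earlier can be sketched as follows. This is one reading of the description, not the original implementation: it assumes pitches are numbers, splits at the midpoint, treats a segment as homogeneous when the population standard deviation of its pitches is at or below a threshold, and stops splitting at a hypothetical minimum length.

```python
from statistics import pstdev


def is_homogeneous(pitches, threshold):
    """A segment is 'similar enough' if its pitch spread is at most the threshold."""
    return pstdev(pitches) <= threshold


def split(pitches, threshold, min_len=2):
    """Recursively halve the sequence until every piece is homogeneous or short."""
    if len(pitches) <= min_len or is_homogeneous(pitches, threshold):
        return [pitches]
    mid = len(pitches) // 2
    return (split(pitches[:mid], threshold, min_len)
            + split(pitches[mid:], threshold, min_len))


def merge(segments, threshold):
    """Scan left to right, merging neighbors whose union is still homogeneous."""
    segments = list(segments)
    i = 0
    while i < len(segments) - 1:
        combined = segments[i] + segments[i + 1]
        if is_homogeneous(combined, threshold):
            segments[i:i + 2] = [combined]
            i = max(i - 1, 0)  # re-check the merged segment against its left neighbor
        else:
            i += 1
    return segments


def split_and_merge(pitches, threshold, min_len=2):
    return merge(split(pitches, threshold, min_len), threshold)
```

The merge pass matters because midpoint splits are arbitrary: a homogeneous run that happens to straddle a split point is cut in two and has to be stitched back together.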