Analysis of Dynamic Shaping in Unaccompanied Bach
By Eric Cheng
Project Goal: The purpose of this project is to compare and contrast the dynamic shaping used by three of the greatest violinists of the 20th century: Menuhin, Milstein, and Heifetz. Analysis is conducted using the Andante movement from Bach's Sonata No. 2 for unaccompanied violin. Unaccompanied Bach is a natural choice for dynamic analysis: there is only a single instrument to be analyzed, and there are almost no dynamic markings in the score, leaving much room for interpretation by the performer. The Andante movement was chosen because the clarity of its melodic line and its regular underlying eighth-note pulse are well suited to analysis.
Introduction
Musical dynamics are a crucial expressive tool in music performance, conveying both emotion and musical structure. By controlling the evolution of loudness, musicians can create tension or relieve it, signal an end or a beginning, or simply express a particular emotion. In the field of music performance research, musical dynamics pose an interesting challenge, because the dB sound levels recorded in an audio file do not correspond directly with perceived loudness. This is because perceived loudness depends on more than just the amplitude of the waveform. The temporal and spectral contexts in which sounds are heard profoundly affect perceived loudness. Thus, in order to accurately examine the dynamic shaping of a given audio recording, we must process the raw waveforms to transform them from amplitude curves to loudness curves.
Loudness Curve Extraction
In order to extract loudness data from raw sound files, we must take into account psychoacoustic principles such as spectral and temporal masking as well as frequency content.
Spectral Masking
Temporal Masking
Procedure for Extracting Loudness Curves
The basic process of extracting loudness curves from audio files can be summarized in the following steps:
1. Slice the raw waveform into frames or windows, usually around 2048 samples wide, or about 46 ms at a 44.1 kHz sampling rate.
2. Transform each frame to the frequency domain.
3. Analyze the resulting spectrum according to the critical band resolution of the ear.
4. Take into account temporal and spectral masking.
5. Convert the dB levels into loudness levels using the results of steps 3 and 4.
Luckily, there are Matlab implementations that do this. For this project, I used Matlab code provided by P. Kabal of McGill University. The code is based on Perceptual Evaluation of Audio Quality (PEAQ), and is available here.
The result is a single loudness value for every frame in the waveform.
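As a rough illustration of step 1 only, the sketch below slices a waveform into 2048-sample frames and computes one RMS level in dB per frame. It is written in Python rather than Matlab and is not the PEAQ-based code used in this project: the function names, the non-overlapping hop, and the synthetic test tone are all assumptions made for the example, and plain RMS level ignores the critical-band analysis and masking of steps 3 to 5.

```python
import math

def frame_duration_ms(frame_size=2048, sample_rate=44100):
    """Duration of one analysis frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate

def frame_levels_db(samples, frame_size=2048, hop=None):
    """Return one RMS level in dB (relative to an amplitude of 1.0) per frame.

    This is a crude stand-in for a loudness value: it performs the framing
    of step 1 but none of the psychoacoustic processing.
    """
    hop = hop or frame_size  # assumption: non-overlapping frames by default
    levels = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        levels.append(20 * math.log10(max(rms, 1e-12)))  # floor avoids log(0)
    return levels

# Hypothetical test signal: one second of a 440 Hz tone at half amplitude.
sample_rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
levels = frame_levels_db(tone)
```

The frame duration works out to 2048 / 44100 s, which is the figure of roughly 46 ms quoted in step 1.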
||||||||||||||||||||||||||||||||||
Mapping Loudness Curves to Metrical Position
After carrying out the above procedure, we will end up with a loudness curve like the one shown in Figure 1. But this graph is not very useful for comparing dynamic shapings of different performers. We don't know which frames to compare since every performer played with a slightly different tempo. In order to compare dynamic shaping, we need to map the above loudness values to their metrical position in the score. Loudness at a given metrical position is independent of tempo, and can be compared across performers.
To do this, we need to know the onset time of every beat. I wrote a program called marker.m that accomplishes this task, though somewhat crudely. The program is essentially a stopwatch. To use it, you play the recording, start the program in Matlab, and tap the return key along with the beat in the recording. Every time the return key is tapped, the time is recorded into a vector, giving us the onset time of each beat. Of course, the stopwatch is not started at exactly the moment playback begins (since playback must be done using another application), so a slight time shift is needed to fit the beat track to the piece. A program called playback.m takes the beat track and the recording and plays them back together to confirm the accuracy of the track.
Once we have obtained the onset times, we can determine which frames they correspond to, and therefore find the loudness values. To account for inaccuracy in the beat track, and for the limited ability of a musician to control dynamic shaping over small time intervals, each loudness value was averaged (smoothed) with the values inside a user-specified time window surrounding the onset time. The smaller the time window, the more local variations come through; the larger the time window, the more global the analysis. Shown below in Figure 2 is an example loudness curve, with each point separated by a metrical distance of one eighth note. Accompanying sound file is available here.
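The mapping from tapped onset times to smoothed loudness values can be sketched as follows. This is a minimal Python illustration, not the Matlab code used in the project: the function names, the `offset_sec` parameter standing in for the time shift, and the choice of a window centred on the onset are assumptions.

```python
def onset_to_frame(onset_sec, frame_sec, offset_sec=0.0):
    """Index of the analysis frame that contains a (time-shifted) onset."""
    return max(0, int((onset_sec + offset_sec) / frame_sec))

def loudness_at_onsets(loudness, onsets_sec, frame_sec, window_sec,
                       offset_sec=0.0):
    """Average the per-frame loudness in a window centred on each onset.

    loudness    per-frame loudness values, one per analysis frame
    onsets_sec  tapped beat times in seconds
    frame_sec   duration of one frame in seconds
    window_sec  total width of the smoothing window in seconds
    offset_sec  shift that aligns the tapped beat track with the recording
    """
    half = int(round(window_sec / (2 * frame_sec)))  # frames on each side
    smoothed = []
    for t in onsets_sec:
        centre = min(onset_to_frame(t, frame_sec, offset_sec),
                     len(loudness) - 1)
        lo = max(0, centre - half)
        hi = min(len(loudness), centre + half + 1)  # window clipped at edges
        smoothed.append(sum(loudness[lo:hi]) / (hi - lo))
    return smoothed

# Hypothetical data: a rising ramp of loudness, 0.5 s frames, three taps.
ramp = [0, 1, 2, 3, 4, 5, 6, 7]
per_beat = loudness_at_onsets(ramp, [1.0, 2.0, 3.0], frame_sec=0.5,
                              window_sec=1.0)
```

Widening `window_sec` toward the length of a bar gives the more global view described above, while a window of zero simply reads off the frame under each tap.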
Results - Qualitative
Indeed, when we examine the loudness curves with loudness values smoothed over a window length of a bar, the global shaping of Milstein (in green below) becomes more apparent. He continues to draw arching trajectories with higher peaks and lower valleys while the curves of Heifetz and Menuhin are somewhat more flattened by the added smoothing. Though the trajectories above contrast in several ways, they do share a similar overall shape, with a dip in dynamics around bar 5 leading up to the dynamic climax of measure 7. In all cases, the performers ended the first half of the piece with nearly two measures of decrescendo, ending at the lowest point of the entire first half.
Results - Quantitative
Several quantitative measures were calculated for comparison. I believe these numbers should be interpreted cautiously, however, since music is meant to be listened to and perceived by humans. If we cannot aurally confirm the trends or insights gained by quantitative analysis, then they are of questionable value. As a measure of dynamic variability, the standard deviations are shown below. The dynamic range is taken to be equal to four times the standard deviation, following the convention of Repp (1999). All values are in loudness units of sones.
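The variability measure is simple to state in code. The Python sketch below is an illustration only: the function name and the sample values are made up, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

def dynamic_variability(loudness_sones):
    """Return (standard deviation, dynamic range) for one loudness curve.

    The dynamic range follows the convention quoted above: four times
    the standard deviation. Population standard deviation is assumed.
    """
    sd = statistics.pstdev(loudness_sones)
    return sd, 4 * sd

# Hypothetical loudness values in sones, chosen so the arithmetic is easy.
sd, dyn_range = dynamic_variability([2, 4, 4, 4, 5, 5, 7, 9])
```

Swapping in `statistics.stdev` would give the sample standard deviation instead; for curves with many points the difference is small.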
These numbers were confirmed to a certain extent upon listening to the recordings. The standard deviation measure does not appear to capture the extent to which Menuhin's dynamic shaping varies more locally than Milstein's. This could simply be a function of the recording, since Menuhin's recording was made in 1934, some 20 years before the others. Heifetz's greater dynamic variability and range were confirmed upon listening. Additionally, as a measure of similarity between shaping strategies, correlation coefficients were calculated and are shown below:
Correlation Coefficients
The correlation coefficients are all positive and close in value. Even so, it is interesting to note the relatively low correlation coefficient between Menuhin and Milstein, whereas Heifetz has a relatively high correlation coefficient with both Milstein and Menuhin. Indeed, visual inspection of the loudness curves suggests that Heifetz's shaping strategy borrows elements from both Milstein and Menuhin. While Heifetz employs greater local dynamic variation, similar to Menuhin, he does so while giving his trajectories greater global arching, similar to Milstein. This can be seen in Figure 4, where Heifetz seems to occupy the middle ground between Milstein and Menuhin in terms of global arching.
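For readers who want to reproduce this kind of comparison, a minimal Python sketch of pairwise Pearson correlation between loudness curves follows. It assumes the curves have already been sampled at the same metrical positions, so that they have equal length. The labels and values are placeholders, not data from the recordings.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)  # undefined if either curve is flat

def pairwise_correlations(curves):
    """Map each unordered pair of curve labels to its correlation."""
    return {(a, b): pearson(curves[a], curves[b])
            for a, b in combinations(sorted(curves), 2)}

# Placeholder curves: B rises with A, C falls as A rises.
toy_curves = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
table = pairwise_correlations(toy_curves)
```

Because correlation ignores overall level and scale, two performers can correlate highly while one plays much louder or with a much wider range, which is why it is read here alongside the standard deviations.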
In order to gain insight into the existence of any general shaping strategies common to all performers, the average mean squared error (MSE) and average correlation were calculated for each bar in the piece. Figure 5 shows the averaged correlation coefficients. The local maxima occurring in measures 4, 8, 11, 14, 19, 21, and 26 all fall at phrase boundaries of differing structural importance. This suggests that phrase-ending strategies are the most similar across all performers. Of the local maxima in the correlation coefficient plot, three have corresponding local minima in MSE, at measures 11, 14, and 26. Measures 11 and 26 are the final measures of the first and second halves of the piece, respectively. The combination of high correlation and low MSE in these measures reveals a strong similarity among the shapings at these phrase boundaries. This suggests a possible rule of thumb: the more important the phrase boundary, the more similar the dynamic shaping will be across performers. Or perhaps, the more important the phrase boundary, the fewer musically acceptable dynamic shapings are possible.
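One way to compute such per-bar measures is sketched below in Python. The text does not spell out how the averaging was done, so this sketch assumes that correlation and MSE are computed for every pair of performers within each bar and then averaged over the pairs. The function names, the `points_per_bar` parameter, and the toy curves are illustrative assumptions.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a flat segment."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def per_bar_similarity(curves, points_per_bar):
    """Return one (mean correlation, mean MSE) tuple per bar.

    curves is a list of equal-length loudness curves, each sampled at the
    same metrical positions; the means are taken over all performer pairs.
    """
    pairs = list(combinations(range(len(curves)), 2))
    n_bars = len(curves[0]) // points_per_bar
    result = []
    for bar in range(n_bars):
        lo, hi = bar * points_per_bar, (bar + 1) * points_per_bar
        segments = [curve[lo:hi] for curve in curves]
        mean_r = sum(pearson(segments[i], segments[j])
                     for i, j in pairs) / len(pairs)
        mean_e = sum(mse(segments[i], segments[j])
                     for i, j in pairs) / len(pairs)
        result.append((mean_r, mean_e))
    return result

# Toy curves: all three agree in shape in bar 1, one diverges in bar 2.
toy = [[1, 2, 3, 3, 2, 1],
       [2, 3, 4, 1, 2, 3],
       [1, 2, 3, 3, 2, 1]]
bars = per_bar_similarity(toy, points_per_bar=3)
```

In the toy data the first bar shows high mean correlation with low mean MSE, the pattern reported above for the most important phrase boundaries, while the second bar shows the opposite.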
Conclusion
The examination of dynamic shaping in recordings of unaccompanied Bach proved illuminating. The extraction of loudness curves from recordings allows for a more detailed and objective analysis of dynamic shapings. Qualitative comparisons showed a wide variety of dynamic shapings on a local level, but commonalities in the overall trajectories on a more global level. Quantitative analysis, carefully interpreted, revealed some interesting trends. Most notably, dynamic shapings appeared to be most highly correlated at phrase boundaries. In addition, minimum MSE values were obtained at the most structurally important phrase boundaries, suggesting that the structural importance of a phrase boundary could constrain the number of musically acceptable dynamic shapings a performer may choose from.
References
Kabal, P. An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality. TSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, May 2002 (updated Dec. 2003).
Langner, J., & Goebl, W. Visualizing Expressive Performance in Tempo-Loudness Space. Computer Music Journal, Vol. 27, No. 4, Winter 2003, pp. 66-83.
Langner, J., Kopiez, R., Stoffel, C., & Wilz, M. Real-time Analysis of Dynamic Shaping. In Proceedings of the International Conference on Music Perception and Cognition, 5-10 Aug. 2000.
Painter, T., & Spanias, A. Perceptual Coding of Digital Audio. In Proceedings of the IEEE, Vol. 88, No. 4, April 2000.
Repp, B. A microcosm of musical expression: II. Quantitative analysis of pianists' dynamics in the initial measures of Chopin's Etude in E major. J. Acoustical Society of America, March 1999.
Timoney, J., Lysaght, T., Schoenwiesner, M., & MacManus, L. Implementing Loudness Models in Matlab. In Proceedings of the 7th Conference on Digital Audio Effects (DAFx), 2004.
Recordings
Heifetz: Bach Sonatas & Partitas. The Heifetz Collection Vol. 17. Recorded in 1952.
Menuhin: Bach Sonatas & Partitas for solo violin. EMI Classics. Recorded in 1934-1936.
Milstein: Bach Sonatas for Unaccompanied Violin. EMI Classics. Recorded in 1954-1956.