Analysis of Dynamic Shaping in Unaccompanied Bach

By Eric Cheng

Project Goal: The purpose of this project is to compare and contrast the dynamic shaping used by three of the greatest violinists of the 20th century: Menuhin, Milstein, and Heifetz. Analysis is conducted using the Andante movement from Bach's Sonata No. 2 for unaccompanied violin. Unaccompanied Bach is a natural choice for dynamic analysis: there is only a single instrument to be analyzed, and there are almost no dynamic markings in the score, leaving much room for interpretation by the performer. The Andante movement was chosen because the clarity of the melodic line and the regular underlying eighth-note pulse make it well suited to analysis.

Introduction

Musical dynamics are a crucial expressive tool in music performance, conveying both emotion and musical structure. By controlling the evolution of loudness, musicians can create tension or relieve it, signal an end or a beginning, or simply express a particular emotion. In the field of music performance research, musical dynamics pose an interesting challenge, because the dB sound levels recorded in an audio file do not correspond directly with perceived loudness. This is because perceived loudness depends on more than just the amplitude of the waveform. The temporal and spectral contexts in which sounds are heard profoundly affect perceived loudness. Thus, in order to accurately examine the dynamic shaping of a given audio recording, we must process the raw waveforms to transform them from amplitude curves to loudness curves.

Loudness Curve Extraction

In order to extract loudness data from raw sound files, we must take into account psychoacoustic principles such as spectral and temporal masking as well as frequency content.

Spectral Masking

Imagine listening to a 1000Hz sound at 70dB. This sound will have a certain perceived loudness. Now imagine that in addition to the original sound, we play a 1050Hz sound at 70dB. Will the 1000Hz sound still be as loud? The answer is no, and the reason is spectral masking. When two sounds are similar in frequency, each sound will mask, or lower, the perceived loudness of sounds in the spectral vicinity. This occurs because sounds alter the threshold of audibility of surrounding frequencies, as shown in the graph to the right:

 

source: http://audiostuff.mysite.wanadoo-members.co.uk/is.html
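As a rough illustration of why the 1000Hz and 1050Hz tones interfere, the sketch below converts both frequencies to the Bark (critical band) scale and checks whether they lie less than one critical band apart. The conversion used here is the Zwicker and Terhardt approximation, which is an assumption of this example and not taken from the project; the PEAQ code uses its own frequency-to-band mapping. The function names are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker-Terhardt formula, an assumption here)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def same_critical_band(f1_hz, f2_hz, width_bark=1.0):
    """True if the two tones lie less than one critical bandwidth apart."""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) < width_bark

# The two tones from the example above, and a widely separated pair for contrast.
separation = hz_to_bark(1050.0) - hz_to_bark(1000.0)
print(f"1000 Hz vs 1050 Hz: {separation:.2f} Bark apart")
print("same band (1000, 1050):", same_critical_band(1000.0, 1050.0))
print("same band (1000, 4000):", same_critical_band(1000.0, 4000.0))
```

Tones that fall within the same critical band excite overlapping regions of the inner ear, which is why masking is strongest between neighbors in frequency.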

Temporal Masking

Now imagine listening to the same 1000Hz sound at 70dB, this time played immediately after the 1050Hz sound. Will it sound as loud as the 1000Hz sound alone? The answer is no again, this time because of temporal masking. Just as sounds mask other sounds in the spectral domain, they also mask sounds in the temporal domain. This means that a sound heard immediately following a loud sound will be masked, or harder to hear. The interesting twist is that sounds played immediately before loud sounds are masked as well. It's as if our brains are able to see into the future... (see graph to the right)

 

source: Painter (2000)

Procedure for Extracting Loudness Curves

The basic process of extracting loudness curves from audio files can be summarized in the following steps:

1. Slice the raw waveform into frames or windows, usually around 2048 samples wide, or 46msec at a 44.1kHz sampling rate.

2. Transform each frame to the frequency domain.

3. Analyze the resulting spectrum according to the critical band resolution of the ear.

4. Take into account temporal and spectral masking.

5. Convert the dB levels into loudness levels using steps 3 and 4.

Luckily, there are Matlab implementations that do this for you. For this project, I used Matlab code provided by P. Kabal from McGill University. The code is based on Perceptual Evaluation of Audio Quality (PEAQ), and is available here. The result is a single loudness value for every frame in the waveform.
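To make step 1 concrete, here is a minimal sketch of the framing stage in Python rather than Matlab. It is not the PEAQ code: it only slices a waveform into 2048-sample frames and computes the raw RMS level of each frame in dB, which is exactly the quantity that the masking and critical band steps are meant to improve upon. The function names, the non-overlapping hop, and the synthetic test tone are assumptions made for illustration.

```python
import math

def frame_duration_ms(frame_len, sample_rate):
    """Duration of one analysis frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate

def frame_levels_db(samples, frame_len=2048, hop=None):
    """Raw RMS level (dB re full scale) of each frame; NOT a perceptual loudness."""
    hop = hop or frame_len  # assumption: non-overlapping frames unless a hop is given
    levels = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf"))
    return levels

# Synthetic one-second 1000 Hz tone whose amplitude doubles halfway through.
sample_rate = 44100
tone = [(0.25 if n < sample_rate // 2 else 0.5)
        * math.sin(2.0 * math.pi * 1000.0 * n / sample_rate)
        for n in range(sample_rate)]

levels = frame_levels_db(tone)
print(f"frame duration: {frame_duration_ms(2048, sample_rate):.1f} ms")
print(f"frames: {len(levels)}, level change: {levels[-1] - levels[0]:.1f} dB")
```

Doubling the amplitude raises the raw level by about 6 dB, but a listener does not necessarily hear that as a fixed step in loudness, which is the motivation for steps 3 to 5.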

Mapping Loudness Curves to Metrical Position

After carrying out the above procedure, we will end up with a loudness curve like the one shown in Figure 1.

But this graph is not very useful for comparing dynamic shapings of different performers. We don't know which frames to compare since every performer played with a slightly different tempo. In order to compare dynamic shaping, we need to map the above loudness values to their metrical position in the score. Loudness at a given metrical position is independent of tempo, and can be compared across performers.


Figure 1


Figure 2

To do this, we need to know the onset times of every beat. I wrote a program called marker.m that allows us to accomplish this task, though somewhat crudely. The program is essentially a stopwatch. To use it, you play the recording, start the program in Matlab, and tap the return key along with the beat in the recording. Every time the return key is tapped, the time is recorded into a vector, giving us the onset time of each beat. Of course, the stopwatch is not started at exactly the moment playback begins (since playback must be done using another application), so a slight time shift is needed to fit the beat track to the piece. A program called playback.m takes the beat track and the recording and plays them back together to confirm the accuracy of the track.

Once we have obtained the onset times, we can determine which frames they correspond to, and therefore find the loudness values. To account for inaccuracy in the beat track, and for the limited ability of a musician to control dynamic shaping over small time intervals, loudness values were averaged (smoothed) with the values within a user-specified time window surrounding the onset time. The smaller the time window, the more local variation comes through; the larger the time window, the more global the analysis. Shown below in Figure 2 is an example loudness curve, with each point separated by a metrical distance of one eighth note. The accompanying sound file is available here.
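The mapping from onset times to smoothed loudness values can be sketched as follows. This is an illustration in Python, not the author's Matlab code: the function name, the convention that frame i starts at time i times the frame period, and the fallback to the nearest frame for very narrow windows are all assumptions. The shift_s parameter stands in for the time shift between the stopwatch and the recording.

```python
import math

def loudness_at_beats(frame_loudness, frame_period_s, onset_times_s,
                      shift_s=0.0, window_s=0.2):
    """Average the per-frame loudness inside a window centred on each shifted onset."""
    half = window_s / 2.0
    last_index = len(frame_loudness) - 1
    result = []
    for onset in onset_times_s:
        centre = onset + shift_s  # align the tapped beat track with the recording
        first = max(0, math.ceil((centre - half) / frame_period_s))
        last = min(last_index, math.floor((centre + half) / frame_period_s))
        if last < first:
            # Window contains no frame start: fall back to the nearest frame.
            first = last = min(last_index, max(0, round(centre / frame_period_s)))
        window = frame_loudness[first:last + 1]
        result.append(sum(window) / len(window))
    return result

# Made-up data: 20 frames, 50 ms apart, with loudness rising by 1 per frame.
toy_loudness = [float(i) for i in range(20)]
beats = [0.25, 0.50]
print(loudness_at_beats(toy_loudness, 0.05, beats, window_s=0.12))
print(loudness_at_beats(toy_loudness, 0.05, beats, shift_s=0.05, window_s=0.12))
```

Widening window_s averages over more frames per beat, which is the local versus global trade-off described above.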

 

Results - Qualitative

Many interesting results came from the analysis of the loudness curves. Milstein's dynamic shaping in the first half of the piece stands out against the rest. Particularly in bars 3-8, the trajectories drawn by his dynamic shaping are much longer in scope, with fewer local peaks and valleys. This results in a performance with a more global sense of line, and we get the feeling that Milstein is thinking of every note he plays within the context of the entire piece as opposed to within the context of a given bar or phrase. This is shown most clearly when contrasted with the shaping of Menuhin as shown in Figure 3 below. Menuhin's trajectory never maintains a constant direction of movement for more than 5 eighth notes, while Milstein creates an upward trajectory that spans nearly two measures beginning in the middle of measure 4.

 


Figure 3


Figure 4

Indeed, when we examine the loudness curves with loudness values smoothed over a window length of a bar, the global shaping of Milstein (in green below) becomes more apparent. He continues to draw arching trajectories with higher peaks and lower valleys while the curves of Heifetz and Menuhin are somewhat more flattened by the added smoothing.

Though the trajectories above contrast in several ways, they do share a similar overall shape, with a dip in dynamics around bar 5 leading up to the dynamic climax of measure 7. In all cases, the performers ended the first half of the piece with nearly two measures of decrescendo, ending at the lowest point of the entire first half.

Results - Quantitative

Several quantitative measures were calculated for comparison. I believe these numbers should be interpreted cautiously, however, since music is meant to be listened to and perceived by humans. If we cannot aurally confirm the trends or insights gained by quantitative analysis, then they are of questionable value. As a measure of dynamic variability, the standard deviations are shown below. The dynamic range is taken to be equal to 4*std.dev., following the convention of Repp (1999). All values are in loudness units of sones.

 
Measure (sones)   Menuhin   Milstein   Heifetz
Std. Dev.         7.75      9.01       9.96
Range             31        36.04      39.84
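The two measures in the table are straightforward to compute from a beat-level loudness curve. The sketch below is a Python illustration under stated assumptions: the source does not say whether the sample or population standard deviation was used (the sample version is used here), the function name is hypothetical, and the example curve is made up. Only the three standard deviations in the final loop come from the table.

```python
import statistics

def dynamic_variability(loudness_sones):
    """Return (standard deviation, dynamic range), with range defined as 4 * std. dev."""
    sd = statistics.stdev(loudness_sones)  # assumption: sample standard deviation
    return sd, 4.0 * sd

# Made-up loudness curve in sones, for illustration only.
toy_curve = [10.0, 20.0, 30.0, 40.0, 50.0]
sd, dyn_range = dynamic_variability(toy_curve)
print(f"std. dev. = {sd:.2f} sones, range = {dyn_range:.2f} sones")

# Checking the table's arithmetic: each range is four times the reported std. dev.
for name, reported_sd in [("Menuhin", 7.75), ("Milstein", 9.01), ("Heifetz", 9.96)]:
    print(f"{name}: 4 * {reported_sd} = {4.0 * reported_sd:.2f}")
```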

These numbers were confirmed to a certain extent upon listening to the recordings. The standard deviation measure does not appear to capture the extent to which Menuhin's dynamic shaping varies more locally than Milstein's. This could simply be a function of the recording, since Menuhin's recording was made in 1934, some 20 years before the others. Heifetz's greater dynamic variability and range were confirmed upon listening.

Additionally, as a measure of similarity between shaping strategies, correlation coefficients were calculated and are shown below:

Correlation Coefficients

 
            Menuhin   Milstein   Heifetz
Menuhin     1         .6669      .6848
Milstein              1          .6821
Heifetz                          1
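For reference, a pairwise table like the one above can be produced with the Pearson correlation coefficient. The sketch below is a Python illustration, not the author's Matlab code; the function names are hypothetical and the three short curves are made up, so the printed values are unrelated to the reported coefficients.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def correlation_table(curves):
    """Upper-triangular table of pairwise correlations, keyed by (name, name)."""
    names = list(curves)
    return {(a, b): pearson(curves[a], curves[b])
            for i, a in enumerate(names) for b in names[i:]}

# Made-up beat-level loudness curves, for illustration only.
toy_curves = {
    "A": [10.0, 14.0, 18.0, 15.0, 11.0, 9.0],
    "B": [12.0, 13.0, 20.0, 19.0, 12.0, 8.0],
    "C": [11.0, 16.0, 17.0, 13.0, 12.0, 10.0],
}
for (a, b), r in correlation_table(toy_curves).items():
    print(f"{a}-{b}: {r:.4f}")
```

Because correlation ignores overall level and scale, two performers can correlate highly even when one plays with a much wider dynamic range than the other.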

The correlation coefficients are all positive, and nearly identical. However, it is interesting to note the relatively low correlation coefficient between Menuhin and Milstein, whereas Heifetz has a relatively high correlation coefficient with both Milstein and Menuhin. Indeed, visual inspection of the loudness curves suggests that Heifetz's shaping strategy borrows elements from both Milstein and Menuhin. While Heifetz employs greater local dynamic variation similar to that of Menuhin, he does so while giving his trajectories greater global arching, similar to Milstein. This can be seen in Figure 4, where Heifetz seems to occupy the middle ground between Milstein and Menuhin in terms of global arching.


Figure 5

 


Figure 6

In order to gain insight into the existence of any general shaping strategies common to all performers, the average mean squared error and correlation were calculated for each bar in the piece. Figure 5 shows the averaged correlation coefficients. The local maxima occurring in measures 4, 8, 11, 14, 19, 21, and 26 all occur at phrase boundaries of differing structural importance. This suggests that the phrase ending strategies are the most similar across all performers.

Of the local maxima in the correlation coefficient plot, there are corresponding local minima in MSE at measures 11, 14, and 26. Measures 11 and 26 correspond to the final measures of the first and second halves of the piece, respectively. The combination of high correlation and low MSE in these measures reveals a strong similarity among the shapings at these phrase boundaries. This suggests a possible rule of thumb: the more important the phrase boundary, the more similar the dynamic shaping will be across performers. Or perhaps, the more important the phrase boundary, the fewer the number of musically acceptable dynamic shapings possible.
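The per-bar measures can be sketched as follows. This Python illustration makes several assumptions that the text does not confirm: that "average" means averaging over all pairs of performers, that each bar contains six eighth-note points, and that bars are aligned from the first point. The function names are hypothetical and the curves are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant input)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def mse(x, y):
    """Mean squared difference between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def per_bar_agreement(curves, points_per_bar=6):
    """For each bar, return (mean pairwise correlation, mean pairwise MSE)."""
    n_bars = min(len(c) for c in curves) // points_per_bar
    pairs = list(combinations(curves, 2))
    results = []
    for bar in range(n_bars):
        lo, hi = bar * points_per_bar, (bar + 1) * points_per_bar
        corr = sum(pearson(a[lo:hi], b[lo:hi]) for a, b in pairs) / len(pairs)
        err = sum(mse(a[lo:hi], b[lo:hi]) for a, b in pairs) / len(pairs)
        results.append((corr, err))
    return results

# Made-up curves: the three agree in bar 1 and diverge in bar 2.
toy = [
    [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1],
    [2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6, 3, 3, 4, 3, 2, 2],
]
agreement = per_bar_agreement(toy)
for bar_number, (corr, err) in enumerate(agreement, start=1):
    print(f"bar {bar_number}: mean correlation {corr:.3f}, mean MSE {err:.3f}")
```

High correlation together with low MSE means the performers agree in both the shape and the absolute level of the bar, which is the combination singled out above.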

Conclusion

The examination of dynamic shaping in recordings of unaccompanied Bach proved to be an illuminating one. The extraction of loudness curves from recordings allows for a more detailed and objective analysis of dynamic shapings. Qualitative comparisons showed a wide variety of dynamic shapings on a local level, but commonalities in the overall trajectories on a more global level. Quantitative analysis, carefully interpreted, revealed some interesting trends. Most notably, dynamic shapings appeared to be most highly correlated at phrase boundaries. In addition, minimum MSE values were obtained at the most structurally important phrase boundaries, suggesting that the structural importance of a phrase boundary could constrain the number of musically acceptable dynamic shapings a performer may choose from.

References

Kabal, P. An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality. TSP Lab Technical Report, Dept. Electrical & Computer Engineering, McGill University, May 2002 (updated Dec. 2003).

Langner, J., & Goebl, W. Visualizing Expressive Performance in Tempo-Loudness Space. Computer Music Journal, Vol. 27 No. 4, Winter 2003, pp.66-83.

Langner, J., Kopiez, R., Stoffel, C., Wilz, M. Real-time Analysis of Dynamic Shaping. In Proceedings of the International Conference on Music Perception and Cognition, 5-10 Aug. 2000.

Painter, T., Spanias, A. Perceptual Coding of Digital Audio. In Proceedings of the IEEE, Vol. 88, No. 4, April 2000.

Repp, B. A microcosm of musical expression: II. Quantitative analysis of pianists' dynamics in the initial measures of Chopin's Etude in E major. J. Acoustical Society of America, March 1999.

Timoney, J., Lysaght, T., Schoenwiesner, M., MacManus, L. Implementing Loudness Models in Matlab. In Proceedings of the 7th Conference on Digital Audio Effects (DAFx), 2004.

 

Recordings

Heifetz: Bach Sonatas & Partitas. The Heifetz Collection Vol. 17. Recorded in 1952.

Menuhin: Bach Sonatas & Partitas for solo violin. EMI Classics. Recorded in 1934-1936.

Milstein: Bach Sonatas for Unaccompanied Violin. EMI Classics. Recorded in 1954-1956.

 

Matlab Code

PEAQ Loudness Toolbox
