Jonathan Mooser

Final Project: Java Beat Detector

Thursday, April 28, 2005

 

 

 


 

[ Description (below) ][ Java Source Files ][ Sound Samples ]

The purpose of this project is to build software that will automatically detect beats in a piece of recorded music using an algorithm based on Eirc Scheirer’s paper Tempo and beat analysis of acoustic musical signals [1]. It reads in a wav files, specified by a command line argument, and plays back the music with a click inserted wherever it finds a beat. When successful, the result it like listening someone snap their fingers in time with a piece of music.

Although based mainly on Scheirer’s paper, I also borrowed algorithms from two additional sources. The first is Beat This: A Beat Synchronization Project [2], a web site at Rice University that implements an algorithm similar to Scheirer’s using MATLAB. The second is Beat Detection Algorithms [3], a web site run by Frederic Patin. This second web site describes a far less sophisticated approach to beat detection than my other sources, but some of its ideas proved useful in implementing my program. The details of my algorithm and how I developed it are described below.

 

 

1. Input Format

 

For simplicity, I restrict the audio input to 8-bit single-channel wav files sampled at 22050 samples/second. The files I used for testing were all 20 second clips cut from larger recordings. In principle, larger clips could be used, but memory limitations might cause the program to crash.

 

 

2. Frequency Analysis

 

The first step is to break the recording into 1024 sample frames and apply a Fast Fourier Transform (FFT) to each.  This process uses a sliding window, that advances 256 samples at a time, so that each chunk overlaps with 6 others. This is probably the single largest departure from the methods described in [1] and [2]. Scheirer uses a different type of frequency analysis called an elliptical filter, then smoothes the results by convoluting with a half-hanning window.  The idea of analyzing frequency one chunk at a time comes from [3]. This allows us to easily find local peaks in the power spectrum at a resolution of 256/22050 = 11.6ms. The chunk length of 1024 means we can frequencies as low as 22050/1024 = 21.5 Hz.

 

Within each chunk, the FFT output is summed into a small number of bands. Also at the suggestion of [3], the bands are of differing sizes, with the lower bands being smaller. Through trial and error, I chose 9 bands with upper limits of 100Hz, 200Hz, 300Hz, 400Hz, 600Hz, 800Hz, 1200Hz, 2400Hz, and 11025Hz. The sample rate of the input  determines the upper limit of the highest band.


3. Differentiation

 

The Differentiation process takes the output of step 2 and finds large increases for each band. The band energy at each time is subtracted from the band energy at the next time. Those results are then half-wave rectified, meaning that all values less than zero, are clipped to zero.

 

This differentiation data is the basis of the actual tempo finding process. For each time, measured in 256 sample frames, we have an approximation for the energy increase for each band. A large energy increases in a given band at a different time indicates a high probability of finding a beat at that time.

 

 

4. Beat Finding

 

The program searches for beats beginning at a random point near the beginning of the input. Within a certain window, the differentiation values are summed across bands and the local maximum is taken to be an estimate of the first beat. This first guess may not be very good, but combined with the comb filter (step 5), the estimates will hopefully become successively better at the search progresses through the input.

 

 

5. Tempo Estimation by Comb Filter

 

Once the time estimate for a beat has been found, the program estimates the period of the current tempo be searching backward from that point in time. The comb filter implemented here is somewhat simpler than that described in [1] or [2]. For a given tempo estimate T, at time t, my program simply find a sum across bands of the energy at times t-T + t-2T + t-3T…. etc.

 

Having an estimate of the period T and time t, the next beat estimate is taken to be time t+T. The program then returns to step 4, now searching a window centered around that new time estimate. That peak is taken to be the second beat and the process continues until it reaches the end of the input.

 

Neither [1] nor [2] search for energy peaks directly. Rather, they both apply a comb filter for all possible periods over the entire length of the input signal. Both of those systems, however work with input about 5 seconds long. Trying to work with much longer inputs (>20 seconds) such an exhaustive search is not possible. It makes more sense to perform the comb filter over a small window of time and use the results as an estimate of tempo which may vary.

 

6. Click Insertion

 

For every beat, the program inserts a click into the audio signal. It plays back the results, allowing a listener to evaluate how close the estimated beats are to the actual beats.

 


7. Results


I tested the system on four musical excerpts, each about 20 seconds long. Two of them, Let Forever Be, by The Chemical Brothers and Amateur by St. Etienne, would generally be classified as Techno. The results on these pieces were mixed, with the program correctly identifying the beat about 50 per cent of the time. A more difficult example was the jazz recording Descarga by Tito Puente. Although the music has a strong beat, the percussion is often erratic, thus confusing the program. The best results by far were those applied to an excerpt from the second movement of Beethoven’s 9th. Several authors have observed that classical music is the most difficult to track, but that was not the case here.


 

In all cases, the program seemed to act more as a rhythm tracker than a beat tracker. Because it always marks beats on an onset, the program will miss beats that do not coincide with a strong onset. It will, however, often find repeated rhythmic patterns that may be off the beat.

 

Overall, the program is a good start, and may become very accurate with further refinement.

 

 

References

[1] Tempo and Beat Analysis of Acoustic Musical Signals,  Eric Scheirer. J. Acoust. Soc. Am. 103 (1), January 1998.

 

[2] Beat This: A Beat Synchronization Project, Cheng, Nazer, Uppuluri, Verret. http://www.owlnet.rice.edu/~elec301/Projects01/beat_sync/beatalgo.html

 

[3] Beat Detection Algorithms, Frederic Patin

http://www.gamedev.net/reference/programming/features/beatdetection.

 

 

 

 

The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from which such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees