It’s a growing problem: a dizzying number of songs get released to online music stores and streaming services or uploaded to archives around the world each day, and those songs need to be categorized. But how? Play the same song to 10 people and they might each put it into a different genre or subgenre. An automated genre identification system developed by researchers in India, which they claim is the best yet, could be the answer.

The system, created by a group led by Arijit Ghosal of the Neotia Institute of Technology Management and Science, is predicated on the idea that musical genres are characterized by pitch, tempo, amplitude variation patterns (changes between loud and soft), and periodicity (how or to what extent the music repeats phrases). Most major genres can be identified by pitch analysis – which reflects the melody – alone, but including the others makes for more accurate readings.

To analyze music for common pitch features, the system breaks it down into 88 frequency bands, each divided into short frames. For every frame it calculates something called the short-time mean-square power (the average of the squared signal amplitude over that short window), both individually and as an average. For tempo, it starts with a novelty curve, which follows changes in the song’s timbre, or tone color – so, basically, when the instrumentation changes. It then performs a Fourier transform, which deconstructs the song’s sound wave into many sine curves, each corresponding to a different frequency; these can be further analyzed to get the beats per minute.
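The two calculations above can be sketched in a few lines of Python with NumPy. This is only an illustration, not the researchers' code: the function names, the non-overlapping frames, the 40–200 BPM search range, and the power-based novelty curve (the article describes a timbre-based one) are all assumptions made for the sketch, and the Fourier transform is applied to the novelty curve rather than to the raw waveform.

```python
import numpy as np

def short_time_power(signal, frame_len):
    """Mean of the squared samples in each non-overlapping frame."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.mean(frames ** 2, axis=1)

def estimate_bpm(novelty, frame_rate, bpm_range=(40.0, 200.0)):
    """Return the tempo whose frequency dominates the novelty curve's spectrum."""
    novelty = np.asarray(novelty, dtype=float)
    novelty = novelty - novelty.mean()          # drop the constant offset
    spectrum = np.abs(np.fft.rfft(novelty))
    freqs_bpm = np.fft.rfftfreq(len(novelty), d=1.0 / frame_rate) * 60.0
    mask = (freqs_bpm >= bpm_range[0]) & (freqs_bpm <= bpm_range[1])
    return float(freqs_bpm[mask][np.argmax(spectrum[mask])])

# Synthetic test signal: a short burst every 0.5 s, i.e. 120 beats per minute.
sample_rate = 1000                              # samples per second (assumed)
frame_len = 10                                  # samples per frame (assumed)
signal = np.zeros(10 * sample_rate)
for start in range(0, len(signal), sample_rate // 2):
    signal[start:start + 20] = 1.0

power = short_time_power(signal, frame_len)
# Simplified novelty curve: keep only the rises in frame power.
novelty = np.maximum(np.diff(power, prepend=power[0]), 0.0)
bpm = estimate_bpm(novelty, frame_rate=sample_rate / frame_len)
print(f"estimated tempo: {bpm:.1f} BPM")
```

Restricting the search to a plausible tempo range keeps the estimate from locking onto a harmonic of the beat, which is why the sketch caps it at 200 BPM.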

For amplitude variation patterns, the signal is smoothed and then matrix operations are performed on it to get the equivalent of the signal’s texture. For periodicity, the system divides the signal into frames of 100 samples each and calculates cross-correlations between them. It then takes the maximum cross-correlation of each frame and uses those values to calculate a mean and a standard deviation.
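A rough sketch of the periodicity measure might look like the following. The article does not say which frames are compared or how the correlation is scaled, so this version assumes each frame is compared with the next one and that the cross-correlation is normalized to a maximum of 1; the function name is hypothetical.

```python
import numpy as np

def periodicity_features(signal, frame_len=100):
    """Mean and standard deviation of the peak cross-correlation
    between consecutive frames (an assumed pairing)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    if n_frames < 2:
        raise ValueError("need at least two full frames")
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)

    peaks = []
    for a, b in zip(frames[:-1], frames[1:]):
        a = a - a.mean()
        b = b - b.mean()
        norm = np.linalg.norm(a) * np.linalg.norm(b)
        if norm == 0.0:                 # silent frame: no correlation to measure
            peaks.append(0.0)
            continue
        xcorr = np.correlate(a, b, mode="full") / norm
        peaks.append(xcorr.max())       # best match over all lags
    peaks = np.array(peaks)
    return float(peaks.mean()), float(peaks.std())

# A perfectly repeating tone versus random noise.
n = np.arange(1000)
tone = np.sin(2 * np.pi * n / 20)       # period of 20 samples
noise = np.random.default_rng(0).standard_normal(1000)

tone_mean, tone_std = periodicity_features(tone)
noise_mean, noise_std = periodicity_features(noise)
print(tone_mean, noise_mean)
```

A strongly repetitive signal yields peak correlations close to 1 with little spread, while noise scores much lower.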

All of this information gets fed into a classification scheme. The researchers tested their method with three classifiers: the multilayer perceptron (MLP), an artificial neural network made up of multiple layers of neuron-like units called perceptrons; support vector machines (SVMs), which learn from a set of labeled training data where to draw the boundaries between classes; and random sample consensus (RANSAC), which forms a hypothesis from a randomly selected sample set, verifies it against the rest of the data, and iterates, taking the best-fit estimate as the final one.
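The sample, verify, and iterate loop behind RANSAC is easiest to see on a toy problem. The sketch below applies it to fitting a straight line through data with outliers. It is a generic illustration of the idea, not the researchers' genre classifier, and the function name, threshold, and iteration count are assumptions.

```python
import numpy as np

def ransac_line(x, y, n_iter=200, threshold=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_inliers, best_count = None, 0

    for _ in range(n_iter):
        # 1. Hypothesis: a line through two randomly chosen points.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # 2. Verification: count the points that agree with the line.
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        # 3. Keep the hypothesis with the largest consensus so far.
        if inliers.sum() > best_count:
            best_inliers, best_count = inliers, int(inliers.sum())

    if best_inliers is None:
        raise ValueError("no valid hypothesis found")
    # Refit on the consensus set for the final estimate.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return float(slope), float(intercept), best_inliers

# Points on y = 2x + 1, with every fifth point pushed far off the line.
x = np.arange(50, dtype=float)
y = 2.0 * x + 1.0
y[::5] += 30.0

slope, intercept, inliers = ransac_line(x, y)
print(slope, intercept, int(inliers.sum()))
```

Because each hypothesis comes from only a few points, a single bad sample cannot drag the final answer off course; it simply loses the vote to a hypothesis that more of the data supports.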

RANSAC outperformed the other two classifiers on both feature sets developed from a database of 490 songs in seven different genres. The methodology also proved more accurate – or in the researchers’ words, “substantially better” – than the approaches used in previous studies when tested on the same data.

The researchers believe that their genre identification system could be easily incorporated into existing music databases and recommendation services.

A paper describing the research was published in the International Journal of Computational Intelligence Studies. An earlier version of the same system was described in a paper presented at the International Conference on Advanced Computing, Networking, and Informatics in 2013.


Henry Sapiecha