Genre Classification of Electronic Dance Music Using Spotify's Audio Analysis – Towards Data Science

June 4, 2024

Travis Wolf
Electronic dance music has an epic origin story. It began with DJs dubbing drum machines over disco records and playing them at illegal warehouse parties. What we know as EDM today has grown into a massive $750 million industry. As its popularity increases, more and more listeners will be exposed to genres and sub-genres they have never heard before.
Discovering new music has also never been easier, with streaming services like Spotify introducing personalized song recommendations powered by machine learning. Hearing a song for the first time and diving into a new genre can be the most enjoyable part of being a listener: you find yourself in a rabbit hole of fun, whimsically adding more and more songs to your saved songs, and potentially ending up with a disorganized, cluttered Spotify library.
This project explores the application of machine learning algorithms to identify and classify the genre of a given song using Spotify's audio analysis, allowing users to automatically organize their libraries by genre.
All code for this project can be found here: https://github.com/towenwolf/genre-classification
The data set contains audio analysis for 21,000 songs across seven genres of electronic dance music: tech house, techno, trance, psy-trance, trap, drum & bass, and hardstyle. The songs were gathered from 136 user-created playlists from credible sources such as verified record labels and official Spotify playlists.
To build the data set I wrote a Python script that acquires credentials from a Spotify developer app, reads a CSV file of playlists, retrieves the audio features of each song in each playlist, and creates a data frame of songs labeled by genre. The script also removes duplicates, songs with tempos below 50 BPM, and songs longer than 16 minutes. The final data set is a random sample of 3,000 songs from each genre.
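A sketch of the core steps of that script, using the spotipy client (the function names and exact filter logic here are my illustration, not the author's code):

```python
import pandas as pd

def fetch_playlist_features(sp, playlist_id, genre):
    """Pull audio features for every track in one playlist and label them by genre.
    `sp` is an authenticated spotipy.Spotify client, e.g.
    spotipy.Spotify(auth_manager=SpotifyClientCredentials(client_id, client_secret))."""
    items = sp.playlist_items(playlist_id)["items"]
    track_ids = [it["track"]["id"] for it in items if it["track"]]
    features = sp.audio_features(track_ids)  # accepts up to 100 ids per call
    df = pd.DataFrame([f for f in features if f])
    df["genre"] = genre
    return df

def clean_tracks(df):
    """Drop duplicate tracks, tempos below 50 BPM, and songs longer than 16 minutes."""
    df = df.drop_duplicates(subset="id")
    df = df[df["tempo"] >= 50]
    return df[df["duration_ms"] <= 16 * 60 * 1000]
```

The real script would loop over the playlist CSV, concatenate the per-playlist frames, and then sample 3,000 songs per genre.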
Each row of the data frame represents a song, and the columns are audio features of the songs measuring:
· Acousticness — Probability of a song being purely acoustic vs synthesized. Acoustic recordings of songs will have values close to 1.
· Instrumentalness — Probability of a song containing no vocals. Purely instrumental songs will have values closer to 1.
· Speechiness — Probability of a song containing only speech. Spoken word tracks and vocal intros will have values close to 1.
· Danceability — How danceable a song is ranging from 0 to 1.
· Liveness — Detects the presence of an audience in the recording, ranging from 0 to 1.
· Valence — Mood of a song. Happier sounding songs have a value closer to 1, sadder songs closer to 0.
· Energy — Perceptual measure of intensity and activity. High energy tracks feel fast, loud, and noisy and will be close to 1.
· Key — The estimated overall key of the track.
· Loudness — The overall loudness of a track in decibels (dB).
· Mode — Indicates the modality (major or minor) of a track, major is represented by 1 and minor is 0.
· Tempo — The overall estimated tempo of a track in beats per minute (BPM).
· Time Signature — An estimated overall time signature of a track.
· Duration — The duration of the track in milliseconds (ms).
The advantage of using Spotify's API to construct the data set, instead of analyzing raw audio, is that it saves time and computational power, especially with a large data set. To learn more about how to use Spotify's API, check out this article.
To find features that will be useful for classifying songs by genre, we should listen for the differences: are some songs faster or slower than others? Since we don't have time to listen to every song in the data set, let's visually inspect the differences by looking at the histogram of the tempo distribution for each genre.
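One way to draw those per-genre tempo histograms with pandas and matplotlib (the data frame here is a random toy stand-in for the real data set):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy stand-in for the real data set; values are synthetic.
rng = np.random.default_rng(0)
genres = ["tech house", "techno", "trance", "psy-trance",
          "trap", "drum & bass", "hardstyle"]
df = pd.DataFrame({
    "genre": np.repeat(genres, 100),
    "tempo": rng.normal(140, 15, 700),
})

# One histogram per genre, sharing an x-axis for easy comparison.
fig, axes = plt.subplots(len(genres), 1, figsize=(6, 14), sharex=True)
for ax, g in zip(axes, genres):
    ax.hist(df.loc[df["genre"] == g, "tempo"], bins=30)
    ax.set_title(g)
axes[-1].set_xlabel("Tempo (BPM)")
fig.tight_layout()
```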
We see a cluster of songs at lower tempos, especially for trap, drum & bass, and hardstyle. I suspect the audio analysis returned a half-time tempo instead of the full-time tempo. This is not a problem, since half-time is musically correct; it's just a change of perspective on how many beats per minute are counted, counting only the first and third quarter notes rather than one, two, three, four. For example, a 140 BPM song is 70 BPM in half-time, which is common in certain genres. To fix this, I wrote a function that converts any tempo returned as half-time to full time, keeping the data consistent.
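A minimal version of that correction (the 100 BPM cutoff below is my assumption; the author's exact threshold isn't stated):

```python
import pandas as pd

def fix_half_time(tempo, threshold=100.0):
    """Treat any tempo below the threshold as half-time and double it.
    The 100 BPM cutoff is an illustrative assumption."""
    return tempo * 2 if tempo < threshold else tempo

tempos = pd.Series([70.0, 140.0, 87.0, 128.0])
full_time = tempos.apply(fix_half_time)  # 70 -> 140, 87 -> 174
```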
Another potential issue with user-created playlists is that users add songs that do not belong to the genre, leading to mislabeled observations. For example, a user could have mistakenly put a rap song in a house playlist. Looking at descriptive statistics of tech house tempo, we can see the min and max values are well below and above the mean.
Data points this far from the mean tempo tell us those songs are not actually tech house. Feeding the algorithms mislabeled observations like this makes it harder for the classifiers to predict correctly. To fix it, I set thresholds on the data using each genre's known tempo range, according to Ableton's music production chapter on tempos.
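One way to apply such a filter (the BPM ranges below are illustrative stand-ins, not the exact values from Ableton's guide):

```python
import pandas as pd

# Illustrative full-time BPM ranges per genre; the author's actual
# thresholds come from Ableton's tempo chapter and may differ.
TEMPO_RANGES = {
    "tech house": (120, 130),
    "techno": (125, 145),
    "trance": (130, 145),
    "psy-trance": (135, 150),
    "trap": (130, 170),
    "drum & bass": (160, 185),
    "hardstyle": (140, 160),
}

def in_genre_range(row):
    lo, hi = TEMPO_RANGES[row["genre"]]
    return lo <= row["tempo"] <= hi

df = pd.DataFrame({
    "genre": ["tech house", "tech house", "drum & bass"],
    "tempo": [125.0, 90.0, 174.0],  # the 90 BPM "tech house" row is mislabeled
})
df = df[df.apply(in_genre_range, axis=1)]  # drops the out-of-range row
```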
After adjusting for users' mistakes and converting half-time tempos to full time, the tempo distributions of the genres show unimodal structure around their means and roughly equal variances. Statistically speaking, these characteristics matter because many of the algorithms use linear models that assume normality. This means tempo will carry a lot of weight when predicting what genre a song might be.
Now besides tempo, what other audio analysis features could help add to predicting the correct genre?
If you look at the histogram for danceability (Figure 7), we see that tech house and techno are on average more "danceable" than all the other genres. This comes as no surprise: house and techno are known for the hi-hats and claps that keep dancers' heads nodding. The data are roughly normal and the genre means differ from one another, suggesting this would be another helpful feature for classification.
Observing the differences in overall loudness (Figure 12) gives us some insight into the genres as well. Trap, drum & bass, and hardstyle are all recorded at very high decibel levels, revealing the intensity and power these genres can output on loudspeakers. Don't turn your headphones up too loud when listening to them.
The duration of a song (Figure 15) also plays a role in which genre it belongs to. Psy-trance and techno songs are much longer than the other genres, with the upper 75% of their distributions ranging between 6 and 15 minutes per song.
Features that stand out as not viable for analysis are key, mode, and time signature, which are all categorical features related to musicality of the song. They might be useful in a later analysis, but for this project they will be removed from feature selection.
To train the classification models we need to divide the data into features and labels. The genre labels are encoded using the Label Encoder from sklearn.
Then the data is split into training and validation sets using sklearn’s train_test_split, randomly splitting the data into 80/20% chunks that will be used for training and validation of the classifiers.
All of the features are transformed to the same scale between 0 and 1 using sklearn's MinMaxScaler.
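The preparation steps above can be sketched as follows (toy data stands in for the audio features; I fit the scaler on the training split only, a standard leak-avoidance choice, while the article only says features were scaled to 0–1):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy feature matrix and genre labels standing in for the real data set.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = rng.choice(["techno", "trance", "trap"], size=100)

# Encode string genre labels as integers.
y_enc = LabelEncoder().fit_transform(y)

# 80/20 split for training and validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y_enc, test_size=0.2, random_state=42, stratify=y_enc
)

# Scale every feature to [0, 1]; fit on training data only.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
```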
Now that the data is prepared, the classifiers can be trained to predict the probability of each genre. The classifiers are algorithms, or "machines," that use statistics to output a mathematical model fitted to the data we feed into them, allowing us to predict useful things like what genre a song is! We won't dive into the math of each algorithm in this article, but here is a short explanation of what makes each one a candidate for this use case.
Logistic regression is a linear model normally used for binary classification. In this case we have seven categories, so we implement the one-vs-rest method, which trains seven separate models; the one that outputs the highest probability gives the predicted category.
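A sketch of the one-vs-rest setup with sklearn (using the iris data set as a small stand-in for the seven-genre data):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the genre features and labels

# One binary logistic regression per class; the highest probability wins.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)
proba = clf.predict_proba(X)  # one column of probabilities per class
```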
Random forest is an ensemble classifier that fits a number of decision tree classifiers. Optimal parameters were found using sklearn's GridSearchCV.
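The grid search can be sketched like this (the parameter grid is illustrative, not the author's actual search space, and iris again stands in for the genre data):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)  # stand-in for the genre data

# Illustrative parameter grid; the real search space may differ.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best_model = search.best_estimator_  # refit on the full data with the best params
```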
Boosting is another ensemble technique, obtained by combining a number of weak learners (such as shallow decision trees) trained sequentially; here XGBoost is used. Optimal parameters were found using sklearn's GridSearchCV.
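The article uses XGBoost; as a library-agnostic sketch, sklearn's GradientBoostingClassifier works the same way, sequentially fitting shallow trees as weak learners (the grid below is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)  # stand-in for the genre data

# Illustrative grid over the two most influential boosting parameters.
param_grid = {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
```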
This is a classic classification algorithm, commonly used for binary classification. A one-vs-rest approach is used, just as with the logistic regression.
Accuracy: 84%
F1 Score: 0.84
ROC AUC: 0.974
Wall time: 543 ms
Not bad for a simple classifier. The genres with the most correct predictions are tech house and psy-trance.
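The accuracy, F1, and ROC AUC scores reported here can be computed with sklearn's metrics (toy iris data below, so the numbers will differ from the article's):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the genre data
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
pred = clf.predict(X_va)
proba = clf.predict_proba(X_va)

acc = accuracy_score(y_va, pred)
f1 = f1_score(y_va, pred, average="weighted")        # weighted F1 across classes
auc = roc_auc_score(y_va, proba, multi_class="ovr")  # one-vs-rest ROC AUC
```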
Accuracy: 93%
F1 Score: 0.93
ROC AUC: 0.994
Wall time: 40.6 s
Random forest performed significantly better, suggesting that decision-tree-based classifiers are a good fit for this data.
Accuracy: 93%
F1 Score: 0.93
ROC AUC: 0.994
Wall time: 52.2 s
It scored nearly the same as random forest, with only minor differences in the predictions; the pattern of which genres are predicted accurately is consistent across both.
Accuracy: 88%
F1 Score: 0.88
ROC AUC: 0.986
Wall time: 16.1 s
Relatively high accuracy for most genres, except for techno and hardstyle, which are falsely predicted for one another.
ROC curves are another way to evaluate the performance of the classifiers. They illustrate the trade-off between sensitivity and specificity: the more area under the curve, the more accurate the predictions.
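A sketch of plotting one-vs-rest ROC curves per class with sklearn and matplotlib (iris stands in for the genre data, so the class labels are placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)  # stand-in for the genre data
clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X)
y_bin = label_binarize(y, classes=[0, 1, 2])  # one binary column per class

fig, ax = plt.subplots()
for i in range(y_bin.shape[1]):
    # One ROC curve per class, treating that class vs the rest.
    fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
    ax.plot(fpr, tpr, label=f"class {i} (AUC = {auc(fpr, tpr):.3f})")
ax.plot([0, 1], [0, 1], linestyle="--")  # chance line
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()
```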
The XGBoost and random forest algorithms perform best at predicting the genre of a song. All of the algorithms do well when classifying tech house, with near-perfect accuracy across classifiers. Some instances of techno are confused with hardstyle and vice versa, as is trap with psy-trance. This suggests the differences between these genres may be more subtle than what the audio features can detect.
The performance of both random forest and XGBoost is promising for automating genre classification at scale. Being able to determine a song's genre from its audio-analysis features could help listeners organize their libraries or receive recommendations for similar songs.
But at the end of the day, what truly makes each genre unique is the feeling and emotion it provokes while listening. Take a listen for yourself: here are links to the Spotify playlists used in this analysis.
Moving forward, I would like to find the minimum number of songs needed for the algorithms to effectively classify a genre, so that the approach could scale to a large number of niche genres. If enough playlists exist for a genre, the classifiers could predict even the wildest ones, like Russian-Polka-Hardcore (see below). So get out there and make some public playlists of your favorite genre.