Music is more than entertainment: it reflects our emotions, habits, and even our mental health. We all rely on it, whatever our preferred genre or language. In this talk, I’ll demonstrate how open-source Spotify audio feature datasets can power reproducible research and FOSS projects at the intersection of music, well-being, and technology.
I’ll show how we can extract meaningful insights about stress and recovery using only open data. Each song in this dataset, sourced from Spotify’s Web API, includes detailed audio features such as Danceability, Energy, Loudness, Mode, Speechiness, Acousticness, Instrumentalness, Liveness, and Tempo, enabling deep analysis of musical characteristics and their psychological impact.
Building on thresholds derived from clinical and peer-reviewed research, I’ll explain how we mapped these audio features to stress levels and classified tracks as “Stressed”, “Not Stressed”, or “Borderline”. I’ll also share our open-source data augmentation pipeline, which addresses class imbalance with hybrid synthetic generation techniques such as Gaussian Noise Injection and Boundary Based Sampling to keep results robust and fair.
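As a rough illustration of threshold labeling and Gaussian noise injection, here is a minimal sketch. The threshold values, the feature bounds, and the names `classify_track` and `augment_minority` are all hypothetical and chosen only for this example; they are not the clinically derived thresholds or the pipeline from the research, and Boundary Based Sampling is not shown.

```python
import random

# Hypothetical thresholds for illustration only; the values used in the
# research are not stated in this abstract.
STRESSED_MIN_ENERGY = 0.7
STRESSED_MIN_TEMPO = 120.0   # beats per minute
CALM_MAX_ENERGY = 0.4
CALM_MAX_TEMPO = 90.0

# Assumed value ranges, used to scale and clip the injected noise.
FEATURE_BOUNDS = {"energy": (0.0, 1.0), "tempo": (0.0, 250.0)}


def classify_track(track):
    """Label a track (dict with 'energy' and 'tempo') using fixed thresholds."""
    if track["energy"] >= STRESSED_MIN_ENERGY and track["tempo"] >= STRESSED_MIN_TEMPO:
        return "Stressed"
    if track["energy"] <= CALM_MAX_ENERGY and track["tempo"] <= CALM_MAX_TEMPO:
        return "Not Stressed"
    return "Borderline"


def augment_minority(rows, n_new, rel_sigma=0.02, seed=0):
    """Create n_new synthetic rows by adding Gaussian noise to real rows.

    The noise standard deviation is rel_sigma times each feature's range,
    and noisy values are clipped back into that range.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        new_row = {}
        for name, (lo, hi) in FEATURE_BOUNDS.items():
            noisy = base[name] + rng.gauss(0.0, rel_sigma * (hi - lo))
            new_row[name] = min(hi, max(lo, noisy))
        synthetic.append(new_row)
    return synthetic
```

A synthetic row can drift across a threshold, so in a sketch like this it would make sense to re-run `classify_track` on each generated row and keep only those that still carry the minority label.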
Finally, I’ll show how this approach can be extended to projects such as a graph-based music recommendation system that uses these open data insights to deliver personalized, cold-start-resistant recommendations, demonstrating how anyone can build impactful projects with open music data.
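One way a graph over audio features could sidestep the cold-start problem is sketched below. This is an assumption made for illustration, not the design of the recommendation system mentioned above: tracks are linked to their nearest neighbors by Euclidean distance over feature vectors, and a track with no listening history is placed in the graph using its features alone. The function names and the toy data in the usage note are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(features, k=2):
    """Map each track id to the ids of its k nearest tracks."""
    graph = {}
    for tid, vec in features.items():
        others = sorted(
            (o for o in features if o != tid),
            key=lambda o: distance(vec, features[o]),
        )
        graph[tid] = others[:k]
    return graph


def recommend_for_new_track(new_vec, features, graph, k=2, n=3):
    """Recommend up to n tracks for an unseen track using only its features.

    The new track is attached to its k nearest tracks, then their graph
    neighbors are appended in order, skipping duplicates.
    """
    entry = sorted(features, key=lambda t: distance(new_vec, features[t]))[:k]
    ranked = list(entry)
    for tid in entry:
        for neighbor in graph[tid]:
            if neighbor not in ranked:
                ranked.append(neighbor)
    return ranked[:n]
```

For example, with `features = {"a": (0.1, 0.1), "b": (0.2, 0.1), "c": (0.9, 0.9), "d": (0.8, 0.9)}` and a graph built with `k=1`, a new track at `(0.12, 0.1)` is attached to `"a"` and then reaches `"b"` through the graph, without needing any play counts or user history.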
This talk is based on my published research. I have filed a patent on an extension of this project, and I want to help the community understand how open data can power niche research with far-reaching impact.
Large, high-quality datasets, like the 15,150+ track Spotify audio features dataset, are openly available and can be used by anyone for research, product development, or community projects.
Scientifically validated links exist between musical audio features (e.g., danceability, energy, tempo) and psychological states such as stress.
Open data enables transparent, reproducible research into how everyday habits like music listening impact well-being, an insight that is useful across most demographics.
How to build a rigorous, reproducible methodology: for example, the thresholds for classifying stress levels from music were derived from peer-reviewed clinical studies, ensuring scientific validity.
Preprocessing and data augmentation techniques (e.g., synthetic data generation) are essential for handling real-world dataset challenges like class imbalance.
Anyone can use open Spotify datasets and APIs to build their own music analytics, emotion recognition, or recommendation projects, with no proprietary barriers.
All tools, code, and data in this work are FOSS-licensed, making them easy to adopt, extend, and remix.
Most healthcare projects are not open source, and would-be contributors are often deterred by the lack of available data. Open music data can inform public health, support mental well-being, and foster new interdisciplinary collaborations between technologists, researchers, and the broader community.
This doesn't appear to be a FOSS project, just scraping the Spotify API.
Sorry, the paper looks neat, but this doesn't fit the track.