Data Analysis
A story in data points: a unique view of over 11,000 Canadian artists' Spotify statistics
Getting Started

The following section dives into the main portions of this study, where I examine over 11,700 Canadian artists’ Spotify metrics. You'll find visualizations of various trends in listenership, fan engagement, and artist characteristics like gender identity and genre pulled from an online music database. You’ll also get a peek into the process of data collection and the reasoning behind my interpretations of the graphs I’ve built with Python. Each visual is interactive, so I encourage you to play around with the data! Zoom in or hover over points to see their values, and check out what kinds of trends you find for yourself.

Where did the data come from?

The data I use in this project is pulled from Chartmetric, a database that gathers artist information from a variety of platforms such as Spotify, YouTube, Bandsintown, and Facebook. This database graciously allowed me to download CSVs containing large amounts of information for approximately 11,700 Canadian artists, including Spotify data such as the number of monthly listeners, followers, and playlisting trends (on an individual basis). I was also able to access many other variables such as genres, pronouns, and whether the artists were bands or solo projects. Click through the tabs below for explanations of each of the metrics I’ll be discussing.

Monthly listeners: Spotify’s tally of how many unique Spotify users have listened to the artist over a period of 28 days. A user must play at least 30 seconds of a track to count as a listener. Even if a user has listened to the same artist more than once, they are still counted as one monthly listener.

Followers: Spotify’s count of how many Spotify users are currently following the artist.

Genres: Chartmetric pulls genres associated with each artist from Spotify, MusicBrainz and EveryNoise. For the purpose of this study, I’ve grouped together certain genres to reduce the number of individual categories in my charts (ex: combining those deemed “Rock” and “Punk & Metal” into a category together titled “Rock, Punk & Metal”, or if multiple geographic locations were attributed to an artist, grouping them into “World”).

Pronouns: Chartmetric’s list of pronouns, created through both automated and manual collection. It should be noted that the collection and identification of artist/group pronouns is incredibly difficult and as a result there are still issues with terms used (ex: using “they/them” for non-binary artists as well as groups with multiple members), or the fact that not all pronouns equate to gender identity. To read more about Chartmetric’s pronoun database, check out their blog!

Solo or group: Chartmetric’s documentation of whether the act is a single artist (solo), or if it is a musical group.

Why do we care about visualizations?

There are three main graphs I’ve chosen to include in this study: a 2D scatter plot of Spotify monthly listeners versus followers, a 3D scatter plot of Spotify monthly listeners, followers, and genres, and a 2D scatter plot of monthly listeners versus followers faceted by pronouns. The general purpose of visualizing this data is to look for trends that could help us better understand what patterns are still perpetuated by the music industry: patterns that perhaps transcend the shift in technology, carried over from radio to current-day streaming. We may also observe differences in fan engagement between genres, or in the representation of men, women, and non-binary artists, that can be related back to what is now becoming an increasingly open market for music creation and distribution. By making data accessible in these visuals, we create more opportunities for people from different domains and with varying degrees of background knowledge to contribute to the discussion. Combining data science with the arts allows us to paint a more complex, interesting, and complete picture of those who are represented and affected by our findings.

Monthly Listeners vs. Followers

The first approach I took to exploring my dataset of Canadian musicians was to check out the relationships between number of monthly listeners and followers per musician or group. Chartmetric provides data that represents this relationship, which they term “fan conversion rate”. Chartmetric’s fan conversion rate is calculated by taking the number of followers an artist has and dividing that by the number of monthly listeners. Since not every follower may be actively listening to an artist, and not every listener a follower, this conversion rate aims at quantifying how engaged an audience is. 
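To make the calculation concrete, here is a minimal Python sketch of the conversion rate as described above: followers divided by monthly listeners. The function name and the two sample artists are invented for illustration and are not taken from the Chartmetric data.

```python
def fan_conversion_rate(followers, monthly_listeners):
    """Return followers divided by monthly listeners, or None if there are no listeners."""
    if monthly_listeners <= 0:
        return None  # avoid dividing by zero for artists with no recorded listeners
    return followers / monthly_listeners


# Made-up artists for illustration only (not from the Chartmetric dataset)
artists = [
    {"name": "Artist A", "followers": 3_600, "monthly_listeners": 10_000},
    {"name": "Artist B", "followers": 26_000, "monthly_listeners": 25_000},
]

for artist in artists:
    rate = fan_conversion_rate(artist["followers"], artist["monthly_listeners"])
    print(f'{artist["name"]}: conversion rate {rate:.2f}')
```

A rate below 1 means the artist has more listeners than followers, and a rate above 1 means the opposite.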

If the conversion rate is higher, this could indicate that an artist is converting one-off listeners to followers that will come back for more. The benefit of having more listeners converted to followers (and consequently a high conversion score) is a stronger foundation for the success of future releases, as an engaged audience is more likely to pay attention to new releases by adding them to playlists and further circulating an artist’s discography. A lower conversion rate would mean that an artist has far more passive than active fans: listeners who are not engaged enough to follow the artist online and thus aren’t guaranteed to be around for future releases. This can happen for a variety of reasons, such as a hit song being circulated on playlists with many followers but few engaged fans who are willing to go find an artist post-playlist, or success for a snippet of a song on apps like TikTok or Instagram.

The first plot I created illustrates the Spotify fan conversion rate clearly: the y-axis shows the monthly listener count, and the x-axis shows the follower count as of February 2022 (when the data was pulled). The legend on the right side is a colour gradient that further helps show how condensed the range of success on Spotify truly is for Canadian artists. The outliers are household names (Justin Bieber, Drake, and The Weeknd), while the vast majority of artists are contained within the bounds of 3 million monthly listeners and 9 million followers. If we scale further down, we see that smaller (but still entirely notable) Canadian names, such as City and Colour, are displayed as the new outliers for the condensed dark blue section of artists within the bounds of 400,000 monthly listeners and 100,000 followers.
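For readers curious how a chart like this could be put together, here is a rough sketch. It assumes the Plotly Express library (this write-up only says the graphs were made with Python, so the library is a guess), and it assumes the colour gradient is mapped to monthly listeners. The helper names and sample rows are hypothetical.

```python
def to_columns(rows):
    """Turn a list of per-artist dicts into column lists for plotting."""
    return {
        "name": [r["name"] for r in rows],
        "followers": [r["followers"] for r in rows],
        "monthly_listeners": [r["monthly_listeners"] for r in rows],
    }


def build_scatter(rows):
    """Followers on the x-axis, monthly listeners on the y-axis, one point per artist."""
    # Assumed plotting library; imported here so the data helper works without it
    import plotly.express as px

    return px.scatter(
        to_columns(rows),
        x="followers",
        y="monthly_listeners",
        color="monthly_listeners",  # assumption: gradient follows listener count
        hover_name="name",          # artist name appears when hovering over a point
    )


# Made-up rows for illustration only (not the Chartmetric data)
rows = [
    {"name": "Artist A", "followers": 100_000, "monthly_listeners": 400_000},
    {"name": "Artist B", "followers": 9_000, "monthly_listeners": 12_000},
]
columns = to_columns(rows)
```

With Plotly installed, build_scatter(rows).show() would open an interactive chart with the same zoom and hover behaviour described here.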

Hover your mouse around or select and drag parts of the chart to play with the data. Do you see any familiar names? How many monthly listeners compared to active followers did they have in February of 2022?
Sidebar: Context is important!

The median of all musicians’ conversion rates is approximately 0.36 (or 36%), well under a 50% conversion rate. This means that a typical Canadian artist would have roughly 1 follower for every 3 listeners. And while this sounds like a fair conversion rate, without knowing how many followers or listeners are in question, we wouldn't be able to truly gauge success on a platform like Spotify. This raises the need for context about exposure.

Exposure is illustrated when we look at our plots not by conversion rate alone, but by followers and listeners (like the graph above), which together make up the conversion rate for each point. A concrete illustration of why exposure and circulation are important context for understanding these graphs is the third quartile (i.e. 75% of artists’ conversion rates fall at or below this value). This quartile sits at a conversion rate of 1.04 (104%). If we recall what this rate represents, this means that an artist at this threshold has 26 followers for every 25 listeners. So, above a 1:1 conversion rate, the relationship inverts: these artists have more followers than listeners! Maybe they have inactive fans who aren't listening due to their own inactivity or an outdated discography.
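The median and third quartile can be computed with Python's built-in statistics module, and a rate can be turned back into a "followers per listeners" ratio with fractions. The sketch below uses nine invented rates, chosen so the results land on the 0.36 and 1.04 figures discussed here. It is not the real dataset, and the quartile method used for the study is not stated, so the "inclusive" method is an assumption.

```python
import statistics
from fractions import Fraction

# Hypothetical conversion rates for illustration only (not the Chartmetric data)
rates = [0.05, 0.10, 0.20, 0.30, 0.36, 0.50, 1.04, 1.50, 2.00]

median = statistics.median(rates)

# Cut the data into four equal parts; q3 is the value that 75% of rates fall at or below
q1, q2, q3 = statistics.quantiles(rates, n=4, method="inclusive")


def as_ratio(rate_text):
    """Express a rate written as a decimal string as (followers, listeners)."""
    fraction = Fraction(rate_text)
    return fraction.numerator, fraction.denominator


followers, listeners = as_ratio("1.04")
print(f"median = {median}, third quartile = {q3}")
print(f"A rate of 1.04 is {followers} followers for every {listeners} listeners")
```

The same helper applied to "0.36" gives 9 followers for every 25 listeners, which is where the rough "1 follower for every 3 listeners" reading comes from.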

Context matters: it's not just about looking for the largest numbers!

Genres vs. Monthly Listeners (vs. Followers)

The next visual I mapped out sorts artists by monthly listeners and genre. The genres first needed cleaning, so I used OpenRefine (an open-source data cleaning application) to group certain sub-genres into their respective categories. The categories seen in the chart below are the same as those used in Chartmetric. However, when I pulled the data, many countries (such as “Australia” or “Zimbabwe”) were listed as genres on their own, so I grouped them into the “Folk, Traditional, ‘World’” genre to preserve readability in the final graph. I want to acknowledge here that, with more time, I could perhaps have split the graph into a facet just for these geographic locations, so as not to group them together and potentially reduce their significance.
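The grouping itself was done by hand in OpenRefine, but the same idea can be sketched in Python as a simple lookup. The table below is hypothetical and only contains the examples named in this write-up; the real mapping covered many more sub-genres and countries.

```python
# Hypothetical lookup tables for illustration; the actual grouping was done in OpenRefine
GENRE_GROUPS = {
    "Rock": "Rock, Punk & Metal",
    "Punk & Metal": "Rock, Punk & Metal",
}
COUNTRY_TAGS = {"Australia", "Zimbabwe"}  # countries that appeared as genres on their own
WORLD_CATEGORY = "Folk, Traditional, ‘World’"


def group_genre(raw_tag):
    """Map a raw genre tag to its chart category, leaving unknown tags unchanged."""
    tag = raw_tag.strip()
    if tag in COUNTRY_TAGS:
        return WORLD_CATEGORY
    return GENRE_GROUPS.get(tag, tag)


for raw in ["Rock", "Punk & Metal", "Zimbabwe", "Pop"]:
    print(f"{raw} -> {group_genre(raw)}")
```

Leaving unrecognised tags unchanged keeps the sketch from silently hiding categories that were never reviewed.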

The purpose of mapping out monthly listeners by genre is to possibly link typical trends in listenership to their genres and see how this translates to the Canadian music industry. To add another layer of data to this chart, I’ve added the ability to identify which acts are groups, which are solo projects, and which remain undetermined in Chartmetric’s database.

Check out the legend to see if you observe relationships between the identified genre categories and under-represented projects. What genres have the fewest projects attributed to them? Which have the most? Where do outliers like Drake and The Weeknd sit on our chart?

In this scatter plot we can clearly see interest in listenership of “Dance & Electronic”, “Hip-Hop & Rap”, and “Pop” genres. If we consider the most-liked Spotify playlist, Today’s Top Hits, with over 30 million likes, the majority of tracks on widely consumed playlists like this one are Pop, Dance, and Hip-Hop, which could be helping with the dissemination of tracks in these genres to wider audiences. On this note, I’ll highlight the lack of solo/group documentation for many of the acts under categories that lack listenership, such as “R&B, Funk & Soul, Reggae, and Latin & Caribbean.” Exploring this, we can find other works that show a similar trend of drastic underrepresentation of artists who are Black, Indigenous, and/or persons of colour (BIPOC), such as those of Dr. Jada Watson and of Abigail Alty with Dalie Brisson. In An examination of gender, sexuality, race, ethnicity and nationality in Canadian Active Rock radio, Alty and Brisson give examples of this underrepresentation in active rock radio charts. For example, they explain that out of 249 unique artists in the Top 100 Active Rock radio charts, only 10.5% of these bands included BIPOC artists. They go on to discuss how, even though this roughly 10% of BIPOC-inclusive acts is a small portion of representation in charts, the barrier of exposure is even greater for women and 2SLGBTQ+ artists in their data. When we observe these trends of lessened listenership for select genres, or perhaps a limited amount of exposure (as we’ll see below), we can start to open a conversation about why certain artists are not gaining similar traction.

But remember what we said about the importance of context regarding our Spotify conversion rates earlier? Let’s examine the conversion rate in more detail for each genre. For this, I’ve plotted a three-dimensional map of artists with monthly listeners on the x-axis, followers on the z-axis, and genres on the y-axis. This allows me to see clusters and intersections of genres with artists that tend to have a lower follower or monthly listener count, regardless of conversion rate. With this detail, we can see in the graph below that the genres we were concerned about having less solo/group documentation are also the ones clustered closer to the axes, meaning they have fewer listeners or followers, and thus lower exposure regardless of their conversion rate.

Use the widgets on the top right of the graph to rotate, zoom, and pan in on different angles of this 3D scatter plot. What does hovering over each point show you? Does having a third dimension show you a new perspective on our data?

The question of exposure for genres with less traction in Canada leads me to consider what the typical listening rates are for playlists on Spotify that cater to these genres, as Spotify’s playlisting algorithm is an incredibly powerful tool for small artists gathering new listeners (who hopefully turn into new followers). For example, Spotify has several large Reggae playlists, such as Reggae Classics with over 2 million likes, or Ultimate Reggae with over 106,000 likes. And while the number of Canadian artists specializing in Reggae that make it to these playlists is undetermined in my data, my hope is that these questions will prompt readers to be more critical of how many BIPOC artists or artists of underrepresented genres pop up in their algorithm-generated playlists on Spotify. If you’re interested in exploring this topic further, check out this illuminating study on listenership and the diversity of Spotify’s recommender systems by Ashton Anderson, Lucas Maystre, Rishabh Mehrotra, Ian Anderson, and Mounia Lalmas, titled Algorithmic Effects on the Diversity of Consumption on Spotify.

Monthly Listeners vs. Pronouns

Finally, I was able to separate my scatter plots into facets based on pronouns identified from Chartmetric’s database. I chose to examine pronouns on separate graphs because there were five categories identified in my dataset, and viewing them all on one single plot didn’t let me see how underrepresented certain pronouns are. A clear example of this is the single orange dot on the fifth graph: Peter Jessy (an Ottawa indie pop artist), the only person in my dataset identified as “he/they”, would have been missed if we were viewing all pronouns on the same graph. The “they/them” category is also filled with mostly groups/bands, since they are not single people and are grammatically referred to as “they/them”. While statistically it doesn't add up for only one artist in a dataset of over 11,700 artists to identify as “he/they”, this could be attributed to a lack of tools accessible for artists to self-identify under non-binary or other terms. I take this not as a mistake but rather as a sign that, moving forward, platforms such as Spotify and aggregating databases such as Chartmetric should strive to be more inclusive in their options for gender equity and representation.
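The effect of faceting can be sketched without any plotting at all: split the artists into one group per pronoun tag and count each group. In the hypothetical example below (invented rows, not the Chartmetric data), the single “he/they” entry is easy to spot once each category has its own bucket, just as it is on its own graph.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only (not the Chartmetric data)
artists = [
    {"name": "Artist A", "pronouns": "he/him"},
    {"name": "Artist B", "pronouns": "she/her"},
    {"name": "Band C", "pronouns": "they/them"},
    {"name": "Artist D", "pronouns": "he/him"},
    {"name": "Artist E", "pronouns": "he/they"},
]


def split_into_facets(rows, key="pronouns"):
    """Group rows by the given key, one list of rows per category."""
    facets = defaultdict(list)
    for row in rows:
        facets[row[key]].append(row)
    return dict(facets)


facets = split_into_facets(artists)
counts = Counter({label: len(rows) for label, rows in facets.items()})

# Largest categories first, so the rarest tags end up at the bottom of the list
for label, n in counts.most_common():
    print(f"{label}: {n} artist(s)")
```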

Fortunately, Chartmetric’s blog “Make Music Equal” touches on the fact that Chartmetric is already leaping ahead to tackle this issue by including pronouns in their datasets! Visualising data can lead us to see issues of representation, which allows us to decide how exactly we want to start reworking the way we look at, document, and visualise this data in the future. As Chartmetric indicates in their post titled Pronouns & Gender Dataset, “fixing the problem starts with knowing how severe it is and where the music industry needs to focus its energy and resources most.”

Try zooming into the different charts below. What do you notice about the difference in monthly listeners between those who have been tagged as "He/Him" versus "She/Her" or "They/Them"? Are there any pronouns missing from the dataset? Why do you think that is?
Moving forward

Data can show us a lot if we know how and where to look. The insight it provides can be used to better understand and challenge the structures we operate under, to make the music industry a more just and inclusive environment. By exploring seemingly straightforward values like Spotify’s monthly listeners or followers, we can find interesting trends related to complex factors like pronouns and representation. My hope at this point in the study is that my data provides a jumping-off point for further studies of the digital Canadian music industry. Up next, for a different perspective, we'll look at a few specific cases of Canadian artists based on their positions as either unsigned, signed to major labels, or signed to indie labels.
