Before we jump into the data, I want to give you the opportunity to get an idea of the methods I followed, the literature I drew inspiration from, and the technology I used when working through this project. Understanding the limitations of any dataset, as well as understanding how the data was gathered and sorted, is essential when reading through projects that analyse social processes like the music industry. I want any biases or judgements to be clear from the beginning, so as not to misrepresent the forthcoming data. Hopefully, through this section, you can find yourself thinking of new ideas or methodologies to apply to any digital humanities endeavour you partake in.
Much of the inspiration for tackling this project came from D’Ignazio and Klein’s Data Feminism, along with a key inspiration from David Arditi’s Getting Signed. Getting Signed: Record Contracts, Musicians, and Power in Society blends ethnographic fieldwork with an analysis of the structures of the music industry. Arditi covers the exploitative nature of record deals and how these contracts can harm the way art is made and those who produce it. From Getting Signed, I draw on Arditi’s concept of lifting a metaphorical “veil” between producer (musician, songwriter, artist, etc.) and consumer (us, as readers and researchers actively consuming the art being produced). He describes the need to tackle these bigger questions by saying, “the problem is alienation”, and then goes on to describe the varying degrees of alienation between musicians and listeners/consumers, between musicians and the music they create, and even between musicians and other musicians.
Through using data to explore representation and engagement in the music industry, we can find ourselves participating in this veil-lifting. We want to observe the trends for ourselves and use this hands-on approach of playing with and zooming into data to develop our own ideas and questions about a system that would otherwise be hidden from us. So, as a consumer of art produced by workers in the music industry, I planned the execution of this project with a focus on transparency and engagement within various levels of the industry. Hence, I have a section on the history of music distribution, a section consisting of a general analysis of over 11,000 artists, and finally a section offering a drilled-down view of several artists’ careers.
When it comes to data collection, data analysis, and ways in which I’ve been considering the impact of my work beyond the academic sphere, I’ve turned to Data Feminism, written by Catherine D’Ignazio and Lauren F. Klein. The book merges intersectional feminist thought with data science, addressing important concepts like the matrix of domination and standpoint theory. Specifically, unlearning a solely empirical approach to data analysis has been key in moving my project away from a strictly solutions-oriented conclusion, and more towards an information piece.
As I explored the Data Feminism chapters over the course of eight months, the concept of framing became a prevalent theme in my work, used to centre the experience of the subject behind the data, ultimately conveying valuable emotion and a clearer sense of the data’s limitations. As a computer science student with a keen interest in the humanities, this practice of bringing forward emotion and subject experience is extremely important to me. When dealing with a complex issue that affects the livelihoods of so many (such as the abusive structures that form the music industry), I try to consistently recognize the impacts and limitations of my study. This ties into the importance of standpoint theory, which D’Ignazio and Klein discuss as a key feminist strategy: disclosing your own position and your subject’s position, so as to be clear about any limits in your knowledge of the experiences or claims of those affected.
When I was starting to collect my data, I found that the Handbook for Folklore and Ethnomusicology Fieldwork by Lisa Gilman and John Fenn detailed a very helpful approach for guiding the data analysis and write-up portions of this project. In what they term “a grounded theory approach”, Gilman and Fenn emphasise the need for ongoing, methodical revision of data when working with communities that you are not directly a part of. This close review of data, along with documentation of any ideas that emerge from the data or the topic of study, needs to be coupled with ongoing engagement with scholarly literature, so as to ensure that you are representing those your topic affects to the best of your abilities as a researcher. This approach resonated with the work I was doing, since the setup for this project consisted of many back-and-forth discussions with my supervisor, Dr. Jada Watson, as well as revisions of my proposal, initial data corpus, and the expanded corpus (which became this final iteration of the project). The exploratory and experimental nature of my research required consistent re-engagement with my findings and the literature to refine my direction and keep me on track.
My corpus consists of just over 11,700 Canadian artists as identified by Chartmetric. Chartmetric pulls data from a variety of sources such as Bandsintown, YouTube, Instagram, Facebook, MusicBrainz, and of course, Spotify. Thanks to Chartmetric’s excellent data aggregation capabilities, my collection process consisted of simply choosing which metrics from their platform I wanted to work with in order to compile the story you’ll be seeing on this site. There are over 37,000 Canadian artists on Chartmetric; however, due to time and memory constraints, I had to reduce this to a sample of roughly a third of that (down to 11,700). I also wasn’t able to set parameters for how this sample would be distributed across the full 37,000-artist set in Chartmetric; ideally, I would have chosen every third artist to get the most complete picture of my data. Despite this, I still find this sample size and distribution adequate for the research done here.
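For illustration, this is roughly what that kind of systematic sample would look like in pandas, assuming one had a single export of the full artist list (the file name is made up, and exporting all ~37,000 artists at once was not actually possible for me):

```python
import pandas as pd

# Hypothetical full export of all ~37,000 Canadian artists
# (file name is illustrative only).
all_artists = pd.read_csv("chartmetric_all_canadian_artists.csv")

# Systematic sample: keep every third artist, yielding roughly a third of the list.
sample = all_artists.iloc[::3].reset_index(drop=True)
```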
The collection and cleaning process involved downloading CSVs of large amounts of data from Chartmetric, then importing them into OpenRefine for cleaning and refinement. With some Python code, I then went through each column from the CSVs and decided on the variables I wanted to work with, concluding that genres, pronouns, solo/group tags, Spotify monthly listeners, Spotify followers, and Spotify fan conversion rates were all strong identifiers for the kind of research I was doing. This collection process was complete by the end of February 2022, so the numbers in my charts are accurate only up to the most recent date displayed on the x-axis. Once I’d mapped out the Canadian artist overview portion (which you’ll find on the next page of this site), I moved on to collecting playlisting information from Chartmetric for my case studies. This followed the same process of downloading CSVs directly from Chartmetric, cleaning them in OpenRefine, and graphing everything with Python to verify my results. However, I collected this playlisting data in mid-April 2022, so that data is more recent.
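As a rough sketch of this column-selection step, the snippet below shows the kind of pandas code involved. The file and column names are placeholders rather than Chartmetric’s actual export headers, and the derived conversion-rate column is only an approximation of the idea behind Chartmetric’s own metric:

```python
import pandas as pd

# Load one of the Chartmetric CSV exports (already cleaned in OpenRefine).
# Column names here are illustrative, not the real export headers.
artists = pd.read_csv("chartmetric_canadian_artists.csv")

# Keep only the variables used in this project.
columns = [
    "artist_name", "genres", "pronouns", "solo_or_group",
    "spotify_monthly_listeners", "spotify_followers",
]
artists = artists[columns]

# Illustrative derived metric: followers relative to monthly listeners,
# a stand-in for Chartmetric's fan conversion rate.
artists["fan_conversion_rate"] = (
    artists["spotify_followers"] / artists["spotify_monthly_listeners"]
)

print(artists.head())
```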
Another point of note is that the tagging for pronouns, genres, and group/solo acts is all provided by Chartmetric. So, if any points were found to be incorrect after the graphing was done (e.g., an artist identified as a group instead of a solo project), this is an error beyond the scope of this project. Had I had more time and resources, I would have liked to validate some of the tags myself to ensure I was presenting the most accurate data. An example of this is the web scraper I started building in Python to search Google for each band or artist’s hometown. I began by gathering the top five links from a Google search using keywords associated with the band (e.g., “Fanclubwallet band canada”) and then parsing the HTML of each webpage that was returned. I then compared the strings in this HTML against a list of Canadian cities and recorded the matches (taking the most common match if there were multiple). The purpose of this was to show the geographical distribution of my data through my own data aggregation; however, due to time constraints, I was unable to execute this by the deadline.
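A minimal sketch of that matching logic is shown below, assuming the top search-result URLs have already been retrieved somehow (the search step itself is left out), and using a deliberately short city list in place of a full list of Canadian cities:

```python
import re
from collections import Counter

import requests
from bs4 import BeautifulSoup

# Short, illustrative list; the real scraper would compare against a full
# list of Canadian cities.
CANADIAN_CITIES = ["Ottawa", "Toronto", "Montreal", "Vancouver", "Halifax", "Winnipeg"]

def guess_hometown(result_urls):
    """Given the top search-result URLs for an artist (e.g. from a query like
    "Fanclubwallet band canada"), return the Canadian city mentioned most often."""
    matches = Counter()
    for url in result_urls:
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that fail to load
        # Strip the HTML down to plain text before matching city names.
        text = BeautifulSoup(html, "html.parser").get_text(" ")
        for city in CANADIAN_CITIES:
            matches[city] += len(re.findall(rf"\b{re.escape(city)}\b", text))
    top = matches.most_common(1)
    return top[0][0] if top and top[0][1] > 0 else None
```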
Additions to data collection and visualisation like the web scraper I was building are examples of where a digital humanities project like this one could go. I view this website as a jumping-off point to show myself and others what can be done when we merge the humanities with computing.
The tech stack for this project was simple yet effective. The list below discusses what each tool was used for and why I chose to use it.
Chartmetric: the music information database that I gathered all of my data from. Chartmetric has several dashboards that display aggregate biographical information, playlisting data, streaming statistics, and more. This massive amount of information, coupled with the ability to export filtered data to CSVs, made it an obvious choice as the basis for this project.
OpenRefine: A fantastic tool that I explored in my second-year digital humanities course at the University of Ottawa, OpenRefine is a data transformation app that allowed me to clean my data while making it code-ready for use in my project. I used OpenRefine to remove null values, clean up strings in my data, and group together similar terms, producing data that gave me clean and simple graphs.
Python & Jupyter Notebook: Python is the powerful programming language I chose to use for the graphing and data analysis portion of this project. I wrote and executed my code in several Jupyter Notebooks (development environments for Python), which allowed me to test my code solve any bugs that I encountered. I used several Python plotting libraries, including Matplotlib and Pandas, to produce scatter plots, and when it was time for me to make the interactive versions of these visuals, I used Plotly.
DataPane: In order to display my graphs so that readers could interact with the data, I had to host my visuals online somewhere outside of my Jupyter Notebooks; this is where DataPane came in. DataPane lets users upload visuals generated in Python and host them so they can be embedded in blogs or websites. Originally, I was using Plotly’s Chart Studio to export my graphs to the internet; however, due to its 500KB upload limit and my very large files, I opted for DataPane instead.
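Below is a rough sketch of the kind of upload workflow DataPane supported around that time; the report name is made up, and method names shifted between releases (older versions used publish() rather than upload()), so treat this as illustrative rather than exact:

```python
import datapane as dp
import plotly.express as px

# A stand-in figure; in practice this would be one of the project's Plotly charts.
fig = px.scatter(x=[1, 2, 3], y=[3, 1, 2])

# Wrap the Plotly figure in a DataPane report and upload it so it can be
# embedded on the website (report name is hypothetical).
report = dp.Report(dp.Plot(fig))
report.upload(name="canadian-artists-scatter")
```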
Figma: For the planning process of my website, I created sketches on paper and then moved them to Figma so I could create high-fidelity wireframes. These wireframes were incredibly useful to me, as they gave me a direction when planning the content structure of the site. I refined the written pieces that had already been approved by my supervisor into pages for the website, translating academic writing into key points with headers and subheaders appropriate for the web. Early on in my Figma designs, I also developed a colour scheme, the main aesthetic components, and the navigation of my site, so I could keep a consistent theme throughout each portion of my Webflow site.
Adobe Illustrator: To create the complementary visuals alongside my text, I used Adobe Illustrator to make abstract PNGs related to the content. This was an extra touch that I found brought the project’s finished result to the next level.
Webflow: Finally, when it came time to put everything I’d written, graphed, and designed together, I chose to use a “no-code” website builder that I was already very comfortable with, called Webflow. Webflow allows users to create and host websites without having to go through the hassle of coding the components themselves. As a digital humanities student, I value the longevity of my projects for archival purposes; however, I opted to use Webflow despite its reliance on an external hosting provider because of time constraints and my familiarity with the platform.