Last week I read a very interesting Twitter thread by Jack Edwards in which he made a data subject request to Spotify. Amongst all the data about listening habits that comes back, there’s a file called inferences.json, which contains a trove of inferences about you as a consumer.
Inspired by the thread, I decided to make a second data subject request of my own. This time I was not looking at my music listening patterns (I do listen to a lot of The Cure and Depeche Mode, who knew?), but went directly to the small inferences.json file (here it is, and here is a copy in Google Docs). What I found was a very interesting trove of data that hints at a vast network of data gathering that goes well beyond Spotify, and fuels the surveillance economy.
Right off the bat I found some laughably wrong inferences about me. I’m not a Mercedes owner, I’m not a VW owner, I’m not a Ford owner, I don’t own a car for that matter, and I don’t plan to buy one anytime soon. I’m not an Engineer, I’m not a student, I’m not a Craftsman, nor a DIY enthusiast (but I have bought some DIY stuff), I don’t work for the government, I’m not a “Halloween Enthusiast”, I never used online dating, even when I was single, and I don’t like cricket or golf, just to name a few. There are also some contradictory categories: apparently I’m a home owner, home mover, and home renter. I have children and do not have children, and I’m both “Engaged/Getting Married” and “Getting Divorced”.
But what is interesting is what they get right… and this is where we should be paying real attention.
Learning how to read the file was not that difficult. These are inferences made about you as a consumer, based on four sources of information. First there are 1P, 2P and 3P, which refer to the origin of the data: 1P means first-hand information obtained by Spotify directly, while 2P and 3P stand for second party and third party. I do not know what the difference is between a second party and a third party, but seeing as there’s only one line marked as 2P, I did not care to look into it any further. There is a fourth source of data which is unmarked. More on that at the end.
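If you want to poke at your own file, here is a rough Python sketch of how the entries could be split by source. It is only an illustration under some assumptions: that the export is either a plain list of label strings or an object with an "inferences" key holding that list, and that the source is whatever comes before the first underscore. The function names (load_labels, source_of, count_by_source) are my own, and apart from the Capital One line quoted elsewhere in this post, the sample labels are made up.

```python
import json
from collections import Counter

KNOWN_SOURCES = {"1P", "2P", "3P"}


def load_labels(path):
    """Read inference labels from an exported inferences.json file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: the file is either a bare list of strings, or an object
    # whose "inferences" key holds that list.
    if isinstance(data, dict):
        data = data.get("inferences", [])
    return [str(item) for item in data]


def source_of(label):
    """Return "1P", "2P" or "3P" from the label prefix, else "unmarked"."""
    prefix = label.split("_", 1)[0].strip().upper()
    return prefix if prefix in KNOWN_SOURCES else "unmarked"


def count_by_source(labels):
    """Count how many labels come from each source."""
    return Counter(source_of(label) for label in labels)


# The first label appears elsewhere in this post; the other three are
# invented placeholders, not real entries from the file.
sample = [
    "3P_Custom_Capital One_HHI $50 – $200k_10Sep2020_US",
    "1P_Custom_Example Device Owner",
    "2p_Custom_Example Partner_Jan2019_UK",
    "Example Brand Test Segment",
]
print(count_by_source(sample))
```

With a real export you would call count_by_source(load_labels("inferences.json")) instead of using the sample list. Anything without a recognised prefix lands in the "unmarked" bucket, which is how the fourth source shows up.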
Setting aside the outright wrong entries, the accurate data is fascinating to analyse. Spotify’s own inferences, covered under the 1P heading, are mostly accurate:
I have my Spotify connected to my TV, and I do not own any smart speakers. I own a Google Pixel, and I guess I’m a passionate curator and also a social gamer. Interestingly, I’m not a Roku customer but I once linked my Spotify account to a Roku box at an AirBnB to listen to music, so it’s interesting that Spotify still remembers this one-off event.
The second party line appears to be inaccurate: it implies that I was a Sky customer in January 2019, which is not the case as far as I can see.
Things get really interesting when analysing the third party data, as there appears to be very specific data sharing from one service to another, sometimes with dates. As it happens, I keep an excellent record of all of my e-commerce purchases, so in many instances I was able to track down the source of the data.
Ignoring all of the inaccurate inferences, there’s a lot of stuff that they get right, but seeing all of the misses, one has to wonder as to the accuracy of the data. I like wine and beer (true), and I’m even classed as a beer lover (but do not use in 2021). We’re a “Couple without Kids” (true), I used to own a BMW (but I’m not sure if I’m in the “Thrillseeking Car Enthusiasts” category). I am an Amazon Prime subscriber, and also subscribe to other video subscription services. I’m a Foodie, and also a Festival Goer. I’m a gamer, an RPG gamer, and a game console owner, but I’m decidedly not a First Person Shooter (FPS) gamer. Ugh.
The really interesting stuff is that which is marked with a date; one has to assume that these were tied to very specific purchases. For example, I found these two puzzling dates from Capital One, the US credit card company:
“3P_Custom_Capital One_HHI $50 – $200k_10Sep2020_US”,
As far as I could find, HHI stands for household income, but the dates were strange: I had not made any purchases, and I don’t use Capital One. However, checking closely, I was setting up streaming services on new devices on those two dates, so it’s clear that streaming services share some sort of information with credit checking agencies.
Other dates stood out. Several specific advertising dates coincided with Ocado orders (Ocado is a UK-based home delivery online supermarket). In fact, a lot of accurate brand and product inferences evidently come from Ocado. I’m a dessert lover and a yoghurt consumer because I bought two pots of yoghurt in an Ocado order on November 7 2019.
Other sources of information are Amazon and Google Pay. This series of entries proved to be quite indicative of the level of cross-app data sharing:
“3P_Custom__ Entertainment – Interest – Video Games_19-Feb-2021_WW”,
“3P_Custom__ Entertainment – Mobile & App – Interest – Video Games_19-Feb-2021_WW”,
“3P_Custom__ Entertainment – Mobile & App – Sport – Outdoor_19-Feb-2021_WW”,
“3P_Custom__ Entertainment – Mobile & App – Sport – Running_19-Feb-2021_WW”,
“3P_Custom__ Entertainment – Mobile & App – Video Game – Portable Console Owner_19-Feb-2021_WW”,
“3P_Custom__ Entertainment – Sport – Running_19-Feb-2021_WW”,
“3P_Custom__ Entertainment – Video Game – Portable Console Owner_19-Feb-2021_WW”,
“3P_Custom__Cooking & Recipes_19-Feb-2021_UK”,
19 February 2021 features prominently. My guess is that some of these dates may refresh from time to time, but some are definitely tied to purchases. On February 19 I bought two items from Amazon: a fridge odour remover (we found some mould), and a pair of Sony headphones. The headphones in particular appear to have been a data hit, tagging me as a runner, trendsetter, holiday baker, and outdoor person. I also made a Pokemon Go in-game purchase, which also appears to have been a huge data hit.
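Spotting which dates cluster like this is easier with a bit of code. The sketch below is my own guess at the label layout, based only on the entries quoted in this post: a trailing date written either as 19-Feb-2021 or 10Sep2020, followed by a two or three letter region such as WW, UK or US. The names parse_tail and group_by_date are hypothetical, and labels that do not fit this pattern are simply skipped.

```python
import re
from collections import defaultdict
from datetime import date

# Month lookup kept explicit so parsing does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Assumed tail: "_<day><Mon><year>_<REGION>", with optional hyphens in the date.
TAIL = re.compile(r"_(\d{1,2})-?([A-Za-z]{3})-?(\d{4})_([A-Za-z]{2,3})$")


def parse_tail(label):
    """Return (date, region) for a dated label, or None if it has no such tail."""
    match = TAIL.search(label.strip())
    if match is None:
        return None
    day, month, year, region = match.groups()
    try:
        stamp = date(int(year), MONTHS[month.lower()], int(day))
    except (KeyError, ValueError):
        return None  # not a real month or calendar date
    return stamp, region.upper()


def group_by_date(labels):
    """Map each date to the list of labels carrying it."""
    groups = defaultdict(list)
    for label in labels:
        parsed = parse_tail(label)
        if parsed is not None:
            groups[parsed[0]].append(label)
    return dict(groups)


# Labels copied from this post.
sample = [
    "3P_Custom__ Entertainment – Sport – Running_19-Feb-2021_WW",
    "3P_Custom__Cooking & Recipes_19-Feb-2021_UK",
    "3P_Custom_Capital One_HHI $50 – $200k_10Sep2020_US",
]
for stamp, found in sorted(group_by_date(sample).items()):
    print(stamp.isoformat(), len(found))
```

Sorting the groups by size would surface the busiest dates first, which is roughly how 19 February 2021 stood out for me.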
Most other dates correspond to some sort of purchase from Ocado, Amazon, Uber Eats, or Google Pay. Some of these purchases point towards extremely narrow categories (yes, I have bought a Swiffer and Oral-B toothpaste). However, some dates appear to be wrong, or do not match the tag. I couldn’t confirm whether I was indeed engaging in “Intense Workout” on either January 1st or July 31 2020.
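The matching itself is simple enough to sketch. Assuming you already have the dated labels as (date, label) pairs and your own purchase records as (date, merchant, item) tuples, something like the following lists which purchases fall on or near each label date. The function name match_purchases and the window_days parameter are my own inventions; whether a label date can lag behind a purchase is a guess, so the window defaults to an exact match. The "Intense Workout" label text below is a placeholder, not the real entry.

```python
from datetime import date, timedelta


def match_purchases(dated_labels, purchases, window_days=0):
    """Pair each (date, label) with purchases made within window_days of it.

    Returns a list of (label, [(merchant, item), ...]); an empty inner list
    means no purchase was found near that label's date.
    """
    window = timedelta(days=window_days)
    results = []
    for stamp, label in dated_labels:
        nearby = [
            (merchant, item)
            for bought_on, merchant, item in purchases
            if abs(bought_on - stamp) <= window
        ]
        results.append((label, nearby))
    return results


# Purchases mentioned in this post, typed in by hand.
purchases = [
    (date(2021, 2, 19), "Amazon", "fridge odour remover"),
    (date(2021, 2, 19), "Amazon", "Sony headphones"),
    (date(2019, 11, 7), "Ocado", "yoghurt"),
]
dated_labels = [
    (date(2021, 2, 19), "3P_Custom__ Entertainment – Sport – Running_19-Feb-2021_WW"),
    (date(2020, 7, 31), "Intense Workout (placeholder label)"),
]
for label, hits in match_purchases(dated_labels, purchases):
    print(label, "->", hits if hits else "no matching purchase")
```

A match here only shows that a purchase and a label share a date. It does not prove that the purchase is where the label came from.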
Another fascinating source of information consists of the various personality inferences. I was shocked by the many types of personality and economic categories in the file, some of which are tied to a brand. There are some straightforward income and class assumptions in the file. Income is classed at anywhere between £50k and $200k. We’re an “Affluent Household”. The class is set at ABC1; this is also a range, where A is upper middle class and C1 is lower middle class, so I’m pretty much as middle class as they come.
The personality names, however, were fascinating! Some are interesting albeit inaccurate, such as thrillseeking and trendsetter; I have not set any trends since the 90s. Some gems:
- Indulgent imitators
- Financially Savvy Credit Users
- Digital Dynamos
- Top End Techtastics
- Culture seekers
- Social savvies
- Style Watchers / Fashionista
Some personality traits fall under specific marketing categories. Apparently I’m a maximizer, not a satisficer.
Finally, there are a bunch of test categories that appear to be there to show ads to see if the customer bites. So I’ve been targeted by Heinz (but not for mayonnaise). Most of these tests appear to be quite specific, and do not have a source from first parties or third parties. This category includes services such as NerdWallet (never heard of them), Sierra Nevada brewery, and another attempt at getting me to play First Person Shooters.
Never gonna happen data miners, never gonna happen.
Nothing much to conclude, other than this is further evidence of the amount of data collusion out there between third parties. What really strikes me is just how much of the data is just outright wrong or useless. Who cares if I like yoghurt? We’ve erected such an intrusive level of data gathering to gain useless insights into shopping patterns.
I do like being a Top End Techtastic though.