The researcher whose work is at the center of the Facebook-Cambridge Analytica data analysis and political advertising uproar has revealed that his method worked much like the one Netflix uses to recommend movies.
In an email to me, Cambridge University scholar Aleksandr Kogan explained how his statistical model processed Facebook data for Cambridge Analytica. The accuracy he claims suggests it works about as well as established voter-targeting methods based on demographics like race, age and gender.
If confirmed, Kogan’s account would mean the digital modeling Cambridge Analytica used was hardly the virtual crystal ball a few have claimed. Yet the numbers Kogan provides also show what is – and isn’t – actually possible by combining personal data with machine learning for political ends.
Regarding one key public concern, though, Kogan’s numbers suggest that information on users’ personalities or “psychographics” was just a modest part of how the model targeted citizens. It was not a personality model strictly speaking, but rather one that boiled down demographics, social influences, personality and everything else into a big correlated lump. This soak-up-all-the-correlation-and-call-it-personality approach seems to have created a valuable campaign tool, even if the product being sold wasn’t quite as it was billed.
The promise of personality targeting
In the wake of the revelations that Trump campaign consultants Cambridge Analytica used data from 50 million Facebook users to target digital political advertising during the 2016 U.S. presidential election, Facebook has lost billions in stock market value, governments on both sides of the Atlantic have opened investigations, and a nascent social movement is calling on users to #DeleteFacebook.
But a key question has remained unanswered: Was Cambridge Analytica really able to effectively target campaign messages to citizens based on their personality characteristics – or even their “inner demons,” as a company whistleblower alleged?
If anyone would know what Cambridge Analytica did with its massive trove of Facebook data, it would be Aleksandr Kogan and Joseph Chancellor. It was their startup Global Science Research that collected profile information from 270,000 Facebook users and tens of millions of their friends using a personality test app called “thisisyourdigitallife.”
Part of my own research focuses on understanding machine learning methods, and my forthcoming book discusses how digital firms use recommendation models to build audiences. I had a hunch about how Kogan and Chancellor’s model worked, so I emailed Kogan to ask.
His response requires some unpacking, and some background.
From the Netflix Prize to “psychometrics”
Back in 2006, when it was still a DVD-by-mail company, Netflix offered a reward of $1 million to anyone who developed a better way to predict users’ movie ratings than the company already had. A surprise top competitor was an independent software developer using the pseudonym Simon Funk, whose basic approach was ultimately incorporated into all the top teams’ entries. Funk adapted a technique called “singular value decomposition,” condensing users’ ratings of movies into a series of factors or components – essentially a set of inferred categories, ranked by importance. As Funk explained in a blog post,
“So, for instance, a category might represent action movies, with movies with a lot of action at the top, and slow movies at the bottom, and correspondingly users who like action movies at the top, and those who prefer slow movies at the bottom.”
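To make Funk's description concrete, here is a minimal sketch of the general idea, not his actual code. It runs an ordinary singular value decomposition on a tiny, made-up ratings table using NumPy. The ratings, the function name top_factors and the choice of a single factor are all assumptions for illustration. Funk's own method fitted the factors step by step to a huge table that was mostly empty, which this toy example does not attempt.

```python
import numpy as np

# Hypothetical ratings on a 1-5 scale: rows are users, columns are movies.
# The first two columns stand in for action films, the last two for slow dramas.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])


def top_factors(matrix, k):
    """Return user scores, strengths and movie scores for the k strongest factors."""
    # Subtract the overall average so the factors describe differences in taste
    # rather than the general level of the ratings.
    centered = matrix - matrix.mean()
    # NumPy returns singular values in descending order, so the first factor
    # is the most important inferred category.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


user_scores, strengths, movie_scores = top_factors(ratings, k=1)

print("movie scores on factor 1:", np.round(movie_scores[0], 2))
print("user scores on factor 1: ", np.round(user_scores[:, 0], 2))
```

In this toy table the strongest factor puts the two action films at one end and the two dramas at the other, with each user landing at the end that matches their taste. Which end comes out positive is arbitrary; only the separation matters.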