Over the weekend, Meet the Press host Chuck Todd asked Democratic pollster Cornell Belcher, “What are we missing?” Belcher gave a reply that no doubt warmed the hearts of the ever-growing number of polling skeptics. Per the transcript:
“I think what we’re missing is (I hate to say this because I’m a pollster) our polling numbers are not going to be accurate… We don’t know what the electorate’s going to be. I don’t trust any of the horserace numbers right now. And I’m a pollster.”
This conversation, not surprisingly, has reached Texas. On Texas Monthly’s Underdog podcast, the second-to-last pre-election episode focused almost entirely on the possibility that the polls may miss the mark. With early voting in Texas having far outpaced the total number of votes cast in the 2014 election, national attention remains focused on the Ted Cruz–Beto O’Rourke contest, due in large part to Cruz’s consistent but comparatively narrow polling lead. In the final days of the campaign, pundits, policymakers and partisans on both sides keep asking the same question about the highest-profile race in Texas: Can the polls be wrong?
We’ve been involved in a handful of the polls in the Texas race and elsewhere in 2018, and the answer to this is straightforward: Yes.
The most obvious potential problem is that the electorate may differ significantly from assumptions about who will turn out. Those assumptions are built into each poll through how its sample is constructed (who gets contacted), who is counted as a “likely voter,” or both.
The 2018 election in Texas highlights this issue clearly. Often, the best approach is to use people’s past voting behavior, as verified through the state voter file, to build a list of “likely voters”: for example, everyone who voted in a recent midterm election, plus those who voted for the first time in the 2018 primaries. But what happens when, as this year, early turnout looks more like a presidential election than a midterm? That likely voter list is going to miss a lot of voters.
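A voter-file screen of this kind can be sketched in a few lines. This is a hypothetical illustration, not any pollster’s actual procedure: the `VoterRecord` layout, the election labels such as `"2014_general"`, and the rule that a first-time primary voter is someone whose only recorded vote is the 2018 primary are all assumptions made for the example. The tiny voter file at the bottom is invented.

```python
from dataclasses import dataclass, field

@dataclass
class VoterRecord:
    voter_id: str
    # Hypothetical vote-history labels, e.g. {"2014_general", "2016_general"}
    history: set = field(default_factory=set)

def likely_voter_list(records, midterms=("2014_general",), new_primary="2018_primary"):
    """Keep voters with a recent midterm vote, or whose only vote is the new primary."""
    likely = []
    for r in records:
        voted_midterm = any(e in r.history for e in midterms)
        first_time_primary = r.history == {new_primary}  # assumed definition of "first time"
        if voted_midterm or first_time_primary:
            likely.append(r.voter_id)
    return likely

def coverage(likely_ids, actual_voter_ids):
    """Share of people who actually voted that the likely voter list captured."""
    likely = set(likely_ids)
    return sum(v in likely for v in actual_voter_ids) / len(actual_voter_ids)

# Invented four-person voter file, for illustration only.
records = [
    VoterRecord("a", {"2014_general", "2016_general"}),
    VoterRecord("b", {"2016_general"}),   # votes only in presidential years
    VoterRecord("c", {"2018_primary"}),   # first-time 2018 primary voter
    VoterRecord("d", set()),              # no prior vote history
]
likely = likely_voter_list(records)
print(likely)                                  # ['a', 'c']
print(coverage(likely, ["a", "b", "c", "d"]))  # 0.5
```

If all four people turn out, as they might in a presidential-style year, the midterm-based list captures only half of them: the presidential-year voter and the brand-new voter are exactly the people it was built to exclude.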
The most common way to address this problem is to ask people whether they intend to vote and then include only those respondents who express that intention with near-absolute certainty. The problem with this approach is that people systematically overstate their likelihood of voting. This is less of an issue in high-turnout elections, but it remains an issue. In Texas, polls conducted by multiple organizations this cycle have produced consistent margins regardless of approach. That should be cause for confidence, but it can also signal a related problem called “pooling.”
Pooling occurs when pollsters, faced with measuring an uncertain outcome, look at other pollsters’ results and nudge their own toward the polling average. It happens through the many choices they make about weighting their results to match what they believe the electorate will look like. That projected electorate, of course, doesn’t actually exist until all the votes are cast and counted. Rarely, if ever, does this nudging flip which candidate a poll shows ahead, but it might shift the margin by a point or so; across a number of polls, that can create the appearance of more certainty about the outcome than is warranted. In the Cruz–O’Rourke race, it’s fair to wonder whether pooling is taking place around the conventional wisdom that Cruz holds a narrow but durable advantage over O’Rourke.
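To see how pooling can manufacture agreement, here is a toy simulation under loudly hypothetical assumptions: a two-way race with a true 5-point margin, ten polls of 800 respondents each, sampling error approximated by a normal distribution, and an invented `nudge` parameter for how far each pollster moves toward the average of the polls already published. It illustrates the idea only; it does not describe how any real pollster weights its data.

```python
import random
import statistics

def simulate_published_margins(true_margin, n_polls, sample_size, nudge, rng):
    """Return one campaign's published poll margins, in points.

    Each poll starts from an independent, honestly noisy estimate, then moves a
    fraction `nudge` (hypothetical, 0 to 1) toward the mean of earlier polls.
    """
    p = (1 + true_margin / 100) / 2                # leader's share of the two-way vote
    se = 200 * (p * (1 - p) / sample_size) ** 0.5  # standard error of the margin, in points
    published = []
    for _ in range(n_polls):
        estimate = rng.gauss(true_margin, se)      # normal approximation to sampling error
        if published:
            estimate = (1 - nudge) * estimate + nudge * statistics.mean(published)
        published.append(estimate)
    return published

def average_spread(nudge, trials=2000, seed=1):
    """Average standard deviation among the ten published polls, over many campaigns."""
    rng = random.Random(seed)
    spreads = [
        statistics.stdev(simulate_published_margins(5, 10, 800, nudge, rng))
        for _ in range(trials)
    ]
    return statistics.mean(spreads)

print(f"no pooling: {average_spread(0.0):.2f} points")
print(f"50% nudge:  {average_spread(0.5):.2f} points")
```

With the nudge turned on, the published polls cluster more tightly around one another even though each underlying sample is exactly as noisy as before. That tighter clustering is the unwarranted appearance of certainty the paragraph describes.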
So far, all of these problems come down to how well more or less standard polling practices fit what is shaping up to be, in many ways, a very unusual election year in Texas. As for the assumptions pollsters are making about this year, the question might be boiled down to: “How unusual, and in what ways?”
But these problems also reflect a widespread misunderstanding of how probability and sampling define what polls can and cannot accomplish, and with how much certainty. The laws of probability let us make inferences about the attitudes of large groups of people without gathering responses from everyone. Pollsters don’t need to contact every voter to estimate the preferences of the electorate, just as you don’t need to eat an entire bowl of soup to know how it tastes. A spoonful or two will do. Is it possible that one spoonful will miss some ingredients? Yes, and pollsters try hard to avoid this in their pursuit of an accurate sample: a small but thorough representation of the overall population. But even a truly representative sample of voters is still just a sample, and estimates based on samples will vary from one sample to the next. In close elections, that variation can lead to misfires.
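The soup analogy can be made concrete. The sketch below assumes the idealized case: a perfectly random sample from a two-candidate electorate, with no likely voter or weighting problems at all. The function names, the 2-point margin and the sample size of 800 are illustrative choices, not figures from any Texas poll. Note that the familiar margin of error applies to each candidate’s share; the error on the gap between the candidates is roughly twice as large.

```python
import random

def margin_of_error(share, n, z=1.96):
    """95% margin of error, in points, for one candidate's share in a simple random sample."""
    return 100 * z * (share * (1 - share) / n) ** 0.5

def share_of_misfires(true_margin, n, trials=2000, seed=7):
    """Fraction of random samples in which the candidate who is truly behind appears ahead."""
    rng = random.Random(seed)
    p_leader = (1 + true_margin / 100) / 2  # true leader's share of the two-way vote
    misfires = 0
    for _ in range(trials):
        # Draw n respondents; each backs the true leader with probability p_leader.
        leader_votes = sum(rng.random() < p_leader for _ in range(n))
        if leader_votes < n - leader_votes:
            misfires += 1
    return misfires / trials

print(round(margin_of_error(0.5, 1000), 1))  # 3.1
print(share_of_misfires(2, 800))
```

Even with no design flaws whatsoever, a race with a true 2-point lead yields samples of 800 that show the trailing candidate ahead a substantial fraction of the time, while a 10-point race almost never does. That is why a close race is exactly where a single, well-drawn poll can misfire.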
Relatedly, observers, especially journalists and opinion leaders, can focus too much on the results of one poll or pollster instead of on an aggregate of polls that accounts for timing, sampling and likely voter models. A single poll showing an aberrant result can come to seem more representative of the true state of the race than is warranted. This can happen because the poll is an outlier (i.e., just an unlucky draw of survey respondents) or, more likely, because opinion leaders read a trend toward one candidate into the waning days of the campaign and start talking about momentum or late-breaking voters when, in fact, it’s just one survey estimate.
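A simple weighted average shows why an aggregate is steadier than any single poll. The sketch below is a hypothetical, simplified aggregate, not the method of any polling outlet: it weights each poll by sample size and by recency, using an invented seven-day half-life, and it ignores differences in likely voter models entirely. The poll numbers in the usage lines are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Poll:
    margin: float     # candidate A's lead in points (negative means A trails)
    sample_size: int
    days_old: int     # days since the poll was in the field

def aggregate(polls, half_life_days=7.0):
    """Weighted average margin: weight = sample size x recency decay.

    A poll's weight halves every `half_life_days` (a hypothetical choice).
    """
    weights = [p.sample_size * 0.5 ** (p.days_old / half_life_days) for p in polls]
    return sum(w * p.margin for w, p in zip(weights, polls)) / sum(weights)

# Made-up polls, for illustration only.
polls = [Poll(5, 800, 10), Poll(6, 1000, 6), Poll(4, 700, 3), Poll(7, 900, 2)]
with_outlier = polls + [Poll(-1, 600, 1)]  # one fresh, aberrant poll
print(round(aggregate(polls), 1), round(aggregate(with_outlier), 1))
```

The aberrant poll moves the aggregate, but by far less than the gap between that poll and the rest, which is the point: one surprising survey is evidence to weigh, not a new state of the race.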
There are other factors at play in this election, but the pre-election consensus has been remarkably consistent: Cruz has led O’Rourke in every recent poll by between 2 and 9 percentage points, making it highly unlikely that O’Rourke is actually ahead going into Election Day. But it’s also true that this election clearly differs from the sleepy affairs that typically characterize Texas elections, with their nominal Democratic challengers and notoriously low turnout.