We don't know who won the US popular vote, decent chance it was Clinton
The common statistic reported after the US election was that Clinton "won the popular vote" by around 3 million votes over Trump. This has caused great rancour over the role of the electoral college and has provided a sort of safety valve against the shock Democrats (and others) faced over the Trump victory.
I'm here with a concerning analysis, which I offer because it is a mistake on the part of the US left to underestimate the magnitude of Trump's victory, or to imagine it was only because of a flaw in the system which he gamed better than Clinton.
The problem is that the US does not officially have a thing called "the popular vote." That exists nowhere in its rules. There is no popular election of the President. Rather, there are 54 elections with popular votes in 51 jurisdictions, which newspaper reporters then sum up into a number they incorrectly describe as "the national popular vote." Of course, Clinton did win that invalid sum by around 3M votes. But bad statistical practice by the press, though it has created a common convention -- for many decades -- of calling that number "the popular vote," does not make it valid. True popular votes involve all voters being free and equal, and we criticise any foreign election that pretends to call itself a popular vote when the voters are not free and equal. A popular vote, by its proper definition, is the vote total in a single election. Not 54 of them. As such, the sum is no more a popular vote total than adding the results of the 2008 and 2012 votes would get you a popular vote for or against Obama.
It's especially invalid because it's really summing two fairly different types of results.
- True popular vote totals from "swing" states where both candidates actively campaigned, turnout was higher, and voters expected their votes to count
- Low-accuracy popular vote totals from "safe states" which candidates did not contest, and where voters knew their vote would not change the result
Statisticians will tell you these are two very different animals. We probably wish we knew who would have won the popular vote, if there had been a real national popular vote. Because there was no such vote, the hard answer is we don't know what its result would be. In particular, with a statistically invalid sum like the published national popular vote, it is incorrect to say one party "won" or "lost." There is no actual contest to win or lose, and while you can pretend that a higher total is winning, it is not a mathematically valid conclusion.
We do know that in the 16 contested regions, Trump surpassed Clinton in a simple sum by about 500,000 votes. (As you would expect, since he needed to win the swing states to win the college.) In the uncontested states, where the Presidential choice was closer to a self-selected survey than a vote, a sum of those popular votes has her about 3.4M more than Trump. While you can't add popular votes, each popular vote is a statistic, and you can combine statistics if you follow correct statistical procedures.
There are many factors which will introduce error into the results from non-contested states, making it harder to figure out what the actual popular vote might have been.
- Voters knew their votes didn't matter. Many stayed home; these states had generally lower voter turnout. The states with the lowest turnout (HI, WV, TN, TX, OK, AR, AZ, NM, MS, NY, CA, IN, UT) were generally safe states with large margins. Average turnout in 16 contested states was 65%, in non-contested states 57%.
- To get specific, a rough calculation suggests 8 to 9 million more votes would be cast in the non-contested states if they had a 65% turnout. This is a giant disenfranchisement.
- The two candidates had the lowest approval ratings ever. Many Clinton voters were not supporting her, but were out to stop Trump. Trump's ratings were even lower, so many of his voters were only out to stop Clinton. I suggest that in states where you know your vote will not elect or stop anybody, there is less motivation for nose-holding votes.
- As noted, campaigns were not active in these states. In some states, like California, Clinton did campaign, though presumably to raise money rather than votes. Having only one candidate campaign skews things more.
- More safe state voters felt comfortable voting for 3rd party choices, which they would have been less likely to do in a swing state. Many of the 4.6M votes for 3rd party candidates in safe states might have gone to major party candidates in a contested race, though in what direction is unknown.
- In some safe states, even the downballot races are predetermined, discouraging voters. In California, the election of Democrats in most down-ballot races was assured; the primary was the real contest. (However, contentious ballot propositions can counter this in some states.)
In the end, though, results from a race that everybody agreed didn't matter are just a different animal from results in a contested race. You can't add apples and oranges, or perhaps more correctly, oranges and lemons. Different, though not entirely. You can add them and get a total number of citrus (votes of any kind), but you can't call it the count of oranges (real votes).
In spite of the frequent description of the US vote-total as a popular vote, this is at odds with common usage. The thousands of other elections in the USA are actual popular votes, as are the vast majority of elections in free countries. The US national vote sum, and similar sums published in some parliamentary elections, are the rare exception where an official and incorrect tally gets called a popular vote.
Due to much controversy about this view, I wrote up a more detailed explanation of the difference.
The 1916 election
A century ago in 1916, women could not vote for President in most of the USA. Illinois was an exception, having recognized women's right to vote in Presidential elections in 1913. President Wilson did not support suffrage in 1916 but his opponent, Hughes, did, and suffragettes campaigned for Hughes as a result.
Wilson won, but Hughes won Illinois handily; in fact, his margin there of 202,000 votes was his highest in any state (and 2nd highest in the land) -- in part because the addition of women to the rolls meant Illinois had more voters than any other state. I have to speculate that this margin had to do with women voting for the candidate ready to defend their basic human rights.
Wilson won the college 277 to 254. And he won the so-called popular vote by 600,000 votes. But that "popular vote" in this case consisted of adding the popular vote from states like Illinois where women were human, and other states where they were less than human. Who can defend adding those totals together, cast under such different rules, calling it "the popular vote" and declaring that Wilson "won" the popular vote in 1916?
Today, the difference between California and other states is not so dramatic as disenfranchising an entire sex. But because Californians are told their vote for President doesn't matter, the turnout there was 56%, against an average of 65% in the swing states. If California had matched that average, that would be 2.3 million more voters. Millions disenfranchised not because of their sex, but because the system says their vote doesn't matter. California's "popular vote" is a sham, and not too different a sham from that of men-only New York in 1916 or "Dear Leader of course" North Korea today. Oh sure, they have something they call the popular vote in North Korea, but the result is known in advance and nobody thinks their vote counts. (And yes, they know they could be punished if they put their ballot in the wrong box.)
You could not add the votes of Illinois and New York in 1916 and call it a true popular vote. You can't add the results of California's sham popular vote to Florida's real popular vote and call it a true popular vote. I mean, people do that, but they should not.
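The California turnout figure above can be checked with some back-of-the-envelope arithmetic. This is only a rough sketch: the 56% and 65% turnout rates come from the text, but the count of roughly 14.2 million presidential votes cast in California is an assumption brought in for illustration, and the function name is made up for this example.

```python
def extra_voters(votes_cast: float, actual_turnout: float, target_turnout: float) -> float:
    """Estimate how many more people would vote at a higher turnout rate."""
    # Infer the size of the eligible population from votes cast and turnout.
    eligible = votes_cast / actual_turnout
    # Votes expected at the target turnout, minus the votes actually cast.
    return eligible * target_turnout - votes_cast


# ASSUMPTION: roughly 14.2 million presidential votes cast in California.
# This number is not from the article; swap in an official count to refine it.
CA_VOTES_CAST = 14_200_000

extra = extra_voters(CA_VOTES_CAST, actual_turnout=0.56, target_turnout=0.65)
print(f"About {extra / 1e6:.1f} million additional voters")
```

Under those assumptions the estimate lands close to the 2.3 million quoted above. It assumes the same pool of eligible voters and says nothing about how the additional people would have voted.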
Can we figure it out?
All this said, you could attempt to measure what the vote would have been. We may not have enough data, but we could make some estimates. We know that Clinton led Trump by 3.5% in national polls before the election, but we also know that Trump outperformed those polls by 1.5-6% in many contested states. To really do this would require much more careful analysis than you see in this paragraph, which is written only to show one extreme of what's possible, and the difference is almost surely less than this from these two states. Full analysis would require looking at detailed voting and polling patterns and an understanding of what motivates people to stay home or vote differently in safe states vs. swing states, and an understanding of how Trump outperformed his polls so broadly in the contested states. In the other direction, since the 8-9 million missing voters in the safe states are in states that swing Democratic, there are arguments Clinton's total could have been even higher. However, even with that analysis we still would not really know.
My intuition is that such a result would show Clinton scoring higher than Trump, but not by 3M votes. And the margin of error would include results where Trump wins that popular vote, but this would be the outside condition. Certainly the only hard data on states that were actually contested has him win if extrapolated, but the Democratic party dominance in the big uncontested states is very strong. Also not factored into this is the effect of voter suppression techniques.
I should note to non-regular readers that I am anti-Trump. At the same time, having been shocked several times by underestimating his support, I write this because this underestimation must stop, and both sides need to come to much better understanding of how people voted for or against them, and why.
A slightly better approach would be to publish vote totals divided between swing and safe states. Because situations differ so much in the safe states, this is still not super accurate, but it's a lot better. (I built this from an earlier download so numbers may not match final totals exactly.)
| | Clinton | Trump | Johnson | Stein | McMullin | Others |
|---|---|---|---|---|---|---|
| Swing Total | 25,946,624 | 26,423,193 | 1,783,571 | 434,433 | 203,500 | 351,415 |
| Safe Total | 40,582,344 | 37,227,033 | 2,770,706 | 1,031,304 | 435,055 | 468,484 |
It is interesting to note how many more votes Stein got in safe states than in swing states: about 137% more, going by the totals above. Johnson got about 55% more, Clinton about 56% more and Trump about 41% more.
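For anyone who wants to check the arithmetic, here is a small sketch that recomputes the margins and the safe-versus-swing comparison directly from the swing and safe totals above. The variable and function names are just for illustration, and the figures inherit the caveat that the totals came from an earlier download.

```python
# Vote totals from the table above: (swing states, safe states).
totals = {
    "Clinton": (25_946_624, 40_582_344),
    "Trump": (26_423_193, 37_227_033),
    "Johnson": (1_783_571, 2_770_706),
    "Stein": (434_433, 1_031_304),
}


def clinton_margin(group: int) -> int:
    """Clinton's total minus Trump's total (group 0 = swing, 1 = safe)."""
    return totals["Clinton"][group] - totals["Trump"][group]


swing_margin = clinton_margin(0)   # negative means Trump is ahead
safe_margin = clinton_margin(1)
combined_margin = swing_margin + safe_margin

# How many more votes (in percent) each candidate got in safe states.
pct_more_in_safe = {
    name: 100 * (safe / swing - 1) for name, (swing, safe) in totals.items()
}

print(f"Swing margin:    {swing_margin:+,}")
print(f"Safe margin:     {safe_margin:+,}")
print(f"Combined margin: {combined_margin:+,}")
for name, pct in pct_more_in_safe.items():
    print(f"{name}: {pct:.0f}% more votes in safe states")
```

The swing margin comes out a little under 500,000 in Trump's favour and the safe margin around 3.4M in Clinton's favour. Note that the percentages compare raw vote counts, so part of each figure simply reflects that more people live in the safe states.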
So what should the popular vote be?
One might argue that in an ideal democracy, the popular vote would represent the aggregate view of all voters. Some nations make voting mandatory in order to get this. Australia gets 95% turnout using this technique, but Malta, New Zealand and several other countries get turnout around 90% without legal compulsion.
It might even be argued that a truly ideal democracy would not only have everybody vote, but have everybody study the choices to make an informed vote. We don't get any of these ideals, and so in the USA it has come to be accepted that the popular vote is the vote totals from those who took the time to show up. The low turnout both enables voter suppression efforts and gives extreme value to successful "get out the vote" efforts, since it is far cheaper to convince a weak supporter to show up than to convince an undecided voter to swing your way.
Some election theorists have actually proposed that the best way to do elections would be to use a random sample, sometimes combined with strong incentives for members of this sample to vote, and possibly to also learn before voting. This seems strange to non-mathematicians but actually has strong validity. (In one variant, the selected electors are known weeks in advance and the campaigns and public interest groups focus their attention on "educating" them, in which case the number must be large so that truly personal targeting is not effective.) In a nation with 90% turnout these techniques make elections much cheaper but don't affect results much. In a country with 60% turnout which switches to 99% turnout from the randomly selected electors, the result becomes a much more accurate measure of voter will than the current system.
It is also worth noting that the entire popular vote system for President is not in the US constitution, and so alternate systems, including sampling, actually are legally possible if states willed it, though politically unlikely. There are many advantages to sampling: close to 100% turnout, more informed voters, the possible reduction of massive campaign spending and fundraising, and the elimination of voter suppression. Its main disadvantages are that it doesn't match non-mathematicians' instincts about how an election should work, and that it adds a risk of corruption of the random selection.
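To give a feel for why the sample of electors would need to be large, here is a minimal sketch of the textbook margin of error for a simple random sample. It rests on assumptions that are not from the proposals themselves: a truly random draw, every selected elector voting, a two-way race, and the usual normal approximation at 95% confidence. The sample sizes are arbitrary examples.

```python
import math


def margin_of_error(sample_size: int, share: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for one candidate's vote share.

    ASSUMPTIONS: simple random sample, full participation, normal
    approximation. share=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


# Hypothetical sample sizes, chosen only to show how the error shrinks.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} electors: +/- {100 * margin_of_error(n):.2f} points")
```

The error shrinks with the square root of the sample size, so a race decided by a point or two would need a sample in the hundreds of thousands before the sampling noise becomes small compared with the gap between the candidates.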
In order to get a real popular vote, even one where we total the will of the 60% who show up, it is necessary to get rid of the college. The college could be nullified by a pact between California, Texas and two other large Republican safe states. If just those 4 states agreed to cast all their electors according to a popular vote result, it would be sufficient to make the college match that popular vote. Once it was known that this was the case, all voters would know their vote counted, all candidates would campaign in all states instead of just swing states, and we would have a true popular vote result.