We need better "Needle" visualizations for election night results
The 2020 election showed that doing live reporting on election counts when the order of counting has a bias is misleading and emotionally draining. Of course, there is no actual "race" with totals moving back and forth, one candidate in the lead and then another. That's just an illusion created by the bias in the order. The result is actually a fixed fact after the polls close, and we're just uncovering different parts as time goes by.
It's a bit like scratching off a lottery card. The final result of the card is fixed; you just get some drama from revealing bits of it at a time.
The press get good ratings and we can't stop them from reporting this, even when they know the reports are misleading. But instead of one number, there are really two numbers to show from partial results:
- The best estimate of the final number based on your models and what data you have
- The amount of uncertainty in that estimate (which you might show as a scalar, or as a distribution.)
As we know, Trump even worked to exploit that lie, known as the red mirage, to call the election into question. It's serious stuff.
This particular time, we had the following large biases.
- In-person votes were usually counted before mail-ins. Trump deprecated mail-ins so they leaned mostly to Biden.
- Large urban counties had many more ballots to count and reported later. Urban voters tend more to Democrats.
- Maps also present information in a biased way, because of the same urban/rural split: sparsely populated rural areas take up most of the map.
There might be new biases in the future.
How to report the election
Results should not be displayed with simple bar or pie charts, as they currently are, or with maps. Instead a new graphical form must be created that shows the estimated final result as a distribution graph. This could mean two curves overlaid on a chart, one red, one blue. At the start the curves would be fairly flat and overlap quite a bit. Over time they would narrow into sharper regions and overlap very little, and then not at all. When all votes are counted, each would be just a single line at the final vote count. By the time they stop overlapping, though, the election is mathematically certain, and when the overlap is very small, the race can be "called" as very likely for the leader.
Indeed, the second most important figure, after whose estimate is ahead or behind, will be the amount of overlap. This should be shown very clearly, or possibly expressed as how strong the colour of the graph is.
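As a rough sketch of what "amount of overlap" could mean, the snippet below measures the area shared by two curves. It assumes, purely for illustration, that each candidate's estimated final share is a normal curve with a mean and a standard deviation; real election models would likely produce other shapes. The function names and example numbers are my own inventions, not taken from any network's model.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of a normal (bell) curve at x. The normal shape is an assumption."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def overlap(mean_a, sd_a, mean_b, sd_b, steps=20000):
    """Area shared by the two curves: 1.0 means identical, 0.0 means fully apart."""
    lo = min(mean_a - 8 * sd_a, mean_b - 8 * sd_b)
    hi = max(mean_a + 8 * sd_a, mean_b + 8 * sd_b)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx  # midpoint of each thin slice
        total += min(normal_pdf(x, mean_a, sd_a), normal_pdf(x, mean_b, sd_b))
    return total * dx

# Early in the night: fat curves, lots of overlap (hypothetical numbers).
early = overlap(49.0, 1.0, 51.0, 1.0)
# Later: same peaks, much narrower curves, almost no overlap.
late = overlap(49.0, 0.2, 51.0, 0.2)
```

With these made-up inputs the overlap falls from roughly a third of the area to almost nothing, even though the peaks never move. This shared area is only one possible measure; a real model would also have to account for the two candidates' shares being tied to each other.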
We can see one attempt at this in the prediction charts used to show polling results mapped to the more chaotic system of the electoral college. Charts for races will not look this messy.
Many will have seen the New York Times "Needle", which is one of the most widely known efforts to express this information. We need more of this, though there are some who don't like it.
It's easy to create two-dimensional visualizations, but they don't fit on TV screens in the crawl area. What's needed is a good "strip" visualization that perhaps uses colour and shading and a little bit of the second dimension (thickness, etc.) to make a truer presentation.
However, when networks focus on a particular contest, they should switch to a full two-dimensional -- or even three-dimensional -- view to paint a fuller picture, and remind viewers of what is happening. Once again, the race is decided; we are just scratching off numbers on the lottery card to see who won.
The course of the night
The chart would begin before the election with the polls. We know the polls suck so it will be important that the distribution curves start very fat with lots of overlap.
As votes arrive, the network must have a model for predicting both the final result, and the uncertainty. And it must show both to the viewer. This model will know a lot of things, and know it for each precinct, county and state.
- How has that precinct voted in past elections, and if possible, how did the type of vote affect it?
- What votes have come in so far, and which were from in-person, mail-in, military or other sources?
- Do any exit polls or surveys provide information on trends in this area, and patterns among types of ballots?
- What trends on county, state, national and other demographic layers have come in from other places and elections to predict any general shifts from past elections?
- How stable is the precinct over time?
The most important data are the second item on that list, the votes coming in, which reveal the trends in that precinct for that election, compared to our old data. The more of this we get, the better our prediction gets.
For a viewer watching an election where the models were near perfect, you would see two fat curves simply get narrower and stay in the same place all night until they became lines. For elections with more uncertainty, the predicted central peak of the curves would move up and down. Generally the breadth of the curves would narrow all evening, though it's possible new information could reveal that an election has more uncertainty than usual, and the model would correct for that.
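To make the idea concrete, here is a minimal sketch of projecting one precinct's final count from partial returns, split by ballot type. It is a toy under stated assumptions, not anyone's actual model: the function name `project_precinct`, the dictionary layout, and all the numbers are hypothetical, the expected ballot totals are assumed to be known in advance, and it gives only a point estimate with no uncertainty.

```python
def project_precinct(expected, counted, prior_share):
    """Project final (blue, red) totals for one precinct.

    expected:    expected total ballots per ballot type (assumed known)
    counted:     (blue, red) ballots counted so far per ballot type
    prior_share: blue share per ballot type from past elections,
                 used only when nothing of that type has been counted yet
    """
    blue_total = 0.0
    red_total = 0.0
    for kind, total in expected.items():
        blue, red = counted.get(kind, (0, 0))
        seen = blue + red
        # Trust tonight's votes of this type if we have any, else the old data.
        share = blue / seen if seen > 0 else prior_share[kind]
        remaining = max(total - seen, 0)
        blue_total += blue + share * remaining
        red_total += red + (1.0 - share) * remaining
    return blue_total, red_total

# Hypothetical precinct: all in-person ballots counted, only 10% of mail-ins.
expected = {"in_person": 1000, "mail": 1000}
counted = {"in_person": (400, 600), "mail": (70, 30)}
prior_share = {"in_person": 0.45, "mail": 0.65}
blue, red = project_precinct(expected, counted, prior_share)
```

In this made-up case the raw tally shows red ahead 630 to 470, while the projection puts blue ahead by about 1100 to 900, because most of the uncounted ballots are mail-ins that are breaking blue. A better version would blend old and new data rather than switching abruptly between them, and would report how wide the estimate is.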
On screen, viewers would watch the dance of the curves and pay most attention to the number describing how much overlap they have. They would cheer and cry when the overlap dropped too low. Minor party candidates would appear on the chart at the start, but as soon as the top of their range went below the bottom of the leader's range, they would be eliminated and could be removed from the chart to avoid clutter. (Their numbers should still be present in tabular form of course, or as a single curve to make it clear you are not seeing the whole result.)
In 2020, with a good model, people would not have seen Donald Trump 10 points ahead in Pennsylvania, and Biden slowly climbing up to "pass" him. Instead, it would have started with two very fat curves, the blue one slightly ahead of the red, but with tons of overlap.
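The elimination rule for the chart is simple enough to sketch. This assumes each candidate comes with a plausible range of final shares as a (low, estimate, high) triple, and that the "leader" is whoever has the highest central estimate; the function name and the numbers are hypothetical.

```python
def candidates_to_show(ranges):
    """Return the candidates whose range still reaches the leader's range.

    ranges: {name: (low, estimate, high)} in percent of the final vote.
    """
    # Assumption: the leader is the candidate with the highest central estimate.
    leader = max(ranges, key=lambda name: ranges[name][1])
    leader_low = ranges[leader][0]
    # Drop anyone whose best case falls below the leader's worst case.
    return [name for name, (low, mid, high) in ranges.items() if high >= leader_low]

ranges = {
    "Blue": (46.0, 50.0, 54.0),
    "Red": (44.0, 48.0, 52.0),
    "Green": (0.5, 1.0, 2.0),
}
shown = candidates_to_show(ranges)  # ["Blue", "Red"]
```

The dropped candidates would still belong in the table of numbers; this only decides which curves are drawn.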
Very soon though, once decent populations of in-person and mail-in tallies arrived for each area, the curves would have firmed up. They would have converged on a very close race, with the peaks in about the same place, the blue one wider than the red. Then they would have started narrowing until the result became more and more clear.
I'm not the only person who feels these visualizations are important. Informed International has a summary article and there are others in the literature.
Just one curve?
In a 2-party race, or once all but 2 are eliminated, there is really only one number since the red fraction is one minus the blue fraction. So one could just display a single number, the predicted difference, with the probability distribution for that. This may not be as clear to the viewer but perhaps something can be devised. In addition, this sort of display would require that moves for one candidate over another would be up vs. down or left vs. right, implying a bias. This could possibly be expressed with colour so it moves from zero (very close) up to high, switching from red to blue depending on which is ahead.
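The arithmetic behind the single number can be sketched as follows. If blue's share of the two-party vote is b, red's is 1 - b, so the difference is 2b - 1 and its spread is twice the spread of b. The snippet also turns that into a chance that blue finishes ahead, but only under the assumption that the estimate is normally distributed, which real results may well not be. The function name and inputs are hypothetical.

```python
import math

def margin_summary(blue_share_mean, blue_share_sd):
    """Convert a blue two-party share estimate into a margin estimate.

    Shares are fractions between 0 and 1, and blue_share_sd must be positive.
    Returns (margin_mean, margin_sd, chance_blue_ahead).
    """
    margin_mean = 2.0 * blue_share_mean - 1.0  # blue minus red
    margin_sd = 2.0 * blue_share_sd            # scaling by 2 doubles the spread
    # Chance the margin ends above zero, ASSUMING a normal distribution.
    z = margin_mean / margin_sd
    chance_blue_ahead = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return margin_mean, margin_sd, chance_blue_ahead

# Hypothetical estimate: blue at 52% of the two-party vote, give or take 1%.
margin, spread, chance = margin_summary(0.52, 0.01)
```

With these made-up inputs the margin is 4 points with a spread of 2 points, which under the normal assumption gives blue roughly a 98% chance of finishing ahead.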
Just two quantities
It could be that you mainly want to show 2 quantities, plus a bit. The first quantity is the projected final difference, which of course is from -100% to +100%, though normally a much smaller range. The other is the uncertainty. If these were normal distributions that could be the standard deviation but these are probably not normal distributions.
To simplify, it is better to use only a positive number. One visualization could be a rectangle. The length of the rectangle is the difference, the height some expression of the variance. The sign could be whether the box is red or blue.
For example, at the start we might see a blue box which is 10% wide and quite tall, indicating the blue candidate is favoured but uncertainty is high. Importantly, the height is more than the width, meaning there is plenty of chance for it to flip. Over time, the length would move a bit (if your model is good, not too much) but the height would collapse to a line when all votes are counted.
This approach, though, is too two-dimensional for a sidebar or bottom third of a TV screen.
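Here is a minimal sketch of how the two quantities could be turned into that box. The sign convention (positive margin means blue is ahead), the function name `margin_box`, and the use of a single uncertainty number on the same scale as the margin are all assumptions for illustration.

```python
def margin_box(margin, uncertainty):
    """Describe the rectangle for a projected margin and its uncertainty.

    margin:      projected final difference as a fraction; positive is
                 assumed to mean blue ahead, negative red ahead
    uncertainty: a non-negative spread on the same scale as the margin
    """
    if margin > 0:
        colour = "blue"
    elif margin < 0:
        colour = "red"
    else:
        colour = "none"
    return {
        "colour": colour,
        "width": abs(margin),    # only positive numbers are drawn
        "height": uncertainty,
        # Taller than wide: plenty of chance for the result to flip.
        "could_flip": uncertainty > abs(margin),
    }

start_of_night = margin_box(0.10, 0.15)  # blue, wide and quite tall
all_counted = margin_box(-0.03, 0.0)     # red, collapsed to a line
```

The taller-than-wide test is just the rule of thumb from the text, not a statistical threshold.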
How to do this?
Some election site or network has to build this graphical display. And they need to get information from as many counties and precincts as they can on what type of ballots are being reported, as well as votes for candidates. They won't get it from all of them, but if they get it from some of them, they can extrapolate -- and widen their bars in consequence.
Viewers would have to understand that networks showing the overlapping curves were giving them the truth, and those showing percentages or bar charts were just giving them bad information. They would need to understand this enough to tune out from that method. The models would have to be good, both at estimating peaks and in making the probabilities clear.
There is some temptation to simplify. Instead of complex curves, the curves could be smoothed, even turned into overlapping boxes. That takes away information but may be easier to grasp. Experimentation can tell what works. We could even run simulations based on 2020 data, which was presumably recorded. One issue is that 2D charts take more room on the screen and networks want something they can put in a lower-third bar. It could be reduced to two numbers: the margin between the two predicted peaks, and a current uncertainty number, for example.
It would require educating the audience a little about what a probability distribution is or how to understand the graph. I think it can be pretty intuitive with a bit of work. It's worth the education.