Fears confirmed on failure of fix to Hugo awards
Last year, I wrote a few posts on the attack on Science Fiction's Hugo awards, concluding in the end that only human defence can counter human attack. A large fraction of the SF community felt that one could design an algorithm to reduce the effect of collusion, which in 2015 dominated the nomination system. (It probably will dominate it again in 2016.) The system proposed, known as "e Pluribus Hugo," attempted to defeat collusion (or "slates") by giving each nomination entry less weight when a nomination ballot was doing very well and getting several of its choices onto the final ballot. More details can be found on the blog where the proposal was worked out.
The proposal passed the first round of approval, but it does not come into effect unless it is ratified at the 2016 meeting, and then it applies to the 2017 nominations. As such, the 2016 awards will be as vulnerable to the slates as before. There are, however, vastly more slate nominators this year -- presuming all those who joined in last year to support the slates continue to do so.
Recently, my colleague Bruce Schneier was given the opportunity to run the new system on the nomination data from 2015. The final results of that test are not yet published, but a summary was reported today in File 770, and the results are very poor. This is, sadly, what I predicted when I did my own modelling. In my models, I considered some simple strategies a clever slate might apply, but it turns out that these strategies may have been naturally present in the 2015 nominations, and as predicted, the "EPH" system only marginally improved the results. The slates still massively dominated the final ballots, though they no longer swept all 5 slots. I consider the slates taking 3 or 4 slots, with only 1 or 2 non-slate nominees making the cut, to be a failure almost as bad as the sweeps that did happen. In fact, I consider even a single nomination through collusion to be a failure, though there are obviously degrees of failure. As I predicted, a slate of the size seen in the final Hugo results of 2015 should be able to obtain between 3 and 4 of the 5 slots in most cases. The new test suggests they could do this even with a much smaller slate group, such as the one they had in the 2015 nominations.
Another proposal -- that there be only 4 nominations on each nominating ballot but 6 nominees on the final ballot -- improves this. If the slates can take only 3, then this means 3 non-slate nominees probably make the ballot.
An alternative - Make Room, Make Room!
First, let me say I am not a fan of algorithmic fixes to this problem. Changing the rules -- which takes 2 years -- can only "fight the last war." You can create a defence against slates, but it may not work against modifications of the slate approach, or other attacks not yet invented.
Nonetheless, it is possible to improve the algorithmic approach to attain the real goal, which is to restore the award as closely as possible to what it was when people nominated independently: to allow the voters to see the top 5 "natural" nominees and award the best one the Hugo award, if it is worthy.
The approach is as follows: When slate voting is present, automatically increase the number of nominees so that 5 non-slate candidates are also on the ballot along with the slates.
To do this, you need a formula which estimates if a winning candidate is probably present due to slate voting. The formula does not have to be simple, and it is OK if it occasionally identifies a non-slate candidate as being from a slate.
- Calculate the top 5 nominees by the traditional "approval" style ballot.
- If 2 or more pass the "slate test" which tries to measure if they appear disproportionately together on too many ballots, then increase the number of nominees until 5 entries do not meet the slate condition.
As a result, if there is a slate of 5, you may see the total pool of nominees increased to 10. If there are no slates, there would be only 5 nominees. (Ties for last place, as always, could increase the number slightly.)
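The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a worked-out proposal: the `is_slate` predicate stands in for the slate test (whose formula is left open here), the function and parameter names are invented for the example, and ties for last place are ignored.

```python
from collections import Counter

def make_room(ballots, is_slate, target=5):
    """Return the finalists, extended until `target` non-slate works are listed.

    `is_slate` is a hypothetical predicate standing in for the slate test.
    Ties are broken alphabetically here, which the real rules would not do.
    """
    # Plain approval counting: one nomination per work per ballot.
    counts = Counter(work for ballot in ballots for work in set(ballot))
    ranked = sorted(counts, key=lambda work: (-counts[work], work))

    top = ranked[:target]
    if sum(1 for work in top if is_slate(work)) < 2:
        return top  # fewer than 2 flagged works: the traditional top 5 stands

    finalists, clean = [], 0
    for work in ranked:
        finalists.append(work)
        if not is_slate(work):
            clean += 1
            if clean == target:
                break
    return finalists

# Toy data: four identical slate ballots plus a handful of independent ones.
slate = ["S1", "S2", "S3", "S4", "S5"]
ballots = [slate] * 4 + [["A", "B"], ["A", "C"], ["B", "D"], ["A", "E"], ["C", "B"]]
finalists = make_room(ballots, lambda work: work.startswith("S"))
print(len(finalists))  # 10
```

With the toy data, the five slate works stay on the ballot and the list grows to ten so that five non-slate works also appear.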
Let's consider the advantages of this approach:
- While ideally it's simple, the slate test formula does not need to be understood by the typical voter or nominator. All they need to know is that the nominees listed are the top nominees.
- Likewise, there is no strategy in nominating. Your ballot is not reduced in strength if it has multiple winners. It's pure approval.
- If a candidate is falsely identified as passing the slate test -- for example a lot of Doctor Who fans all nominate the same episodes -- the worst thing that happens is we get a few extra nominees we should not have gotten. Not ideal, but pretty tame as a failure mode.
- Likewise, for those promoting slates, they can't claim their nominations are denied to them by a cabal or conspiracy.
- All the nominees who would have been nominated in the absence of slate efforts get nominated; nobody's work is displaced.
- Fans can decide for themselves how they want to consider the larger pool of nominees. Based on 2015's final results (with many "No Awards") it appears fans wish to judge some works as being there unfairly and to discount them. Fans who wish to would have the option of deciding for themselves which nominees are important, and acting as though those are all that was on the ballot.
- If it is effective, it gives the slates so little that many of them are likely to just give up. It will be much harder to convince large numbers of supporters to spend money to become members of conventions just so a few writers can get ignored Hugo nominations with asterisks beside them.
It has a few downsides, and a vulnerability.
- The increase in the number of nominees (only while under slate attack) will frustrate some, particularly those who feel a duty to read all works before voting.
- All the slate candidates get on the ballot, along with all the natural ones. The first is annoying, but it's hardly a downside compared to having some of the natural ones not make it. A variant could block any work that fits the slate test but scored below 5th, but that introduces a slight (and probably unneeded) bit of bias.
- You need a bigger area for nominees at the ceremony, and a bigger party, if they want to show up and be sneered at. The meaning of "Hugo Nominee" is diminished (but not as much as it's been diminished by recent events).
- As an algorithmic approach it is still vulnerable to some attacks (one detailed below) as well as new attacks not yet thought of.
- In particular, if slates are fully coordinated and can distribute their strength, it is necessary to combine this with an EPH style algorithm or they can put 10 or more slate candidates on the ballot.
All algorithmic approaches are vulnerable to a difficult but possible attack by slates. If the slate knows its strength and knows the likely range of the top "natural" nominees, it can in theory choose a number of slots it can safely win, name only that many choices, and divide them up among supporters. Instead of having 240 people cast ballots with the same 3 choices, they can have 3 groups of 80 cast ballots for one choice only. No simple algorithm can detect that or respond to it, including this one. This is a more difficult attack than the current slates can carry off, as they are not that unified. However, if you raise the bar, they may rise to it as well.
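The arithmetic of this split can be checked directly. The sketch below is illustrative only: the work names X, Y and Z are placeholders, and `split_points` assumes a simplified equal-split weighting (one point per ballot, shared among its entries), which is only loosely modelled on EPH-style counting and is not the actual rule text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def raw_counts(ballots):
    """Number of ballots naming each work (plain approval counting)."""
    return Counter(work for ballot in ballots for work in set(ballot))

def pair_counts(ballots):
    """Number of ballots on which each pair of works appears together."""
    return Counter(
        pair
        for ballot in ballots
        for pair in combinations(sorted(set(ballot)), 2)
    )

def split_points(ballots):
    """ASSUMPTION: one point per ballot, shared equally among its entries."""
    points = Counter()
    for ballot in ballots:
        works = set(ballot)
        for work in works:
            points[work] += Fraction(1, len(works))
    return points

# 240 supporters all naming the same 3 works, versus 3 groups of 80 naming one each.
united = [["X", "Y", "Z"]] * 240
divided = [["X"]] * 80 + [["Y"]] * 80 + [["Z"]] * 80

for name, ballots in (("united", united), ("divided", divided)):
    print(name, raw_counts(ballots)["X"], split_points(ballots)["X"],
          sum(pair_counts(ballots).values()))
# united 240 80 720
# divided 80 80 0
```

Under these assumptions, the split removes every co-occurrence a slate test could look at, while each work keeps the same 80 points under equal-split weighting. The cost shows up in raw counts, which fall from 240 to 80 per work, so the slate has to be confident that 80 nominations is enough to beat the natural nominees.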
All algorithmic approaches are also vulnerable to a less ambitious colluding group that simply wants to get one work on the ballot by acting together. That can be done with a small group, and no algorithm can stop it. This displaces a natural candidate and wins a nomination, but probably not the award. Scientologists were accused of doing this for L. Ron Hubbard's work in the past.
The best way to work out the formula would be through study of real data with and without slates. One candidate would be to take all nominees present on more than 5% of ballots, and pairwise compare them to find out what fraction of the time each pair is found together on ballots. Then detect pairs which appear together a great deal more often than is typical. How much more would be learned from analysis of real data. Of course, the slates will know the formula, so it must be difficult to defeat even for those who know it. As noted, false positives are not a serious problem if they are uncommon. False negatives are worse, but still better than the alternatives.
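One way the candidate test described above might look in code is sketched below. Everything beyond the 5% cut is an assumption made for illustration: the togetherness measure (ballots naming both works divided by ballots naming either), the use of the median pair as the baseline, and the `ratio` and `floor` cutoffs are placeholders that analysis of real data would have to replace.

```python
from collections import Counter
from itertools import combinations
from statistics import median

def suspected_slate_works(ballots, min_share=0.05, ratio=3.0, floor=0.5):
    """Flag works that appear together far more often than the typical pair.

    ASSUMPTIONS: `ratio` and `floor` are made-up thresholds, and the
    togetherness measure is both / either (a Jaccard-style fraction).
    """
    ballots = [set(ballot) for ballot in ballots]
    total = len(ballots)
    counts = Counter(work for ballot in ballots for work in ballot)

    # Only consider works present on more than 5% of ballots.
    frequent = sorted(w for w, c in counts.items() if c > min_share * total)

    together = {}
    for a, b in combinations(frequent, 2):
        both = sum(1 for ballot in ballots if a in ballot and b in ballot)
        either = counts[a] + counts[b] - both
        together[(a, b)] = both / either

    if not together:
        return set()

    baseline = median(together.values())
    cutoff = max(floor, ratio * baseline)
    return {work for pair, frac in together.items() if frac > cutoff for work in pair}

# Toy data: 20 identical slate ballots among 80 independent-looking ones.
natural = [["A", "B"]] * 20 + [["B", "C"]] * 20 + [["C", "D"]] * 20 + [["D", "A"]] * 20
slated = natural + [["S1", "S2", "S3"]] * 20
print(sorted(suspected_slate_works(slated)))  # ['S1', 'S2', 'S3']
```

A test like this only looks at co-occurrence, so it says nothing about a slate that splits its supporters into single-work groups.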
So what else?
At the core is the idea of providing voters with information on who the natural nominees would have been, and allowing them to use the STV voting system of the final ballot to enact their will. This was done in 2015, but simply to give No Award in many of the categories -- it was necessary to destroy the award in order to save it.
I believe there is a reason why every other system (including the WSFS site selection) uses a democratic process, such as write-in, to deal with problems in nominations. Democratic approaches use human judgment, and as such they are a response not just to slates, but to any attack.
I therefore believe a better system is to publish a longer list of nominees -- 10 or more -- but to publish them sorted according to how many nominations they got. This allows voters to decide what they think the "real top 5" was and to vote on that if they desire. Because a slate can't act in secret, this is robust against slates and even against the "slate of one" described above. Revealing the sort order is a slight compromise, but a far lesser one than accepting that most natural nominees are pushed off the ballot.
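Producing such a list is simple. As a small sketch (the function name and the default of 10 entries are illustrative, and ties are simply ordered alphabetically here):

```python
from collections import Counter

def published_longlist(ballots, size=10):
    """Return (work, nominations) pairs, most-nominated first."""
    counts = Counter(work for ballot in ballots for work in set(ballot))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:size]

# Toy ballots; real nomination data would go here.
ballots = [["A", "B"], ["A", "C"], ["A"], ["B"]]
for work, nominations in published_longlist(ballots):
    print(work, nominations)
```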
The advantages of this approach:
- It is not simply a defence against slates, it is a defence against any effort to corrupt the nominations, as long as it is detected and fans believe it.
- It requires no algorithms or judgment by officials. It is entirely democratic.
- It is completely fair to all comers, even the slate members.
The downsides are:
- As above, there are a lot more nominees, so the meaning of being a nominee changes.
- Some fans will feel bound to read/examine more than 5 nominees, which produces extra work on their part.
- The extra information (sorting order) was never revealed before, and may have subtle effects on voting strategy. So far, this appears to be pretty minor, but it's untested. With STV voting, there is about as little strategy as can be. Some voters might be very slightly more likely to rank a work that sorted low in first place, to bump its chances, but really, they should not do that unless they truly want it to win -- in which case it is always right to rank it first.
- It may need to add EPH style counting if slates get a high level of coordination.
Another surprisingly strong approach would be simply to add a rule saying, "The Hugo Administrators should increase the number of nominees in any category if their considered analysis leaves them convinced that some nominees made the final ballot through means other than the nominations of fans acting independently, adding one slot for each work judged to fail that test, but adding no more than 6 slots." This has tended to be less popular, in spite of its simplicity and flexibility -- it even deals with single-candidate campaigns -- because some fans have an intense aversion to any use of human judgment by the Hugo administrators.
The advantages of this approach:
- Very simple (for voters at least)
- Very robust against any attempt to corrupt the nominations that the admins can detect. So robust that it makes it not worth trying to corrupt the nominations, since that often costs money.
- Does not require constant changes to the WSFS constitution to adapt to new strategies, nor give new strategies a 2 year "free shot" before the rules change.
- If administrators act incorrectly, the worst they do is just briefly increase the number of nominees in some categories.
- If there are no people trying to corrupt the system in a way admins can see, we get the original system we had before, in all its glory and flaws.
- The admins get access to data which can't be released to the public to make their evaluations, so they can be smarter about it.
The downsides are:
- Clearly a burden for the administrators to do a good job and act fairly.
- People will criticise and second guess. It may be a good idea to have a post-event release of any methodology so people learn what to do and not do.
- There is the risk of admins acting improperly. This is already present of course, but traditionally they have wanted to exercise very little judgment.