Should Hugo nomination ballots and voting data be released?

On the lingering comment thread of the last File 770 Puppy round-up, there is an ongoing discussion about whether anonymized ballots should be released. I’ve made some longer comments that I’m putting here. I have re-edited them for coherence and to remove places where I swapped between talking about nomination ballots and final voting ballots without being clear when I was swapping subjects. I have also corrected spelling, bad grammar, stupid sentences, etc. and added links where relevant. Much of this concerns the proposed EPH system for counting nominations, which I discussed here.

Anonymity issues

In terms of anonymity, the important thing would be to ensure that ballot IDs are not the same across categories. Take the final voting data as an example: I posted how I was voting in all the works-based categories but not in categories like Artist or Editor. My preferences in Novel, Novella etc. then act as a kind of ID in the data that says “Camestros” if the ballots in each category can be linked together. If they can’t (i.e. different ballot codes in each category, assigned in a different order) then it won’t be possible to work out how I may have voted in categories which I didn’t post publicly. The same would be true of nomination data: if I had publicized what I was nominating in some categories but not in others, and complete sets of nominations were made available (i.e. how an anonymous person nominated in all categories), it would be possible to infer how I had nominated in the categories I hadn’t revealed.

However, if the ballots are anonymized in a way that doesn’t allow comparison across categories, it won’t be possible to identify slate voters, and that will be a little sad for people running simulations. That is a lesser concern though, and so long as you can’t match ballots across categories I don’t think there are substantial privacy concerns.
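To make the “different ballot codes in each category, assigned in a different order” idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `anonymize_by_category`, the input layout (a dict from voter name to that voter's choices per category) and the code format are all hypothetical, not anything the Hugo administrators actually use.

```python
import secrets

def anonymize_by_category(ballots):
    """Release each category separately, with unlinkable codes.

    ballots: dict of voter name -> {category: list of works}
             (hypothetical layout, assumed for this sketch).
    Returns {category: [(ballot_code, works), ...]}.
    """
    rng = secrets.SystemRandom()   # OS-backed randomness for the shuffle
    used_codes = set()             # no code is ever reused, even across categories
    released = {}
    categories = sorted({c for b in ballots.values() for c in b})
    for category in categories:
        # Keep only the choices; voter names are dropped at this point.
        rows = [list(b[category]) for b in ballots.values() if category in b]
        # Shuffle independently per category so row order reveals nothing.
        rng.shuffle(rows)
        out = []
        for works in rows:
            code = secrets.token_hex(8)   # fresh random code per ballot per category
            while code in used_codes:
                code = secrets.token_hex(8)
            used_codes.add(code)
            out.append((code, works))
        released[category] = out
    return released
```

Because every category gets fresh random codes and its own shuffle, a pattern I have published for Novel cannot be followed across to Artist or Editor. The cost is exactly the one noted above: nobody can follow a suspected slate ballot across categories either.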

The only other kind of case I can think of would be if I had said I voted this way:

  • Ancillary Cheese 2
  • The Cheese Emperor 4
  • The Three Cheese Pizza 1
  • The Cheese Between the Crackers 3
  • Skin Cheese 5

and nobody in the ballot data had actually voted in that specific pattern, then I would look like a lying liar. Not terribly likely though: there are a lot more voters than there are specific preference patterns per category. However, with nomination data it is a bit more possible to have a unique set of nominees (or to claim to have), e.g. imagine I say I’ve nominated:

  • If You Were Cheese My Love
  • Throne of Cheese
  • The Cheese That Falls from Nowhere
  • One Bright Cheese to Guide Them
  • Vox Day’s Bumper Book of Cheese

as a compromise SJW-cheese-message fiction/Castalia Cheese House set to bring peace and understanding. Because it would be such an unusual combination of works, it would be identifiable by its absence (i.e. it not being there would demonstrate I lied).

Worst case scenarios

I think released data will generate all sorts of new theories, speculation and supposed evidence of wrongdoing, but people don’t actually need data to do that.

What is the worst case scenario? Anonymized data is released and it reveals genuine shenanigans of some kind, perhaps some weird voting pattern that strongly implies dark deeds done by somebody to stack the deck against the Puppies (I can’t imagine what but let’s pretend*). That would be deeply troubling, it would be embarrassing to the organizers and it would damage the credibility of the Hugo Awards *and* even after everything was resolved there would be a lingering taint *and* it would be rehashed over and over by certain quarters (cf. the Climategate emails for an example on a more important issue). Even then people are better off knowing (and the sooner the better, so people can do something about it). I don’t think that is going to happen, but either way transparency will help in the long run by making shenanigans harder.

[*I can’t imagine in two ways: firstly, quite what those dark deeds would be and, secondly, how they would show up in the data. So this is a deeply implausible nightmare scenario.]

General points

  • I think with the suggestions proposed the data would be sufficiently anonymous. This is particularly true for nominations, which per category would reveal very little information per ballot (i.e. it is very hard to infer any identifying features or effective ‘signatures’ from a set of 5 nominated works).
  • Yes, people will go hunting for patterns that fit their agendas, but they will do that anyway (cf. Dave Freer’s supposed red-ball/black-ball analysis of past Hugo winners). With real data it is easier to test claims empirically and debunk them methodically. That won’t change the minds of partisans, but then nothing will. It will help reassure others.
  • Yes, you don’t need real data to run scenarios with EPH. I’ve made my own toy version of EPH and I can play with different ballot patterns BUT having real data makes it possible to make pretend patterns more realistic in terms of how votes are distributed.
  • Changing the nomination system will probably have some impact on behavior. How much, and in what way, matters. Additionally, people may change their voting behavior based on plausible beliefs (e.g. about whether to bullet vote or not) which might not be true. It is good if lots of people can independently check against real data and show that in general the best approach is to just nominate as they did previously.
  • While EPH isn’t that complicated an algorithm, it does leave the nomination process open to accusations that somehow the specific code used could be rigged or is accidentally faulty. Released data would allow people to demonstrate that exactly the same results occur on any correct implementation of the algorithm.

I think the last one is important. A standard denier attack on the Anthropogenic Global Warming hypothesis is that things like General Circulation Models, global mean surface temperature data sets or proxy-based historical global mean temperature estimates (e.g. the ‘hockey stick’) are either incorrect, methodologically flawed or in some way fraudulent. Releasing data and code and having people replicate and confirm results hasn’t made the deniers go away, but it has substantially helped the scientific community to regain credibility after the massive denier propaganda attacks around Climategate. The BEST study is a notable example: partly funded by the Kochs, lauded initially by the denier blogs and then, oops! It confirmed that the temperature record is accurate and that the warming is real and substantial.
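Coming back to EPH and the point about independent implementations: the sketch below shows roughly what a small, checkable count could look like. It is not the toy version mentioned in the list above, only an illustration based on my understanding of the proposal (each ballot's single point is shared equally among its surviving nominees, the two lowest point totals are compared, and the one appearing on fewer ballots is eliminated). The function name `eph_finalists` is hypothetical, and the tie-breaking here (alphabetical by name) is a simplifying assumption rather than the proposal's actual rule.

```python
from fractions import Fraction

def eph_finalists(ballots, slots=5):
    """Toy EPH-style count for one category (a sketch, not official code).

    ballots: iterable of collections of nominee names.
    Returns the sorted list of surviving nominees.
    """
    ballots = [set(b) for b in ballots if b]
    remaining = set().union(*ballots)
    while len(remaining) > slots:
        points = {n: Fraction(0) for n in remaining}
        counts = {n: 0 for n in remaining}
        for b in ballots:
            live = b & remaining
            for n in live:
                # One point per ballot, split equally among surviving nominees.
                points[n] += Fraction(1, len(live))
                counts[n] += 1
        # Pick the two lowest point totals (ties broken by name: an assumption).
        low_two = sorted(remaining, key=lambda n: (points[n], n))[:2]
        # Eliminate whichever of the two is on fewer ballots
        # (then fewer points, then name: again a simplification).
        loser = min(low_two, key=lambda n: (counts[n], points[n], n))
        remaining.remove(loser)
    return sorted(remaining)

# Made-up ballots: three identical "slate" ballots plus a few others.
sample = [
    {"S1", "S2", "S3"}, {"S1", "S2", "S3"}, {"S1", "S2", "S3"},
    {"A", "B"}, {"A", "B"}, {"A"}, {"C"},
]
print(eph_finalists(sample, slots=2))
```

Using exact fractions instead of floating point is a deliberate choice: two people running different code on the same released data should get identical results, with no rounding differences to argue about. With released data, anyone could make that kind of comparison against the official result.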