Hugo 2023 Stats: The One-Sixtieth Solution

I do like learning things and there is a goldmine in this comment from last week from Peter Wilkinson and also in the EPH post from David Wallace.

Some arithmetic first. With EPH each voter gets their one vote divided between the nominees they list. If you only list 1 then that’s 1/1 = 1 point per nominee, 2 then 1/2 = 0.5 points, 3 then 1/3 = 0.3333 etc., 4 then 1/4 = 0.25 and 5 then 1/5 = 0.2. The 1/3 case is a bit of a mess because as a decimal fraction you have all those 3’s.

Recurring fractions can be an issue in binary as well, and so even sophisticated computers can have rounding errors. There are also basic process errors we can make when doing calculations. Round an answer too early and you can end up with sizeable errors in your data. Also, there are different conventions on how to round decimals for different purposes. A normal one is to round up 5’s, e.g. 0.125 to 2 decimal places is normally shown as 0.13, but that can add a tiny bias upwards.
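Both points are easy to see in a few lines of Python. This is only an illustrative sketch using the standard `decimal` module; the variable names are made up for the example and nothing here is taken from any actual Hugo software.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Binary floats cannot store most decimal fractions exactly,
# so this "obvious" equality fails.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
print(float_sum_is_exact)  # False

# Two common rounding conventions applied to the same exact decimal value.
value = Decimal("0.125")
two_places = Decimal("0.01")

half_up = value.quantize(two_places, rounding=ROUND_HALF_UP)
half_even = value.quantize(two_places, rounding=ROUND_HALF_EVEN)

print(half_up)    # 0.13 (5's always go up, hence the slight upward bias)
print(half_even)  # 0.12 (ties go to the even digit instead)
```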

In short, if you can, it is better to avoid decimal fractions altogether, work with integers and convert everything at the end. With EPH there is a neat trick. All the fractional parts have to be sums of 1/2, 1/3, 1/4 or 1/5. The common denominator is 60, so instead of giving everybody 1 point to be divided between nominees, you can give them 60 points and then all you have is biggish whole numbers. You are happy, the computer is happy, the general public…well “give everybody 60 points each” is just another weird rule to explain. However, we can just divide by 60 RIGHT AT THE END and it is all good. Any rounding won’t affect the result and is just a convenience for showing the numbers.
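Here is a minimal sketch of the 60-point idea in Python. It covers only the point-splitting step, not the elimination rounds of EPH, and the ballots, the nominee names and the helper names (`tally_sixtieths`, `as_points`) are all invented for illustration.

```python
POINTS_PER_BALLOT = 60  # common denominator of 1/1, 1/2, 1/3, 1/4 and 1/5


def tally_sixtieths(ballots):
    """Return each nominee's total in whole sixtieths of a point."""
    totals = {}
    for ballot in ballots:
        if not ballot:
            continue  # an empty ballot gives nobody any points
        if POINTS_PER_BALLOT % len(ballot) != 0:
            raise ValueError("60 does not divide evenly by this ballot size")
        share = POINTS_PER_BALLOT // len(ballot)  # always a whole number here
        for nominee in ballot:
            totals[nominee] = totals.get(nominee, 0) + share
    return totals


def as_points(sixtieths):
    """Convert to a two-decimal string at the very end, using integers only."""
    hundredths = (sixtieths * 100 + 30) // 60  # +30 rounds to the nearest hundredth
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# Hypothetical ballots listing 1, 2, 3 and 4 nominees.
ballots = [["A"], ["A", "B"], ["A", "B", "C"], ["B", "C", "D", "E"]]
totals = tally_sixtieths(ballots)

for nominee, sixtieths in sorted(totals.items()):
    print(nominee, sixtieths, as_points(sixtieths))
# A 110 1.83
# B 65 1.08
# C 35 0.58
# D 15 0.25
# E 15 0.25
```

Every comparison between nominees can be made on the whole-number totals; the two-decimal strings are only for display.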

EPH stats have been conventionally shown to 2 decimal places. There are exactly 99 two-digit decimal numbers from 0.01 to 0.99. However, there are only 59 two-digit decimal numbers that are roundings of a fraction out of 60, from 0.02 (i.e. 1/60 = 0.0166666 etc.) to 0.98 (i.e. 59/60 = 0.98333333 etc.).
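The list of 59 endings can be generated directly. The sketch below works in whole hundredths so that no floating point is involved; the function name `rounded_ending` is just a label chosen for this example.

```python
def rounded_ending(k: int) -> int:
    """Hundredths shown when k/60 is rounded to two decimal places."""
    return (100 * k + 30) // 60  # integer round-to-nearest; k/60 never lands on a tie


sixtieth_endings = sorted({rounded_ending(k) for k in range(1, 60)})

print(len(sixtieth_endings))   # 59
print(sixtieth_endings[:6])    # [2, 3, 5, 7, 8, 10]
print(sixtieth_endings[-1])    # 98
```

A handy side effect: every ending on the list has a last digit of 0, 2, 3, 5, 7 or 8, so a correctly rounded sixtieth can never end in 1, 4, 6 or 9.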

With me so far?

This gives us a way of looking at published stats. It doesn’t tell us what has occurred, but it will show when the EPH stats have been calculated differently than normal. It might be just at the final stage of displaying the stats, e.g. maybe somebody just truncates the numbers and shows 1/60 as 0.01 instead of 0.02. However, it could be something else.
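To get a feel for how the truncation scenario would show up, the sketch below compares truncating with rounding for every fraction from 1/60 to 59/60. It is purely an illustration of that one hypothetical cause, not a claim about what any real software did.

```python
# Endings (in hundredths) produced by correct rounding of k/60.
rounded_endings = {(100 * k + 30) // 60 for k in range(1, 60)}

# Numerators whose truncated ending is NOT one of the rounded endings.
truncation_misfits = [
    k for k in range(1, 60)
    if (100 * k) // 60 not in rounded_endings
]

print(len(truncation_misfits))   # 20
print(truncation_misfits[:5])    # [1, 4, 7, 10, 13]
```

So if truncation were the only problem, about a third of the possible fractional parts would be flagged, and each flagged value would sit exactly 0.01 below a legitimate one.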

I was given a file of digitised EPH results from 2020 to 2023. For every cell in each row of data I calculated a spreadsheet formula of the form =cell - INT(cell). What this does is just leave behind the two-digit fractional part. Then I can get the spreadsheet to classify every cell that isn’t blank or zero as either Y (it is on the list of sixtieth fractions) or N (it isn’t). Finally, I can count up the number of N’s in a row to highlight nominees with issues.
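For anyone who would rather script this than use a spreadsheet, here is a rough Python equivalent. It is a sketch under assumptions: cells arrive as strings with two decimal places, whole numbers count as Y, and the sample row is invented rather than taken from the real data.

```python
from decimal import Decimal

# Endings (in hundredths) that a multiple of 1/60 can show at two decimal places.
# k = 0 contributes 0, which covers whole numbers.
VALID_ENDINGS = {(100 * k + 30) // 60 for k in range(60)}


def classify(cell: str):
    """Return 'Y', 'N', or None for a blank or zero cell."""
    text = cell.strip()
    if not text:
        return None
    value = Decimal(text)          # exact decimal, no binary float involved
    if value == 0:
        return None
    ending = int((value - int(value)) * 100)  # same idea as =cell - INT(cell)
    return "Y" if ending in VALID_ENDINGS else "N"


def count_n(row):
    """Number of cells in a row that cannot be a rounded sixtieth."""
    return sum(1 for cell in row if classify(cell) == "N")


sample_row = ["110.00", "110.50", "111.33", "", "112.01"]  # invented values
print([classify(cell) for cell in sample_row])  # ['Y', 'Y', 'Y', None, 'N']
print(count_n(sample_row))                      # 1
```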

Again, we can’t know the source of the issue from this by itself, and it can be 100% cosmetic and have had zero impact on the outcome.

First, here are all the rows with issues from 2022 and 2021:

–

[That’s an empty table]

OK here is 2023:

Category	Finalist	CountN
Novel	Legends & Lattes	1
Novel	Nona the Ninth	4
Novel	The Daughter of Doctor Moreau	2
Novella	A Prayer for the Crown-Shy *	1
Novelette	The Space-Time Painter	9
Novelette	Color the World	5
Novelette	A Dream of Electric Mothers	3
Novelette	Two Hands, Wrapped in Gold **	3
Novelette	We Built This City	3
Short Story	On the Razor’s Edge	4
Short Story	Rabbit Test	1
Short Story	D.I.Y.	1
Short Story	The White Cliff	1
Short Story	Zhurong on Mars	1
Short Story	Fongong Temple Pagoda*	7
Short Story	Resurrection	5
Short Story	Lonely Room	1
Short Story	Memories in Snow	2
Short Story	2039: Era of Brain Computer	1
Graphic Story	Chivalry《骑士精神》	4
Related Work	Chinese Science Fiction, An Oral History《中国科幻口述史》	1
Related Work	Blood, Sweat & Chrome: The Wild and True Story of Mad Max: Fury Road《血汗与铬：疯狂的麦克斯狂暴真实故事：狂暴之路》	6
Related Work	History of Chinese Science Fiction in the 20th Century《20 世纪中国科幻小说史》*	2
Related Work	Buffalito World Outreach Project《小水牛出海计划》	4
DPLF	Everything Everywhere All at Once《瞬息全宇宙》	6
DPSF	Andor: Rix Road《安多：立斯路》	3
EditorLF	Lee Harris 李·哈里斯	9
EditorLF	Lindsey Hall 林赛·霍尔	5
EditorLF	Yan Huan 颜欢	5
EditorLF	Ruoxi Chen 陈若熹	1
EditorLF	Sarah Peed 莎拉·皮德	1
EditorLF	Yao Haijun 姚海军	1
EditorSF	Sheree Renee Thomas 雪莉·蕾妮·托马斯	1
EditorSF	Wang Xu 汪旭	1
EditorSF	Yang Feng 杨枫	1
EditorSF	Latssep 拉兹	1
Fan Writer	Paul Weimer *	3

It looks like a lot, but there are actually more rows that have 0 issues. Remember, though, that 60% of 2 decimal place numbers fit the Y criteria regardless. As Peter Wilkinson pointed out, Editor Long Form has a lot of anomalies. In Novel, for once we have a weirdness that doesn’t impact Babel.

Paul is listed but other high-profile “ineligibles” aren’t, although “Fongong Temple Pagoda” is. “The Space-Time Painter” won Novelette, faced a backlash among Chinese fans and has the highest number of anomalous values.

However, 2023 isn’t the only year with some anomalies by this approach. 2020 has a few quirks also.

Category	Finalist	CountN
Novel	A Memory Called Empire Arkady Martine	4
Novel	Gideon the Ninth Tamsyn Muir	4
Novel	The Raven Tower Ann Leckie	3
Novella	This Is How You Lose the Time War Amal El-Mohtar & Max Gladstone	4
Novella	To Be Taught, If Fortunate Becky Chambers	3
DPLF	Captain Marvel	2
DPLF	Good Omens	2

Typos, rounding errors, bad coding or something worse?

Feb 5, 2024

camestrosfelapton


12 responses to “Hugo 2023 Stats: The One-Sixtieth Solution”

John S / ErsatzCulture says:

Feb 5, 2024 at 4:58 am

Great stuff!

I’m reminded a bit of when I worked on a marketing data tool for a well known internet company, and when you applied multiple filters like gender/age bracket/country, some rows started showing percentages that were all 0/33/66/100 or 0/25/50/75/100, giving you a pretty good idea of how few people were surveyed in that particular combination of attributes.

Mike Glyer says:

Feb 5, 2024 at 5:04 am

Nice working! I would give this post 60 likes if I could.

pjevans88 says:

Feb 5, 2024 at 5:59 am

The 60-point solution is nice!

rlewiston777 says:

Feb 5, 2024 at 5:59 am

Are you trying to figure out if 2023 is more anomalous than other years? If so, what are your final conclusions?

- camestrosfelapton says:
  
  Feb 5, 2024 at 7:05 am
  
  At this point I am more counting all the ways it is anomalous (ie lots of ways!). I will try to pull everything together at some point including the work others have done
  
Laura says:

Feb 5, 2024 at 11:40 am

I ❤ EPH.

Greg Hullender says:

Feb 5, 2024 at 2:14 pm

It increasingly looks like they used new EPH software and simply didn’t adequately test it first.

And, more and more, I’m thinking none of this had anything to do with Chinese censorship.

- camestrosfelapton says:
  
  Feb 5, 2024 at 4:49 pm
  
  Yeah, unified stuff up theory with maybe a bit of commercial greed thrown in
  
Hyman Rosen says:

Feb 5, 2024 at 5:23 pm

When I worked at Bloomberg I made a video about floating-point conversion between binary and decimal in computer arithmetic. It’s something most people, programmers included, don’t understand, and they can do a lot of flailing around, rounding here, adding fudge factors there, and generally making a mess of things.

As you say, EPH calculations should be handled entirely by integer arithmetic. While the description of EPH may be written in terms of fractions of points, the implementation of the code should not be doing that (unless they use an exact rational number arithmetic package, which is overkill for this job).

Jan Vaněk jr. says:

Feb 6, 2024 at 2:51 am

A great and fresh idea indeed, thanks.

But can’t you just find out how exactly the software used by the previous Worldcons/administrations works? The original developer should be still relatively easy to find and reach; and I expect they were aware of the issue.

Also, ISTR that it is again a simple Shannonian–Dirichletian corollary that when dealing with 3-figure numbers of nomination, doing the internal representation at 3 decimal places should be enough.

Finally, the phrase “it is better to avoid fractions altogether and work with integers” was a little bit confusing: what you mean is to avoid lossy rounded decimal/percentage representation and work WITH fractions (which are after all pairs of integers, or single ones after bringing them to the common denominator of 60).

The 2023 Hugo nomination statistics have finally been released – and we have questions | Cora Buhlert says:

Feb 6, 2024 at 7:15 am

[…] 02-05-2024: Camestros Felapton delves even deeper into EPH and the many issues with the Chengdu Hugo nomination …. Cam also comes up with a way of preemptively checking whether a given Worldcon is actually able to […]

Peter Wilkinson says:

Feb 6, 2024 at 8:40 am

It’s probably worth remarking:

1) Even allowing for the fact that the perceptible error rate on something purporting to be an EPH table rounded to two decimal places is only 40% when numbers are entered randomly, not even the worst tables come close to 40%. There are 99 figures in each table (except for Fancast, which contains no errors, at least of this type) and the highest numbers of errors are 24 in Short Story, and 23 each in Novelette and Editor LF – the next highest seems to be 13 in Related, with none of the other categories reaching double figures (and 8 out of 19 categories don’t show any errors of this type at all). This doesn’t completely remove the possibility of buggy software as the cause, but it does mean that we could do with some kind of idea of how the software would have been reducing the problem without eliminating it completely.

2) Where Cam’s 2023 table shows errors in several columns for one of the contestants in one of the EPH tables, a look at the relevant table often shows that at least some of these errors are likely to be effectively the same error repeating from one column to the next. For instance, in Novelette, The Space-Time Painter has errors in every column – but it is only shown as gaining point transfers in two columns. In every other column, with no point transfer having occurred, the erroneous figure for the contestant is identical to the erroneous one in the column immediately to the left. And the two columns which do have point transfers (and so different figures)? The first of these definitely does show a not easily explicable change from the previous column, and can reasonably be taken to have replaced one error with a very different one. The second of them, though, is a change of exactly 0.5, which is exactly what one expects when, for instance, the only other contestant for one particular point has just been eliminated – and The Space-Time Painter acquires the remaining half of the point from the eliminated contestant. So that also looks like an only slightly changed repetition of the same error as in the column to the left.

3) In a couple of the categories with a high number of these errors, the final column seems to be rather a magnet for them. In both Short Story and Editor LF, 5 of the 7 remaining contestants have these errors in the final column. In Editor LF, this also to some extent links up to my previous paragraph here – Lee Harris propagates what seems to be effectively the same error back through all nine columns, and Lindsey Hall and Yan Huan also do so for four or five columns. As I have remarked previously, Editor LF seems to have some other problems too – they might or might not be related.
