And more right wingers talking nonsense about Benford’s Law update

It seems I was too kind to Larry Correia in my first post about the pro-Trumpist misleading claims about Benford’s Law. He actually is still pushing it as supposed evidence of election fraud.

“Basically, when numbers are aggregated normally, they follow a distribution curve. When numbers are fabricated, they don’t. When human beings create what they think of as “random” numbers, they’re not. This is an auditing tool for things like looking for fabricated invoices. It also applies to elections. A normal election follows the expected curve. If you look at a 3rd world dictatorship’s election numbers, it looks like a spike or a saw.

There’s a bunch of different people out there running the numbers for themselves and posting the results so you can check their math. It appears that checking various places around the country Donald Trump’s votes follow the curve. The 3rd party candidates follow the curve. Down ballot races follow the curve. Hell, even Joe Biden’s votes follow the curve for MOST of the country. But then when you look at places like Pittsburgh the graph looks like something that would have made Hugo Chavez blush.”

https://monsterhunternation.com/2020/11/09/election-2020-the-more-fuckery-update/

On Twitter I noted that far-right extremist Nick Fuentes is also pushing not just the misleading claims about Benford’s Law but a false claim that Wikipedia “added” criticism of its use in elections to discredit the claims being made about the 2020 general election. As I pointed out in this post, the rider that Benford’s Law is of limited use with electoral data had been there for years. Rather than pro-Biden supporters adding it, Trump supporters removed the sentence and references in a bid to hide the fact that their analysis was flawed. You can read a 2013 version of the page here https://en.wikipedia.org/w/index.php?title=Benford%27s_law&oldid=534279795#Election_data

Since then, the section on Benford’s Law in elections has expanded into a mini-essay about its use and limitations.

I don’t have a source for 2020 data at the precinct level that some of these graphs are using. I’m certain that there will be both Benford-like and non-Benford-like distributions for Trump and Biden in various places. I do have county-level data for 2000 to 2020 from here https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ

The analysis is trivial to do on a spreadsheet. Grab the first character and then tabulate it with a pivot table. You can explore various candidates from Bush to Biden in a Google Sheet I made here https://docs.google.com/spreadsheets/d/1LPEKnoPtOE4VtYaM9z69B-a0XkRx5tyt_vPak8TknlY/edit?usp=sharing
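If you’d rather script it than pivot it, the same tabulation is a few lines of Python. A minimal sketch with pandas, assuming the file and column names (“candidatevotes”) from the Harvard county-level download; check them against your copy:

```python
import pandas as pd

# County-level returns; the file and column names are assumptions based on
# the Harvard/MEDSL download and may need adjusting to your copy.
df = pd.read_csv("countypres_2000-2020.csv")

# Take the first character of each non-zero vote count...
votes = df.loc[df["candidatevotes"] > 0, "candidatevotes"].astype(int)
leading = votes.astype(str).str[0]

# ...and tabulate it, which is all the pivot table in the sheet is doing.
print(leading.value_counts().sort_index())
```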

Here, for example, is Donald Trump in Alaska in 2016:

When you look at the district sizes in Alaska and consider Trump’s proportion of the vote, it becomes obvious very quickly that it would be absurd for this data to follow Benford’s Law. Here are the first four (of 40) districts.

| District | Trump Votes | Total Votes | Percentage |
|---|---|---|---|
| District 1 | 3,180 | 6,638 | 47.91% |
| District 2 | 3,188 | 5,492 | 58.05% |
| District 3 | 5,403 | 7,613 | 70.97% |
| District 4 | 4,070 | 9,521 | 42.75% |

Trump’s vote in four Alaskan districts in 2016

We have leading digits of 3, 5 and 4 and no 1s. Why? Because to get leading digits of 1 Trump’s votes would need to be proportionately much smaller! For example, if he’d got only 20% of the vote in District 1 then that would result in some 1s. In some of the examples being passed around the Trumpist circles, that is one of the reasons for the Benford-like graphs: they’ve picked places where Trump’s vote was proportionately low, pushing it into a range where 1 was common as a leading digit.
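The arithmetic is easy to check. Here’s a quick sketch using District 1’s total from the table above:

```python
total = 6638  # District 1's total vote, from the table above

# Vote counts with a leading digit of 1, expressed as shares of the total.
for low, high in [(1, 1), (10, 19), (100, 199), (1000, 1999)]:
    print(f"{low}-{high} votes = {low / total:.2%} to {high / total:.2%} of the vote")

# The only substantial band is 1,000-1,999 votes, i.e. roughly 15% to 30% of
# the vote -- below Trump's actual share in all four of those districts.
```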

The mechanics of the deception here are fascinating. There’s an initial plausibility (Benford’s Law is a real thing, is actually used to detect fraud and has been applied to elections), a lack of any critical thinking (the examples being circulated are very limited and there’s no comparison with past elections to see what is normal), but then active deception (long-standing academic critiques of applying Benford’s Law to election data being actively deleted from online wikis). On that latter part, we know the more extreme white nationalist right (Fuentes, Vox Day) are active in attempting to suppress information on how to apply Benford’s Law to election data. Providing the smoke screen and an aura of legitimacy are the usual convenient idiots for neo-Nazis, such as Larry Correia, who repeat the propaganda as ‘just asking questions’.

More far-right deception about Benford’s law

I discussed Benford’s Law and its misleading use in election data yesterday. What I didn’t mention is that the far-right vanity version of Wikipedia, known as Voxopedia aka “Infogalactic”, is actively censoring information about it.

Like many articles on the out-of-date semi-vandalised wiki, the Benford’s Law article [archive version] started as a clone of the authoritative Wikipedia version in 2016. It remained unedited until 7 November, when it was hastily edited.

What was the edit? This part was removed:

“However, other experts consider Benford’s Law essentially useless as a statistical indicator of election fraud in general.Joseph Deckert, Mikhail Myagkov and Peter C. Ordeshook, (2010) ”[http://vote.caltech.edu/sites/default/files/benford_pdf_4b97cc5b5b.pdf The Irrelevance of Benford’s Law for Detecting Fraud in Elections]”, Caltech/MIT Voting Technology Project Working Paper No. 9
Charles R. Tolle, Joanne L. Budzien, and Randall A. LaViolette (2000) ”[[:doi:10.1063/1.166498|Do dynamical systems follow Benford’s Law?]]”, Chaos 10, 2, pp.331–336 (2000); {{doi|10.1063/1.166498}}”

Edit to Voxopedia by “Renegade” 12:49 7 November

Here is an image of the change. Note this is the ONLY edit that has ever occurred to the page on Voxopedia.

Over at the real Wikipedia, the same page has also been subject to deceptive editing. References to the failure of Benford’s Law to detect fraud in elections have been removed and then re-instated. Note that, prior to the US 2020 election, these references were present. The attempt to remove them occurred AFTER the far-right claims that Benford’s Law could prove fraud (e.g. from Larry Correia and Vox Day) started circulating.

The paper that extremists on the right are trying to hide from people is this one [archive pdf]. The Abstract states:

“With increasing frequency websites appear to argue that the application of Benford’s Law – a prediction as to the observed frequency of numbers in the first and second digits of official election returns — establishes fraud in this or that election. However, looking at data from Ohio, Massachusetts and Ukraine, as well as data artificially generated by a series of simulations, we argue here that Benford’s Law is essentially useless as a forensic indicator of fraud. Deviations from either the first or second digit version of that law can arise regardless of whether an election is free and fair. In fact, fraud can move data in the direction of satisfying that law and thereby occasion wholly erroneous conclusions.”

The Irrelevance of Benford’s Law for Detecting Fraud in Elections, Joseph Deckert, Mikhail Myagkov and Peter C. Ordeshook, University of Oregon and California Institute of Technology

The paper discusses examples and shows (as we discussed yesterday) how election data can show both Benford-like and normal-like distributions of digits.

It can be difficult to tell the extent to which the far-right is knowingly lying versus simply not caring about the truth versus active self-deception. All three forms of subverting the truth can be in play when we look at past examples. However, we have here an unambiguous example of active lying. Day and at least one of his minions were already aware that Benford’s Law is a poor tool for detecting fraud in elections and have been actively trying to hide that information from his followers.

I Guess I’m Talking About Benford’s Law

The US Presidential Election isn’t just a river in Egypt, it is also a series of bizarre claims. One of the many crimes against statistics being thrown about in what is likely to be a 5 year (minimum) tantrum about the election is a claim about Benford’s law. The first example I saw was last Friday on Larry Correia’s Facebook[1]

“For those of you who don’t know, basically Benford’s Law is about the frequency distribution of numbers. If numbers are random aggregates, then they’re going to be distributed one way. If numbers are fabricated by people, then they’re not. This is one way that auditors look at data to check to see if it has been manipulated. There’s odds for how often single digit, two digit, three digit combos occur, and so forth, with added complexity at each level. It appears the most common final TWO digits for Milwaukee’s wards is 00. 😃 Milwaukee… home of the Fidel Castro level voter turn out. The odds of double zero happening naturally that often are absurdly small. Like I don’t even remember the formula to calculate that, college was a long time ago, but holy shit, your odds are better that you’ll be eaten by a shark ON LAND. If this pans out, that is downright amazing. I told you it didn’t just feel like fraud, but audacious fraud. The problem is blue machine politics usually only screws over one state, but right now half the country is feeling like they got fucked over, so all eyes are on places like Milwaukee.I will be eagerly awaiting developments on this. I love fraud stuff. EDIT: and developments… Nothing particularly interesting. Updated data changes some of the calcs, so it goes from 14 at 0 to 13 at 70. So curious but not damning. Oh well.”

So after hyping up an idea he only vaguely understood (Benford’s law isn’t about TRAILING digits for f-ck’s sake, and SOME pair of digits has to be the most common), Larry walked the claim back when it became clear that there was not very much there. As Larry would say, beware of Dunning-Krugerands.

The same claim was popping up elsewhere on the internet and there was an excellent Twitter thread debunking it [2].

But we can have hierarchies of bad-faith, poorly understood arguments. Larry Correia didn’t have the integrity to double-check the validity of what he was posting before he posted it, but at least he checked afterwards…sort of. Vox Day, however, has now also leaped upon the magic of Benford’s law [3].

Sean J. Taylor’s Twitter thread does a good job of debunking this, but as it has now come up from both Sad and Rabid Puppies, I thought I’d talk about it a bit as well with some examples.

First of all, Benford’s law isn’t much of a law. Lots of data won’t follow it and the reason why some data does follow it is not well understood. That doesn’t mean it has no utility in spotting fraud; it just means that to use it you first need to demonstrate that it applies to the kind of data you are looking at. Conversely, if Benford’s Law doesn’t usually apply to the kind of data you are looking at but your data does follow it, then THAT would (or at least might) be a sign of something going on.

That’s nothing unusual in statistics. Data follows distributions and comparing data against an applicable distribution that you expect to apply is how a lot of statistics is done. Benford’s law may or may not be applicable. As always, IT DEPENDS…

For example, if I grab the first digit of the number of page views on Wikipedia of Hugo Award finalists [4] then I get a set of data that is Benford-like:

The most common digit is 1, as Benford’s law predicts. The probability of a leading digit d according to the law is log10(1+1/d), which for d=1 is about 30%. Of the 1241 entries, Benford’s law would predict that 374 would have a leading digit of 1; the actual data has 316. But you can also see that it’s not a perfect fit, and we could (but won’t bother, because we actually don’t care) run tests to see how good a fit it is.
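Reproducing those expected counts is one line of arithmetic per digit. A sketch (1241 is the number of entries stated above):

```python
from math import log10

N = 1241  # entries with page-view data

# Benford's predicted share for leading digit d is log10(1 + 1/d).
for d in range(1, 10):
    p = log10(1 + 1 / d)
    print(f"digit {d}: predicted {p:.1%}, i.e. about {p * N:.0f} of {N} entries")
# digit 1 comes out at 30.1%, about 374 entries (316 were observed)
```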

But what if I picked a different set of numbers from the same data set? Here is the leading digit for the “Age at Hugo” figure graphed for the finalists where I have that data.

It isn’t remotely Benford-like and that’s normal (ha ha) because age isn’t going to work that way. Instead, the leading digit will cluster around the average age of Hugo finalists. If the data did follow Benford’s law it would imply that teenagers were vastly more likely to win Hugo Awards (or people over 100, I suppose, or both).

Generally you need a wide spread of numbers across magnitudes. For example, I joked about Hugo winners in their teens or their centuries, but if we also had Hugo finalists who were 0.1… years old as well (and all ages in between) then maybe the data might get a bit more Benfordish.

So what about election data? ¯\_(ツ)_/¯

The Twitter thread above cites a paper entitled Benford’s Law and the Detection of Election Fraud [5] but I haven’t read it. The abstract says:

“Looking at simulations designed to model both fair and fraudulent contests as well as data drawn from elections we know, on the basis of other investigations, were either permeated by fraud or unlikely to have experienced any measurable malfeasance, we find that conformity with and deviations from Benford’s Law follow no pattern. It is not simply that the Law occasionally judges a fraudulent election fair or a fair election fraudulent. Its “success rate” either way is essentially equivalent to a toss of a coin, thereby rendering it problematical at best as a forensic tool and wholly misleading at worst.”

Put another way, some election data MIGHT follow Benford’s law sometimes. That makes sense, because it will partly depend on the scale of the data we are looking at. For example, imagine we had voting areas of approx. 800 likely voters and two viable candidates. Would we expect “1” to be a typical leading digit in vote counts? Not at all! “3” and “4” would be more typical. Add more candidates and more people and things might get more Benford-like.
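A toy simulation makes the point; the turnout and vote-share ranges below are invented purely for illustration:

```python
import random

random.seed(1)
tally = {str(d): 0 for d in range(1, 10)}

# 10,000 made-up voting areas of roughly 800 voters with two viable
# candidates, the leading candidate's share drawn between 35% and 65%.
for _ in range(10_000):
    turnout = random.randint(700, 900)
    votes = int(turnout * random.uniform(0.35, 0.65))
    tally[str(votes)[0]] += 1

print(tally)  # counts pile up on leading digits 2-5; there are no 1s at all
```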

Harvard University has easily downloadable US presidential data by state from 1976 to 2016 [6]. At this scale, and with all candidates (including numerous 3rd- and 4th-party candidates), you do get something quite Benford-like but with maybe more 1s than expected.

Now look specifically at Donald Trump in 2016 and compare that with the proportions predicted by Benford’s law:

Oh noes! Trump 2016 has too many 1s! Except…the same caveat applies. We have no idea if Benford’s law applies to this kind of data! For those curious, Hillary Clinton’s data looks like (by eyeball only) a better fit.
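If you did want to go beyond eyeballing, the standard move would be a chi-square goodness-of-fit test against the Benford proportions. A sketch, with placeholder digit counts rather than the real tallies:

```python
from math import log10
from scipy.stats import chisquare

# Placeholder leading-digit counts for digits 1-9; swap in the real tallies.
observed = [316, 180, 140, 110, 95, 85, 75, 70, 65]
n = sum(observed)

# Benford's expected count for digit d is n * log10(1 + 1/d).
expected = [n * log10(1 + 1 / d) for d in range(1, 10)]

stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p:.4f}")
```

Even a “significant” result from a test like this only says the digits don’t match Benford’s distribution; it says nothing about why.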

Now, we could run a test like that to see how good a fit these are but…why bother? We still don’t know whether we should expect the data to be a close fit or not. If you are looking at those graphs and thinking “yeah but maybe it’s close enough…” then you also need to factor in scale. I don’t have data for individual polling booths or whatever, but we can look at the impact of scale by looking at minor candidates. Here’s one Vox Day would like: Pat Buchanan.

My eyeballs are more than sufficient to say that those two distributions don’t match. By Day’s misapplied standards, that means Pat Buchanan is a fraud…which he is, but probably not in this way.

Nor is it just scale that matters. Selection bias and our old friend cherry-picking are also invited to the party. Because the relationship between the data and Benford’s law is inconsistent and not well understood, we can find examples that fit somewhat (Trump, Clinton), examples that really don’t (Buchanan), but also examples that are moderately wonky.

Here’s another old fraudster, but one whose dubious nature is not demonstrated by this graph:

That’s too many twos, Ronnie!

Anyway, that is far too many words and too many graphs to say that for US Presidential election data Benford’s law applies only just enough to be horribly misleading.


[1] https://www.facebook.com/larry.correia/posts/4864622073548683

[2] Sean J. Taylor’s R code https://gist.github.com/seanjtaylor/cd85175055e66cdc2bb7899a3bcdf313

[3] http://voxday.blogspot.com/2020/11/the-attack-on-benfords-law.html

[4] https://docs.google.com/spreadsheets/d/1lL9bm3I7yrkKxSAZwN1NhWr6OB8-s10IkV1g_MSSGXY/edit?usp=sharing

[5] Deckert, J., Myagkov, M., & Ordeshook, P. (2011). Benford’s Law and the Detection of Election Fraud. Political Analysis,19(3), 245-268. doi:10.1093/pan/mpr014 https://www.cambridge.org/core/journals/political-analysis/article/benfords-law-and-the-detection-of-election-fraud/3B1D64E822371C461AF3C61CE91AAF6D

[6] https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/42MVDX/MFU99O&version=5.0

Hugo Author Page Views

I gathered the Wikipedia pages of all the authors in my great big Hugo spreadsheet and used my page-view gathering tool to add a page view figure to every author with an English Wikipedia page on that sheet. Most of the authors on this list of Hugo finalists for Novel, Novella, Novelette and Short Story have a Wikipedia page, but all the caveats about this data apply. A good example of the issues is Frank Herbert, whose page views have increased because of interest around the new film version of Dune. That doesn’t make page views utterly flawed as a figure; we just need to be clear that they are a measure of current levels of attention, and that currency can change dramatically for individuals.

The other, more numerical, issue is the distribution. Authors that are currently getting a lot of Wiki-attention do so at a scale orders of magnitude greater than those that aren’t. That can make graphing the data tricky, and it also does bad things to measures of central tendency, aka averages.

This time I want to look at trends over time. I’m plotting the Hugo Award year against an aggregated page-view value for the authors who were finalists in story categories. To cope with the spread of values I’m using a logarithmic scale for the vertical axis.

Hugo story finalist graphed by year and Wikipedia 30 day page views gathered 14/09/2020

The median is less impacted by the smallest and largest values in each year. Also, in this case I’m treating authors without Wikipedia pages as missing data rather than zero. The most famous authors don’t really influence the graph unless they were finalists alongside a whole bunch of other really famous people. I think 1964 (currently) is the peak year because of a combo of Heinlein, Anderson, Vonnegut, Norton, and Rice Burroughs. The outliers that year are Frank Herbert (because of the Dune movie) and Clifford D. Simak (a decent number of page views, just low for that year), plus Rick Raphael, who gets treated as missing data because he doesn’t have an English Wikipedia page.
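In spreadsheet-free terms, the aggregation is just a grouped median with the page-less authors entered as missing values. A sketch with invented numbers:

```python
import numpy as np
import pandas as pd

# Invented rows: award year and 30-day page views per finalist.
# Authors without an English Wikipedia page get NaN, not 0.
finalists = pd.DataFrame({
    "year":  [1964, 1964, 1964, 1964],
    "views": [125_000, 4_200, 310, np.nan],
})

# median() skips NaN by default, so missing pages don't drag the value down.
print(finalists.groupby("year")["views"].median())
```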

Arguably, there is a visible late-1990s/early-2000s dip that has been anecdotally claimed in discussions about the Hugo Awards. Whether that is an actual feature of those finalists, or whether they just fall in that spot between too long ago to be notable now but not far back enough to be revisited as classics, remains an open question.

Intentionally, the graph ignores two important groups: the authors who are really, really notable currently (in terms of Wikipedia page views) and the authors who aren’t. I’ll deal with the first group by looking at the maximum values per year.

Hugo story finalist graphed by year and max values 30 day page views

I think that is very much a nothing-to-see-here sort of graph. Note that I’ve changed the maximum and minimum points on the vertical axis to fit the data in. Generally, the really high values are consistently high.

Hugo story finalist graphed by year and min values 30 day page views

The minimum value starts very noisy and then gets more stable. Remember that those authors without Wikipedia pages are counted as missing rather than zero, so don’t impact the values on this graph. I think the most recent years would look a bit noisier if we counted the missing authors as zero instead because the most recent years naturally have more early career writers who haven’t got Wikipedia pages yet.

Lastly, here is the first graph again of the median value but this time only showing the value for the winners.

Hugo story winners graphed by year and median values 30 day page views

That looks like it’s trending down a bit but note that this value will be more influenced by the shorter fiction finalists.

Page Views and the Dragon Award

There is a common impression that there has been a change in character of the Dragon Awards this year. I thought I might use the Wikipedia page view metric (see here) to see if I could quantify it in a different way.

An immediate obstacle with using the page view figure is that the distribution is very Zipf-like. That makes averages very misleading, because the odd Stephen King or Margaret Atwood creates a big change in the mean score. To overcome that issue, and also to show the authors who don’t have Wikipedia pages, I’ve grouped the data in bins that get proportionately bigger. The first bin is 0 to 10 (basically people who don’t have a Wikipedia page), then 10 to 50, then 50 to 100, then 100 to 500, etc., up to 100,000 or more, which is basically Stephen King.
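For anyone reproducing this, the binning is easy to express with pandas, using edges that grow with magnitude; the views values here are invented:

```python
import pandas as pd

# Bin edges that grow with magnitude, matching the groups in the tables below.
edges = [0, 10, 50, 100, 500, 1_000, 5_000, 10_000, 50_000, 100_000, float("inf")]
labels = ["< 10", "≥ 10", "≥ 50", "≥ 100", "≥ 500",
          "≥ 1,000", "≥ 5,000", "≥ 10,000", "≥ 50,000", "> 100,000"]

# Invented 30-day page-view counts, one per finalist.
authors = pd.DataFrame({"views": [0, 37, 4_576, 216_776]})
authors["group"] = pd.cut(authors["views"], bins=edges, labels=labels, right=False)
print(authors["group"].value_counts().sort_index())
```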

One major caveat. The page view numbers are as they are in September 2020 in all cases. So figures for past years reflect those counts for the authors now and not as they were in the year of the award.

This is the table for book categories (I haven’t gathered the data for people in the comic book categories).

| Group | 2016 | 2017 | 2018 | 2019 | 2020 | Total |
|---|---|---|---|---|---|---|
| < 10 | 42 | 62 | 45 | 34 | 44 | 227 |
| ≥ 10 | 1 | 1 | 1 | | | 3 |
| ≥ 50 | 2 | 2 | 1 | | | 5 |
| ≥ 100 | 5 | 4 | 8 | 8 | 6 | 31 |
| ≥ 500 | 2 | 1 | 3 | | | 6 |
| ≥ 1,000 | 12 | 10 | 9 | 14 | 15 | 60 |
| ≥ 5,000 | 3 | 1 | 4 | 4 | 2 | 14 |
| ≥ 10,000 | 6 | 9 | 4 | 3 | 5 | 27 |
| ≥ 50,000 | 2 | 1 | 1 | | | 4 |
| > 100,000 | 1 | | | | | 1 |

Winners and Finalists (book categories)

Obviously, there are many ways you can group this data but I think it shows some sensible groupings.

| Group | 2016 | 2017 | 2018 | 2019 | 2020 | Total |
|---|---|---|---|---|---|---|
| < 10 | 1 | 1 | 1 | 2 | 3 | 8 |
| ≥ 50 | 1 | | | | | 1 |
| ≥ 100 | 1 | 1 | | | | 2 |
| ≥ 500 | 2 | | | | | 2 |
| ≥ 1,000 | 3 | 3 | 2 | 2 | 2 | 12 |
| ≥ 5,000 | 1 | 3 | 1 | 1 | | 6 |
| ≥ 10,000 | 4 | 2 | 1 | | | 7 |
| ≥ 50,000 | 1 | 1 | | | | 2 |
| > 100,000 | 1 | | | | | 1 |

Winners (book categories)

These tables don’t suggest any substantial changes to the Dragon Awards. There are ups and downs but the overall character seems to be similar: a mix of big names (e.g. in 2016, Terry Pratchett and Brandon Sanderson) down to names that are famous within their Amazon niches (e.g. Nick Cole).

However, if we look at just the ‘headline’ categories defined by the broad genres Science Fiction, Fantasy, and Horror (I thought I should include Horror), we see a different story.

| Group | 2016 | 2017 | 2018 | 2019 | 2020 | Total |
|---|---|---|---|---|---|---|
| < 10 | 7 | 12 | 12 | 2 | | 33 |
| ≥ 10 | 1 | 1 | | | | 2 |
| ≥ 50 | 1 | 2 | 1 | | | 4 |
| ≥ 100 | 2 | 2 | 3 | 1 | | 8 |
| ≥ 500 | 2 | | | | | 2 |
| ≥ 1,000 | 5 | 6 | 2 | 6 | 10 | 29 |
| ≥ 5,000 | 1 | 1 | 3 | 2 | | 7 |
| ≥ 10,000 | 2 | 3 | 3 | 2 | 5 | 15 |
| ≥ 50,000 | 1 | 1 | | | | 2 |
| > 100,000 | 1 | | | | | 1 |

Winners and Finalists in Science Fiction, Fantasy and Horror

In these three categories, the authors are (by the page view metric) more notable in 2020 than in previous years.

What about gender? The Dragon Awards have been very male dominated, both in absolute terms and even more so in comparison to contemporary awards. Using the page-view metric groups, a shift becomes clearer.

| Group | 2016 | 2017 | 2018 | 2019 | 2020 | Total |
|---|---|---|---|---|---|---|
| < 10 | 3 | 5 | 4 | 3 | 2 | 17 |
| ≥ 10 | | | | | | 0 |
| ≥ 50 | 1 | | | | | 1 |
| ≥ 100 | 2 | 1 | 3 | 3 | 2 | 11 |
| ≥ 500 | 2 | | | | | 2 |
| ≥ 1,000 | 2 | 3 | 3 | 6 | 10 | 24 |
| ≥ 5,000 | 2 | 1 | 2 | 2 | | 7 |
| ≥ 10,000 | 3 | 2 | 1 | 1 | | 7 |
| ≥ 50,000 | 1 | | | | | 1 |
| > 100,000 | | | | | | 0 |

Authors using she/her pronouns (book categories)

The substantial increase is with women authors in the 1000 to 5000 range. The difference in gender balance becomes clearer in aggregate across the years.

| Group | He/him | She/her | Total | % he | % she |
|---|---|---|---|---|---|
| < 10 | 77 | 17 | 94 | 82% | 18% |
| ≥ 10 | 3 | 0 | 3 | 100% | 0% |
| ≥ 50 | 4 | 1 | 5 | 80% | 20% |
| ≥ 100 | 20 | 11 | 31 | 65% | 35% |
| ≥ 500 | 4 | 2 | 6 | 67% | 33% |
| ≥ 1,000 | 36 | 24 | 60 | 60% | 40% |
| ≥ 5,000 | 7 | 7 | 14 | 50% | 50% |
| ≥ 10,000 | 20 | 7 | 27 | 74% | 26% |
| ≥ 50,000 | 3 | 1 | 4 | 75% | 25% |
| > 100,000 | 1 | 0 | 1 | 100% | 0% |
| Total | 175 | 70 | 245 | 71% | 29% |

Gender split 2016-2020 (book categories)

The gender balance increases with grouping size until the 5,000 group and then declines. Interestingly, with three each, the 50-50 split in that group also exists for winners.

So, yes, the Dragons are changing, but only in places. Down-ballot, finalists still tend to be less notable and more male, in a way that’s not very different from 2016.

…I should add

A note on my previous two posts because it illustrates a broader point.

The page views metric does appear to be both meaningful and accessible. Those are handy qualities for making comparisons, but it has a significant downside: as soon as people start paying attention to it in any significant way, its value would be severely undermined.

For example, to set up the fields for the web scraping, I visited a few authors’ main pages several times and literally added to their totals. The impact of that would be small for N.K. Jemisin’s page but not insignificant for Brian Niemeier’s. The set-up I created could also be easily re-designed to visit a single Wikipedia page many times while I got on with some other task.

I noticed an additional circularity today. I was curious about why there was a Chuck Tingle spike in January 2017 and so…visited his Wikipedia page. If there were any stakes attached to this kind of ranking, then a random blip would generate interest in a topic, which would drive interest in the Wikipedia page, which would increase the size of the blip, etc. etc.

I’m not suggesting anything like that is going to happen with Wiki page view stats but the scenario reminded me of more notable statistics we encounter. The most obvious one is share prices and other speculative financial data. The capacity for this kind of data to engender feedback loops is infamous and actively undermines the information value of the data.

More broadly, metrics used to judge job performance or business performance can also be self-undermining in other ways. What might have been a handy piece of data will get distorted when stakes are attached to it which are in turn intended to influence people’s behaviour. With social policy this can have unfortunate consequences, e.g. in crime statistics https://www.bbc.com/news/uk-25002927

Who “won” the Puppy attention wars?

A good point people raised about yesterday’s post on Wikipedia page view metrics is that it captures a current state, but in many cases we are more interested in a historical value. This is particularly true when we are looking at the impact of awards or events.

Luckily I don’t need to advance my web scraping tools further to answer this, as Wikipedia actually has a tool for looking at and graphing this kind of data. Like most people I’ve used Wikipedia for many years now, but I only learned about this yesterday while looking for extra data (or maybe I learned earlier and forgot; seems likely). The site is https://pageviews.toolforge.org and each of the page information pages has a link to it at the bottom under ‘external tools’.

It’s not really suitable for a data set of hundreds of pages but it is quite nice for comparing a small number of pages.

Just to see how it works, and to play with settings until I got a visually interesting graph, I decided to see if I could see the impact of the Hugo Awards on five relevant pages. The data it will graph only goes back to 2015, so this takes the impact of SP3 as a starting point. I’ve chosen to look at John Scalzi, N.K. Jemisin, Chuck Tingle, Vox Day and Larry Correia.

https://pageviews.toolforge.org/?project=en.wikipedia.org&platform=all-access&agent=all-agents&redirects=0&start=2015-07&end=2020-08&pages=Vox_Day|John_Scalzi|Larry_Correia|Chuck_Tingle|N._K._Jemisin

I added a background colour and labels. The data shows monthly totals and because of the size of some spikes, it is plotted on a logarithmic scale. Be mindful that the points are vertically further apart in terms of actual magnitude than is shown visually.

I think the impact of N.K. Jemisin’s second and third Best Novel wins is undeniable. There is a smaller spike for the first win but each subsequent win leads to more interest. I don’t know why Chuck Tingle had a big spike in interest in January 2017.

I’ve added a little red arrow around July 2019. That was when there was a big flurry among some Baen authors that Wikipedia was deleting their articles https://camestrosfelapton.wordpress.com/2019/07/29/just-a-tiny-bit-more-on-wikipedia/

Anyway, to answer my own question: talent beat tantrums in the battle for attention.

Authors: which ones get looked up?

A perennial question around award nominees is just how significant the authors being honoured are. It’s a tricky question, particularly as there is no good data about book sales. Amazon ranks are mysterious and Goodreads data may be a reflection of a particular community.

I’m currently taking a few baby steps into web scraping and I was playing with Wikipedia. Every Wikipedia article has a corresponding information page with some basic metadata about the article. For example, here is the info page for the article on the writer Zen Cho https://en.wikipedia.org/w/index.php?title=Zen_Cho&action=info On that page is a field called “Page views in the past 30 days” that gives the figure stated. As a first attempt at automating some data collection, it’s a relatively easy piece of data to get.
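As it happens, the same figure can also be pulled from the Wikimedia pageviews REST API rather than scraped from the info page. A sketch; the endpoint shape is the documented one, but check the current docs before relying on it:

```python
import requests

# Wikimedia REST pageviews endpoint; dates are in YYYYMMDD form.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/all-access/all-agents/{title}/daily/{start}/{end}")

def views(title: str, start: str, end: str) -> int:
    """Total views for an article between two dates (inclusive)."""
    url = API.format(title=title.replace(" ", "_"), start=start, end=end)
    r = requests.get(url, headers={"User-Agent": "page-view-sketch"})
    r.raise_for_status()
    return sum(item["views"] for item in r.json()["items"])

# e.g. a 30-day window ending mid-September 2020
print(views("Zen Cho", "20200815", "20200913"))
```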

So, I put together a list of authors from my Hugo Award and Dragon Award lists, going back a few years (I think to 2013). Not all of them have Wikipedia pages, partly because they are early in their careers but also because Wikipedia does a poor job of representing authors who aren’t traditionally published. Putting the ‘not Wiki notable’ authors aside, that left me with 163 names. With a flash of an algorithm I had a spreadsheet of authors ranked by the current popularity of their Wikipedia page.

Obviously this is very changeable data. A new story, a tragedy, a scandal or a recent success might change the number of page views significantly from month to month. However, I think it’s fairly useful data nonetheless.

So what does the top 10 look like?

| Rank | Author | Page views (past 30 days) |
|---|---|---|
| 1 | Stephen King | 216,776 |
| 2 | Margaret Atwood | 75,427 |
| 3 | Brandon Sanderson | 72,265 |
| 4 | Terry Pratchett | 55,591 |
| 5 | Rick Riordan | 43,484 |
| 6 | N. K. Jemisin | 34,756 |
| 7 | Cixin Liu | 32,372 |
| 8 | Sarah J. Maas | 21,852 |
| 9 | Ian McEwan | 20,468 |
| 10 | Neal Stephenson | 20,058 |

The rest of the top 30 look like this:

| Rank | Author | Page views (past 30 days) |
|---|---|---|
| 11 | Robert Jordan | 19,169 |
| 12 | Ted Chiang | 17,635 |
| 13 | Owen King | 16,041 |
| 14 | Jim Butcher | 15,493 |
| 15 | James S. A. Corey | 15,109 |
| 16 | Stephen Chbosky | 14,490 |
| 17 | Leigh Bardugo | 13,787 |
| 18 | China Miéville | 13,580 |
| 19 | Andy Weir | 13,057 |
| 20 | Harry Turtledove | 11,452 |
| 21 | Cory Doctorow | 11,362 |
| 22 | Jeff VanderMeer | 11,243 |
| 23 | John Scalzi | 10,796 |
| 24 | Chuck Tingle | 10,763 |
| 25 | Ben Aaronovitch | 10,493 |
| 26 | Brent Weeks | 10,271 |
| 27 | Ken Liu | 9,003 |
| 28 | Tamsyn Muir | 9,002 |
| 29 | Alastair Reynolds | 8,951 |
| 30 | Kim Stanley Robinson | 8,879 |

There’s a big Zipf-like distribution going on, with numbers that decline quickly by rank. John Scalzi has Chuck Tingle levels of fame on this metric.
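As a rough check on the Zipf claim: under a Zipf-like law, log views falls roughly linearly in log rank with a slope near -1, and the top ten above land close to that:

```python
from math import log10

# The top-10 page-view counts from the table above.
views = [216776, 75427, 72265, 55591, 43484, 34756, 32372,
         21852, 20468, 20058]

# Crude slope of log10(views) against log10(rank), from the endpoints.
slope = (log10(views[-1]) - log10(views[0])) / (log10(10) - log10(1))
print(f"slope ≈ {slope:.2f}")  # about -1.03, close to the classic Zipf slope
```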

OK, so I know people want to know where some of our favourite antagonists are, so here are some of the notable names from the Debarkle years.

| Rank | Author | Page views (past 30 days) |
|---|---|---|
| 40 | Vox Day | 5,271 |
| 45 | Larry Correia | 4,455 |
| 60 | John Ringo | 2,878 |
| 81 | John C. Wright | 1,251 |
| 111 | Brad R. Torgersen | 560 |
| 123 | Sarah A. Hoyt | 407 |
| 140 | L. Jagi Lamplighter | 229 |
| 152 | Dave Freer | 102 |
| 153 | Lou Antonelli | 101 |
| 156 | Brian Niemeier | 81 |

Day probably gets a lot more views due to people looking him up because of his obnoxious politics. Larry Correia is in a respectable spot in the 40s. He is just below Martha Wells, who has 4,576 page views, which is essentially the same number given how these figures might change from day to day. John Ringo is just above Chuck Wendig and Rebecca Roanhorse (2,806 and 2,786). John C. Wright is sandwiched between Tade Thompson and Sarah Gailey.

You can see the full list here https://docs.google.com/spreadsheets/d/14uQsQNxKyPQtxybu4OxsFrdRRl_v-tdW0fN0oblgFw4/edit?usp=sharing

Let me know if you find any errors.

The last one for the time being

This is the fan categories but without the intermediate nodes of category or year. In other words, the edges of the graph join names together directly, and an edge represents that the two names shared a category in a given year. It makes a big bow tie.

It is less than accurate, though, because of an alphabetical bias. While all finalists appear, the lists only went ten-deep because of a column limit when I was processing the data. That means finalists further down the alphabet, in years with lots of named finalists, don’t get as many connections as they should have.

A more readable version is available here as a PDF.

The file name was short for fan categories with direct connections. However, “Fan Cats Direct” sounds like an interesting business.

They made me do this

I blame James Davis Nicoll who forced me to do this.

Hugo Fan Categories (Artist, Fanzine, Fan Writer and Fancast) by year.

The growth in the size of the year nodes post-2012 comes firstly from the addition of Fancast but also from more group finalists (often in Fancast). It looks like some huge city on a bay with bridges out to islands (the Mike Glyer Bridge joining the historic CBD to 2016/18).

Here they are organised by award category.