People ask me how I find my way around the confusing and hateful sections of the internet. Naturally I have a map.
No, no, not Trump’s bizarre legal cases. I’m talking about a different right winger with a distorted view of reality. Yes, we are back on the Patreon versus Fans of Owen ‘Flat Earth’ Benjamin. For previous episodes of this saga please see:
So the next episode in this Quixotic tale was scheduled for December when a case management conference was due. However, there has been a further development.
Back in June Patreon filed a motion that the case should be designated as complex litigation. That motion has now been agreed and that means a new court and a new judge and I guess all sorts of things I don’t understand.
Bad news or good news for Benjamin and Vox Day? Probably bad news. This was what Patreon asked for and Day isn’t boasting about it. The case management conference has moved to January 5.
It seems I was too kind to Larry Correia in my first post about the pro-Trumpist misleading claims about Benford’s Law. He actually is still pushing it as supposed evidence of election fraud.
“Basically, when numbers are aggregated normally, they follow a distribution curve. When numbers are fabricated, they don’t. When human beings create what they think of as “random” numbers, they’re not. This is an auditing tool for things like looking for fabricated invoices. It also applies to elections. A normal election follows the expected curve. If you look at a 3rd world dictatorship’s election numbers, it looks like a spike or a saw. https://monsterhunternation.com/2020/11/09/election-2020-the-more-fuckery-update/
There’s a bunch of different people out there running the numbers for themselves and posting the results so you can check their math. It appears that checking various places around the country Donald Trump’s votes follow the curve. The 3rd party candidates follow the curve. Down ballot races follow the curve. Hell, even Joe Biden’s votes follow the curve for MOST of the country. But then when you look at places like Pittsburgh the graph looks like something that would have made Hugo Chavez blush.”
On Twitter I noted that far-right extremist Nick Fuentes is also pushing not just the misleading claims about Benford’s Law but a false claim that Wikipedia “added” criticism of its use in elections to discredit the claims being made about the 2020 general election. As I pointed out in this post, the rider that Benford’s Law use with electoral data was limited had been there for years. Rather than pro-Biden supporters adding it, Trump supporters removed the sentence and references in a bid to hide the fact that their analysis was flawed. You can read a 2013 version of the page here https://en.wikipedia.org/w/index.php?title=Benford%27s_law&oldid=534279795#Election_data
Since then, the section on Benford’s Law in elections has expanded into a mini-essay about its use and limitations.
I don’t have a source for the 2020 precinct-level data that some of these graphs are using. I’m certain that there will be both Benford and non-Benford like distributions for Trump and Biden in various places. I do have county level data for 2000 to 2016 from here https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
The analysis is trivial to do on a spreadsheet. Grab the first character of each vote count and then tabulate it with a pivot table. You can explore various candidates from Bush to Biden on a Google Sheet I made here https://docs.google.com/spreadsheets/d/1LPEKnoPtOE4VtYaM9z69B-a0XkRx5tyt_vPak8TknlY/edit?usp=sharing
Here, for example, is Donald Trump in Alaska in 2016:
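For anyone who would rather not use a spreadsheet, the same "grab the first character and tabulate" step can be sketched in a few lines of Python. This is only an illustration: the function names and the sample vote counts are made up for the example and are not taken from the county data or the Google sheet.

```python
from collections import Counter

def leading_digit(n):
    """Return the first non-zero digit of an integer, or None for zero."""
    s = str(abs(int(n))).lstrip("0")
    return int(s[0]) if s else None

def tabulate_leading_digits(counts):
    """Count how often each digit 1-9 leads the numbers in `counts`."""
    tally = Counter(d for d in map(leading_digit, counts) if d is not None)
    return {d: tally.get(d, 0) for d in range(1, 10)}

# Made-up vote counts, purely for illustration (not real election data).
sample_votes = [3180, 512, 47, 1290, 0, 4405, 388, 19, 5023]
table = tabulate_leading_digits(sample_votes)
for digit, n in table.items():
    print(digit, n)
```

Zero counts are skipped because they have no leading digit to tabulate.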
When you look at the district sizes in Alaska and consider Trump’s proportion of the vote, it becomes obvious very quickly that it would be absurd for this data to follow Benford’s Law. Here are the first four (of 40) districts.
|District||Trump Votes||Total Votes||Percentage|
We have leading digits of 3, 5 and 4 and no 1s. Why? Because to get leading digits of 1 Trump’s votes would need to be proportionately much smaller! For example if he’d only got 20% of the vote in District 1 then that would result in some 1s. In some of the examples being passed around Trumpist circles, that is one of the reasons for Benford-like graphs: they’ve picked places where Trump’s vote was proportionately low, pushing it into ranges where 1s were common as a leading digit.
The mechanics of the deception here are fascinating. There’s an initial plausibility (Benford’s Law is a real thing and is actually used to detect fraud and has been applied to elections), a lack of any critical thinking (the examples being circulated are very limited, there’s no comparison with past elections to see what is normal) but then active deception (long standing academic critiques of applying Benford’s Law to election data being actively deleted from online wikis). On that latter part, we know the more extreme white nationalist right (Fuentes, Vox Day) are active in attempting to suppress information on how to apply Benford’s Law to election data. Providing the usual smoke screen with an aura of legitimacy are the usual convenient idiots for neo-Nazis such as Larry Correia, who repeat the propaganda as ‘just asking questions’.
The US Presidential Election isn’t just a river in Egypt, it is also a series of bizarre claims. One of the many crimes against statistics being thrown about in what is likely to be a 5 year (minimum) tantrum about the election is a claim about Benford’s law. The first example I saw was last Friday on Larry Correia’s Facebook
“For those of you who don’t know, basically Benford’s Law is about the frequency distribution of numbers. If numbers are random aggregates, then they’re going to be distributed one way. If numbers are fabricated by people, then they’re not. This is one way that auditors look at data to check to see if it has been manipulated. There’s odds for how often single digit, two digit, three digit combos occur, and so forth, with added complexity at each level. It appears the most common final TWO digits for Milwaukee’s wards is 00. Milwaukee… home of the Fidel Castro level voter turn out. The odds of double zero happening naturally that often are absurdly small. Like I don’t even remember the formula to calculate that, college was a long time ago, but holy shit, your odds are better that you’ll be eaten by a shark ON LAND. If this pans out, that is downright amazing. I told you it didn’t just feel like fraud, but audacious fraud. The problem is blue machine politics usually only screws over one state, but right now half the country is feeling like they got fucked over, so all eyes are on places like Milwaukee. I will be eagerly awaiting developments on this. I love fraud stuff. EDIT: and developments… Nothing particularly interesting. Updated data changes some of the calcs, so it goes from 14 at 0 to 13 at 70. So curious but not damning. Oh well.”
So after hyping up an idea he only vaguely understood (Benford’s law isn’t about TRAILING digits for f-ck sake and SOME number has to be the most common) Larry walked the claim back when it became clear that there was not very much there. As Larry would say beware of Dunning-Krugerands.
The same claim was popping up elsewhere on the internet and there was an excellent Twitter thread debunking the claims here:
But we can have hierarchies of bad-faith poorly understood arguments. Larry Correia didn’t have the integrity to at least double check the validity of what he was posting before he posted it but at least he checked afterwards…sort of. Vox Day, however, has now also leaped upon the magic of Benford’s law 
Sean J Taylor’s Twitter thread does a good job of debunking this but as it has now come up from both Sad and Rabid Puppies, I thought I’d talk about it a bit as well with some examples.
First of all Benford’s law isn’t much of a law. Lots of data won’t follow it and the reason why some data follows it is not well understood. That doesn’t mean it has no utility in spotting fraud, it just means that to use it you first need to demonstrate that it applies to the kind of data you are looking at. If Benford’s Law doesn’t usually apply to the data you are looking at but your data does follow Benford’s law then THAT would/might be a sign of something going on.
That’s nothing unusual in statistics. Data follows distributions and comparing data against an applicable distribution that you expect to apply is how a lot of statistics is done. Benford’s law may or may not be applicable. As always, IT DEPENDS…
For example, if I grab the first digit of the number of Page Views on Wikipedia of Hugo Award finalists then I get a set of data that is Benford like:
The most common digit is 1 as Benford’s law predicts. The law gives the probability of a leading digit d as log10(1+1/d), which for d = 1 is about 30%. Of the 1241 entries, Benford’s law would predict about 374 would have a leading digit of 1 and the actual data has 316. But you can also see that it’s not a perfect fit and we could (but won’t bother because we actually don’t care) run tests to see how good a fit it was.
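The expected figures are easy to reproduce. Here is a minimal Python sketch of the log10(1+1/d) formula applied to the 1241 entries mentioned above; the function names are my own and the per-digit counts in the original graph are not reproduced here.

```python
import math

def benford_probability(d):
    """Probability that the leading digit is d (1-9) under Benford's law."""
    return math.log10(1 + 1 / d)

def expected_counts(n_entries):
    """Expected number of entries with each leading digit, given n_entries values."""
    return {d: n_entries * benford_probability(d) for d in range(1, 10)}

expected = expected_counts(1241)  # 1241 is the number of entries quoted above
for d, e in expected.items():
    print(f"{d}: {benford_probability(d):.1%} -> about {e:.0f}")
```

The nine probabilities sum to 1, and the expected count for a leading 1 rounds to the 374 quoted in the text.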
But what if I picked a different set of numbers from the same data set? Here is the leading digit for the “Age at Hugo” figure graphed for the finalists where I have that data.
It isn’t remotely Benford like and that’s normal (ha ha) because age isn’t going to work that way. Instead the leading digit will cluster around the average age of Hugo finalists. If the data did follow Benford’s law it would imply that teenagers were vastly more likely to win Hugo Awards (or people over 100 I suppose or both).
Generally you need a wide spread of numbers across magnitudes. For example, I joked about Hugo winners in their teens or their centuries but if we also had Hugo finalists who were 0.1… years old as well (and all ages in between) then maybe the data might get a bit more Benfordish.
So what about election data? ¯\_(ツ)_/¯
The twitter thread above cites a paper entitled Benford’s Law and the Detection of Election Fraud but I haven’t read it. The abstract says:
“Looking at simulations designed to model both fair and fraudulent contests as well as data drawn from elections we know, on the basis of other investigations, were either permeated by fraud or unlikely to have experienced any measurable malfeasance, we find that conformity with and deviations from Benford’s Law follow no pattern. It is not simply that the Law occasionally judges a fraudulent election fair or a fair election fraudulent. Its “success rate” either way is essentially equivalent to a toss of a coin, thereby rendering it problematical at best as a forensic tool and wholly misleading at worst.”
Put another way, some election data MIGHT follow Benford’s law sometimes. That makes sense because it will partly depend on the scale of data we are looking at. For example, imagine we had voting areas of approx 800 likely voters and two viable candidates, would we expect “1” to be a typical leading digit in vote counts? Not at all! “3” and “4” would be more typical. Add more candidates and more people and things might get more Benford like.
Harvard University has easily downloadable US presidential data by State from 1976 to 2016. At this scale and with all candidates (including numerous 3rd, 4th party candidates) you do get something quite Benford like but with maybe more 1s than expected.
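The thought experiment about 800-voter areas can be checked with a toy simulation. This is a sketch under invented assumptions, not a model of any real election: turnout is drawn uniformly between 700 and 900, and one candidate's share uniformly between 40% and 60%. Both ranges are mine, chosen only to represent "approx 800 likely voters and two viable candidates".

```python
import random
from collections import Counter

def simulate_leading_digits(n_areas=1000, seed=0):
    """Leading digits of one candidate's vote count across simulated voting areas."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_areas):
        turnout = rng.randint(700, 900)   # assumed: roughly 800 voters per area
        share = rng.uniform(0.40, 0.60)   # assumed: a close two-candidate race
        votes = round(turnout * share)
        tally[int(str(votes)[0])] += 1
    return tally

tally = simulate_leading_digits()
for digit in range(1, 10):
    print(digit, tally[digit])
```

Under these assumptions the vote counts can only fall between 280 and 540, so a leading 1 never appears at all, which is nothing like a Benford distribution and has nothing to do with fraud.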
Now look specifically at Donald Trump in 2016 and compare that with the proportions predicted by Benford’s law:
Oh noes! Trump 2016 has too many 1s! Except…the same caveat applies. We have no idea if Benford’s law applies to this kind of data! For those curious, Hillary Clinton’s data looks like (by eyeball only) a better fit.
Now we could test these to see how good a fit they are but…why bother? We still don’t know whether we expect the data to be a close fit or not. If you are looking at those graphs and thinking “yeah but maybe it’s close enough…” then you also need to factor in scale. I don’t have data for individual polling booths or whatever but we can look at the impact of scale by looking at minor candidates. Here’s one Vox Day would like, Pat Buchanan.
My eyeballs are more than sufficient to say that those two distributions don’t match. By Day’s misapplied standards, that means Pat Buchanan is a fraud…which he is, but probably not in this way.
Nor is it just scale that matters. Selection bias and our old friend cherry picking are also invited to the party. Because the relationship between the data and Benford’s law is inconsistent and not understood, we can find examples that fit somewhat (Trump, Clinton) and examples that really don’t (Buchanan) but also examples that are moderately wonky.
Here’s another old fraudster but whose dubious nature is not demonstrated by this graph:
That’s too many twos Ronnie!
Anyway, that is far too many words and too many graphs to say that for US Presidential election data Benford’s law applies only just enough to be horribly misleading.
Sean J Taylor’s R code https://gist.github.com/seanjtaylor/cd85175055e66cdc2bb7899a3bcdf313
Deckert, J., Myagkov, M., & Ordeshook, P. (2011). Benford’s Law and the Detection of Election Fraud. Political Analysis, 19(3), 245-268. doi:10.1093/pan/mpr014 https://www.cambridge.org/core/journals/political-analysis/article/benfords-law-and-the-detection-of-election-fraud/3B1D64E822371C461AF3C61CE91AAF6D
I was asked in email, how many active editors Vox Day’s vanity clone of Wikipedia has. The answer is about 30 people a month make some sort of edit but that includes people making a change to their own page (and those are often people who sign up with user accounts and then do nothing). I think the number of people actively editing is about half that and of those there is a core of around six people. Essentially this is not very different from how it was in 2017, about a year into its existence.
I don’t know the number of user accounts it has but I believe it is substantial. Initially, I tracked how quickly it was signing up users but I got bored. The idea of the semi-vandalised clone of Wikipedia remains popular but despite that approval, the number of people on the right willing to do the work is tiny.
Following tangents I ended up at Jeff Duntemann’s blog and onto a post about ‘Infogalactic’, Vox Day’s vanity version of Wikipedia. Duntemann’s post was about a wider idea about interconnected mutually searchable wikis which was interesting but a side issue caught my attention:
“Infogalactic has a lot of its own articles. However, when a user searches for something that is not already in the Infogalactic database, Infogalactic passes the search along to Wikipedia, and then displays the returned results.” https://www.contrapositivediary.com/?p=4400
I thought this must be a new development, which would be interesting in itself. Either that or I had misunderstood some of the claims Day had made about Voxopedia’s software.
However, no, Voxopedia does not do this. Take this relatively new page on Wikipedia https://en.wikipedia.org/wiki/3rd_Lithuanian_National_Cavalry_Brigade. If you search for the topic on Voxopedia it currently just returns the standard page not found response.
That raises the question though about what it was Day was boasting about previously in terms of Voxopedia staying up to date with Wikipedia. Rather than Duntemann’s neat sounding idea, Day had been claiming that his encyclopedia would have “dynamic forking” via software that he had christened “fork bot”:
“Rifleman and the Techstars are very pleased to report that the much-awaited dynamic forking tool is not only complete, not only tested, but is now operational. You can see the results here. https://web.archive.org/web/20200914191921/http://voxday.blogspot.com/2017/05/infogalactic-forkbot-is-go.html May 04, 2017
What this means is that Infogalactic will always be entirely up-to-date with Wikipedia across all five million+ pages, including the newest ones, except for those where the Infogalactic editors have improved upon specific pages.”
Day’s post contained a link to the import log. I don’t know what it showed back in 2017, but what it shows now is editors importing Wikipedia pages into Voxopedia a few chunks at a time. Maybe they made that manual process easier but three years on, Voxopedia still lags behind Wikipedia in terms of articles and the number of pages they are importing is smaller than the number Wikipedia creates. It’s a forking joke.
Back in January 2017, Day had described that they had reached the stage of “manual dynamic forking” (the jokes write themselves) which I discussed back then (https://camestrosfelapton.wordpress.com/2018/03/15/revisiting-voxopedia/). As far as I can see the whole operation has made zero technical progress since then. Notably, the article I used as an example (Australian politician Barnaby Joyce) is still unchanged since 2016 and has him still serving as the Deputy Leader of the National Party and a minister in the long-gone Malcolm Turnbull government.
The other big claim made by Day regarding Voxopedia’s capabilities was that it was going to replace the underlying MediaWiki software. Day had made a big deal about how they had used MediaWiki initially when cloning Wikipedia in the first place but that the underlying software that Wikipedia uses is bad and out of date and that his guys could do better. Remember that these weren’t just boasts but claims which he was using to raise money from supporters.
This new wiki software christened the DONTPANIC engine was going to be so good that Day would be able to monetise it as a service for others:
“I should also mention that due to corporate demand, we are going to be putting together an Infogalactic Consult branch to help organizations make the change from the wikimedia engine to DONTPANIC for their internal wikis, or even just to make their existing wikis more functional and efficient. If you have a need for this, feel free to get in touch.” https://web.archive.org/web/20200914194300/http://voxday.blogspot.com/2017/05/mailvox-knowledge-core-vs-convergipedia.html May 08, 2017
A check of the source code of newly created articles at Voxopedia shows that it still uses MediaWiki 1.27.1. The current version that Wikipedia uses is 1.34.2. Not only does Voxopedia still use the software that Day was claiming was inadequate but it uses an out of date version of that software.
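For the curious, this kind of check can be automated. MediaWiki normally writes its version into a "generator" meta tag in the page source, so a sketch along these lines would do it. The sample HTML is hand-written for illustration, the function names are my own, and the version comparison only handles plain dotted versions like the two quoted above.

```python
import re

def mediawiki_version(html):
    """Extract the MediaWiki version from a page's generator meta tag, if present."""
    match = re.search(
        r'<meta\s+name="generator"\s+content="MediaWiki\s+([0-9][^"]*)"',
        html,
        flags=re.IGNORECASE,
    )
    return match.group(1) if match else None

def version_tuple(version):
    """Turn '1.27.1' into (1, 27, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

# Hypothetical page source, hand-written for illustration.
sample_html = '<head><meta name="generator" content="MediaWiki 1.27.1"/></head>'
found = mediawiki_version(sample_html)
print(found, version_tuple(found) < version_tuple("1.34.2"))
```

Comparing tuples rather than strings matters here, because as plain text "1.4" would sort after "1.34".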
The vaunted development roadmap for Voxopedia was last updated on May 6 2017 and that point in early May appears to be around the time the whole project ground to a halt in terms of its technology. https://infogalactic.com/info/Infogalactic:Roadmap At that point the progress looked like this:
- Phase Two: IN PROGRESS
  - DONTPANIC engine
  - Sub-sites wikimedia – COMPLETE
  - Sub-sites DONTPANIC
  - Ad server DONTPANIC
  - Dynamic page updates – COMPLETE
  - Improved Database categories
  - Relativity, Reliability, and Notability 1.0 algorithms
- Phase Three
  - Tri-level page content: Fact, Context, Opinion
  - Verified autobiography sub-pages
  - Preference filtering
  - Initial gamification and status bling operational
  - Safe Mode
  - Gab integration
  - User Interface 2.0 Beta
  - Gamification and status bling complete
Needless to say none of this was implemented but more relevantly a purported encyclopedia doesn’t even keep an entry about itself up to date. Again, Day raised money for this project on the basis of claims about what Voxopedia would be able to do.
Meanwhile, one of my favourite pages is still there https://infogalactic.com/info/Bibhorr_formula
Google’s info boxes do something similar e.g. if you search for an author they might give you a little bio from Wikipedia or Goodreads or Google Books.
A good point people raised about yesterday’s post on Wikipedia page view metrics is that it captures a current state but in many cases we are more interested in a historical value. This is particularly true when we are looking at the impact of awards or events.
Luckily I don’t need to advance my web scraping tools further to answer this as Wikipedia actually has a tool for looking at and graphing this kind of data. Like most people I’ve used Wikipedia for many years now but I only learned about this yesterday while looking for extra data (or maybe I learned earlier and forgot, which seems likely). The site is https://pageviews.toolforge.org and each of the page information pages has a link to it at the bottom under ‘external tools’.
It’s not really suitable for a data set of hundreds of pages but it is quite nice for comparing a small number of pages.
Just to see how it works and to play with settings until I got a visually interesting graph, I decided to see if I could see the impact of the Hugo Awards on five relevant pages. Now the data it will graph only goes back to 2015, so this takes the impact of SP3 as a starting point. I’ve chosen to look at John Scalzi, N.K. Jemisin, Chuck Tingle, Vox Day and Larry Correia.
I added a background colour and labels. The data shows monthly totals and because of the size of some spikes, it is plotted on a logarithmic scale. Be mindful that the points are vertically further apart in terms of actual magnitude than is shown visually.
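A quick way to see what the logarithmic scale does to the spacing is the small Python sketch below. The monthly totals are invented round numbers, not values read off the graph.

```python
import math

def log_gap(a, b):
    """Vertical distance between two values on a base-10 logarithmic axis."""
    return abs(math.log10(a) - math.log10(b))

# Made-up monthly totals, purely for illustration.
quiet_month, busy_month, spike_month = 1_000, 10_000, 100_000
print(log_gap(quiet_month, busy_month))    # one visual step
print(log_gap(busy_month, spike_month))    # the same visual step
print(busy_month - quiet_month, spike_month - busy_month)  # very different real gaps
```

Each step of equal height on the graph is a tenfold increase, so a spike that looks only a little taller than its neighbours can represent many times the page views.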
I think the impact of N.K. Jemisin’s second and third Best Novel wins is undeniable. There is a smaller spike for the first win but each subsequent win leads to more interest. I don’t know why Chuck Tingle had a big spike in interest in January 2017.
I’ve added a little red arrow around July 2019. That was when there was a big flurry among some Baen authors that Wikipedia was deleting their articles https://camestrosfelapton.wordpress.com/2019/07/29/just-a-tiny-bit-more-on-wikipedia/
Anyway, to answer my own question: talent beat tantrums in the battle for attention.
A perennial question around award nominees is just how significant are the authors being honoured. It’s a tricky question, particularly as there is no good data about book sales. Amazon ranks are mysterious and Goodreads data may be a reflection of a particular community.
I’m currently taking a few baby steps into web scraping data and I was playing with Wikipedia. Every Wikipedia article has a corresponding information page with some basic metadata about the article. For example here is the info page for the article on the writer Zen Cho https://en.wikipedia.org/w/index.php?title=Zen_Cho&action=info On that page is a field called “Page views in the past 30 days” that gives the figure stated. As a first attempt at automating some data collection, it’s a relatively easy piece of data to get.
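As a rough idea of what that data collection step involves, here is a minimal Python sketch that pulls the number out of an info page's HTML. It is not the script used for this post. The HTML fragment is hypothetical and simplified (the real page's markup may differ), and the function name is my own. Fetching the page is left out so the example runs offline.

```python
import re

LABEL = "Page views in the past 30 days"

def page_views_from_info_html(html):
    """Find the number following the page-views label in an info page's HTML."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, fine for a sketch
    match = re.search(re.escape(LABEL) + r"\s*([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else None

# Hypothetical fragment; the real page's markup may differ.
sample = "<tr><td>Page views in the past 30 days</td><td><a href='#'>1,234</a></td></tr>"
print(page_views_from_info_html(sample))
```

Run over a list of author page titles, a function like this would give one number per author, ready to sort into a ranking.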
So, I put together a list of authors from my Hugo Award and Dragon Award lists, going back a few years (I think to 2013). Not all of them have Wikipedia pages, partly because they are early in their careers but also because Wikipedia does a poor job of representing authors who aren’t traditionally published. Putting the ‘not Wiki notable’ authors aside, that left me with 163 names. With a flash of an algorithm I had a spreadsheet of authors ranked by the current popularity of their Wikipedia page.
Obviously this is very changeable data. A new story, a tragedy, a scandal or a recent success might change the number of page views significantly from month to month. However, I think it’s fairly useful data nonetheless.
So what does the top 10 look like?
|6||N. K. Jemisin||34,756|
|8||Sarah J. Maas||21,852|
The rest of the top 30 look like this:
|15||James S. A. Corey||15,109|
|30||Kim Stanley Robinson||8,879|
There’s a big Zipf-like distribution going on with those numbers that decline quickly by rank. John Scalzi has Chuck Tingle levels of fame on this metric.
OK, so I know people want to know where some of our favourite antagonists are, so here are some of the notable names from the Debarkle years.
|81||John C. Wright||1,251|
|111||Brad R. Torgersen||560|
|123||Sarah A. Hoyt||407|
|140||L. Jagi Lamplighter||229|
Day probably gets a lot more views due to people looking him up because of his obnoxious politics. Larry Correia is in a respectable spot in the 40s. He is just below Martha Wells who has 4,576 page views, which is essentially the same number given how these figures might change from day to day. John Ringo is just above Chuck Wendig and Rebecca Roanhorse (2,806 and 2,786). John C Wright is sandwiched between Tade Thompson and Sarah Gailey.
You can see the full list here https://docs.google.com/spreadsheets/d/14uQsQNxKyPQtxybu4OxsFrdRRl_v-tdW0fN0oblgFw4/edit?usp=sharing
Let me know if you find any errors.
Comicsgate, the culture war rebellion that was so self-defeating that it managed to turn its own harassment campaign against itself, still drifts on. The anti-free speech group that engaged in online harassment campaigns against multiple creators (e.g. Magdalene Visaggio, Kelly Sue DeConnick, Alyssa Wong, Noelle Stevenson and Ta-Nehisi Coates to name just a few) likes to style itself in the standard alt-right opposite-day rhetoric as being pro-free speech and opposed to “cancel culture” “mobs”. The movement engaged in verbal abuse, rape threats, death threats and doxxing as well as calls for boycotts and campaigns to get creators fired for their views. Connected with the harassment campaigns were various crowd-funding attempts by comicsgate creators such as Ethan Van Sciver to take advantage of the outrage marketing to help fund their own projects. [for examples see the references]
When Vox Day decided to attach himself to the movement [see my coverage in the references] the amount of abuse increased but much of the toxicity turned in on itself with pro-Day and pro-Ethan Van Sciver factions attacking each other. Caught in the crossfire (or fuelling the crossfire depending on who you ask) was our old pal Jon Del Arroz, who after spending a few years on his own harassment/culture-war grift, is currently complaining, as a consequence of his comicsgate experience, about right-wing culture war grifters [references].
Well that’s two paragraphs just to cover the background. What has all that got to do with John Ringo?
Currently there is a crowdfunding campaign on Indiegogo to turn John Ringo’s zombie apocalypse Black Tide Rising series into a series of graphic novels [references]. The creators involved are Chuck Dixon, Derlis Santacruz, Brett R Smith, and Dave Dorman.
Veteran writer Chuck Dixon became embroiled in the alt-right comics culture war after being recruited into Vox Day’s Arkhaven Comics ‘Alt Hero’ line of comics. Day, in case anybody here has forgotten, is infamous for his support of terrorist Anders Breivik and called the mass murder of over 70 people (the youngest of whom was 14) “a highly effective blow against the political machine”. Day’s randomly vandalised version of Wikipedia also spreads conspiracy theories that cast people convicted of child abuse as victims of state conspiracies [references]. I mention all that not to say that somehow Dixon is guilty by association but to point out which things bother these ‘alternative voices’ in comics and which things very notably do not seem to bother them at all.
Of the others, Brett R Smith openly aligned himself with the comicsgate campaign eg:
However, Smith’s most notable connection was with the comic Jawbreakers, which he worked on with notable comicsgate figure Richard C. Meyer. Smith also attempted to produce a comic in support of violent far-right protestor Kyle “Based Stickman” Chapman. Chapman, a man with convictions for robbery, theft, and illegal weapon sales, became something of a hero among the alt-right when he was filmed beating protestors with a stick. Chapman later attempted to set up his own quasi-Proud Boys street-fighting spin off called ‘Fraternal Order of Alt-Knights’. Smith said of Chapman: “I concur but we have an army of our own. @BasedStickMan_ @ProudBoysUSA @Oathkeepers all kept the peace. They stood firm & we won the day.” [references]
I suspect that just listing all this stuff in one place and pointing to the connections will engender counter-criticism that doing so is ‘cancel culture’ or stirring up an SJW-mob. It isn’t. If people want to buy a John Ringo story in comic book form then that’s their business but we shouldn’t be shy about discussing the overt and publicly stated views of the creators. If people state they are engaged in a culture war then it is really odd, indeed psychologically unhealthy, to pretend that they aren’t.
Meanwhile, Baen Books is promoting the crowdfunding campaign on Twitter and in their forum.
One more of these network explorations. I tried a bit of data mining on the Puppy Kerfuffle Timeline. The idea was to pick out from entries people talking about other people or being talked about together. So John Scalzi talking about Vox Day or vice versa. A few stray non-people (or groups of people) got in the mix as well. Also Santa Claus?
The graph is undirected i.e. it doesn’t distinguish between talking and being talked about. Also, this is very much NOT about allegiances or other connections — a line joining a group is more likely to be a critic than an ally.
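The general idea can be sketched in Python as follows. This is not the code behind the graph: the entries and the list of names are invented for illustration, and matching names by simple substring is an assumption that a real version would need to improve on.

```python
from collections import Counter
from itertools import combinations

def co_mention_edges(entries, names):
    """Count undirected edges between names that appear in the same entry."""
    edges = Counter()
    for text in entries:
        present = sorted(name for name in names if name in text)
        for pair in combinations(present, 2):
            edges[pair] += 1  # sorted pairs, so (A, B) and (B, A) are one edge
    return edges

# Invented timeline entries, for illustration only.
names = ["John Scalzi", "Vox Day", "Larry Correia"]
entries = [
    "John Scalzi responds to Vox Day",
    "Vox Day writes about John Scalzi",
    "Larry Correia posts about the Hugo Awards",
]
edges = co_mention_edges(entries, names)
print(edges)
```

Because each pair is sorted before counting, it makes no difference who is talking and who is being talked about, which is exactly why an edge says nothing about whether the two are allies or critics.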