Category: Statistics

Reading Peterson 11 – Notes & Facts & Hypothesis


There’s no shortage of notes in Jordan B Peterson’s book 12 Rules for Life but that doesn’t mean every assertion related to facts is referenced. Also, when references are used they aren’t always tightly associated with the argument. Take this for example from chapter 2:

“This is perhaps because the primary hierarchical structure of human society is masculine, as it is among most animals, including the chimpanzees who are our closest genetic and, arguably, behavioural match. It is because men are and throughout history have been the builders of towns and cities, the engineers, stonemasons, bricklayers, and lumberjacks, the operators of heavy machinery.” – Peterson, Jordan B.. 12 Rules for Life: An Antidote to Chaos (p. 40). Penguin Books Ltd. Kindle Edition.

Now there is a lot wrong with that statement factually, but the right reference here, if this were an academic essay, would be to a source discussing historical patterns of employment. Peterson instead links to some modern labour statistics here. The tables do use the terms ‘traditional occupations’ and ‘non-traditional’ based on the proportions of women involved, but this is ‘traditional’ in a very loose sense and includes “Meeting, convention, and event planners”. My point here isn’t that the table is wrong, or even to question gendered roles in employment – just that a lot of references are weak in this fashion. It is vaguely related but not neatly tied to Peterson’s argument.

(This is quite long – so more after the fold)



Looking at some crowdfunding data

I’m mainly just curious how such things work, but I picked data from a GoFundMe campaign that I know people might be morbidly curious about.


The site gives a list of donations made with the amount and how many days ago the donation was made. Doing some minor spreadsheet wrangling, it is fairly easy to turn this into graphable data. The only departure from literal truth is I used the order in which the donations are listed to spread out the data points more evenly across each day of the campaign – so the smooth growth within each day is just to make the graph easier on the eye (the raw data would just give a big vertical chunk of points).
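For anybody who would rather skip the spreadsheet, here is a minimal sketch of that wrangling in Python. It assumes the page lists donations newest first as (amount, days ago) pairs; the function name and the sample figures are made up for illustration and are not the campaign's real data.

```python
def spread_donations(donations):
    """Turn (amount, days_ago) pairs, listed newest first, into
    (campaign_day, running_total) points, oldest first.

    Donations made on the same day are spaced evenly across that day
    using their order in the list, which is the only departure from
    the literal data.
    """
    oldest_first = list(reversed(donations))
    start = max(days_ago for _, days_ago in oldest_first)

    # How many donations fall on each day.
    per_day = {}
    for _, days_ago in oldest_first:
        per_day[days_ago] = per_day.get(days_ago, 0) + 1

    seen = {}
    total = 0
    points = []
    for amount, days_ago in oldest_first:
        index = seen.get(days_ago, 0)
        seen[days_ago] = index + 1
        # Whole campaign day plus an even fractional offset within it.
        x = (start - days_ago) + index / per_day[days_ago]
        total += amount
        points.append((x, total))
    return points


# Made-up sample: newest donation first.
sample = [(10, 0), (5, 1), (20, 1), (50, 2)]
points = spread_donations(sample)
```

Plotting the resulting points gives the smooth-looking climb within each day rather than a vertical stack.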

Compared with the fundraising goal the graph looks like this:


If we assume a growth rate of $20 every three days then this campaign should reach its target in about 1317 days or about three and a half years. Of course, events may change that.
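The projection is just a linear extrapolation. As a rough sketch: the amount still to be raised isn't stated in the post, so the $8,780 below is a back-calculated, illustrative figure (1317 days at $20 per three days), and the function name is my own.

```python
def days_to_target(remaining, dollars_per_period=20.0, period_days=3.0):
    """Days needed to raise `remaining` dollars at a constant rate."""
    return remaining * period_days / dollars_per_period


remaining = 8780.0  # hypothetical shortfall, inferred from the 1317-day figure
days = days_to_target(remaining)
years = days / 365.25
```

With those assumed numbers the calculation gives 1317 days, a little over three and a half years.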

Felapton Towers Scoop – How Numbers are Disappearing

Look, there are just some breaking stories that you can only read here, thanks to the deep investigative journalism that my crack team of journalists does. In this case – plugging two-digit numbers into Google n-gram.


I don’t know what I expected the graph to look like but apparently peak numbers-in-books was sometime in the late 1980s after which the bubble burst plunging books into a deepening two-digit-numbers-as-words recession.

The decline is present in both the US and the UK:


And is present for three digit numbers also:


OK, but what about single digits? I hear you ask. Surely the blue chips of the numerical world are still going strong? Nope. Number 1 obviously has been number 1 for some time but at the turn of the millennium, even it felt the decline.

Four-digit numbers? That’s a whole other ball game. The four-digit market is dominated by YEARS. So peak 2000 was shortly after the year 2000 (this impacts two-digit numbers a bit as well – 90 gets a bit of a boost in the 1990s).

Here is what I think is going on. The internet and the proliferation of software for sharing numerical data have created other avenues for publishing numerical data. Consequently, printed documents with really large amounts of numerical data have become a smaller proportion of books published. Data sets are more likely to be made available as downloadable files (text, CSV etc.) rather than as printed volumes.

A way to test that hypothesis would be to look at a corpus that was only FICTION. Changes in how data is published shouldn’t impact fiction! However, style habits may impact numbers written as digits – the normal prescription is that smaller numbers should be written in words. So, I’ll look at the number one hundred and twenty three as a test case – it should be written as digits normally.

Unfortunately… the results were inconclusive. When I clicked on the examples the “English Fiction” corpus was drawing from, they were all NON-FICTION. Grrrrr, Google: giving me free tools to explore data to my heart’s content and you make them not entirely perfect!

So, I can’t definitively tell you where the numbers have gone. Sorry.


You say ‘a-loomin-um’, I say ‘al-you-min-ee-um’, we both say ‘bunkum’

I resolved to not bother talking about Vox Day for a while but circumstances compel me. The synergies of nonsense bind extreme nationalism, Trumpism, misogyny, creationism and antivaxxerism. It is always remarkable to see what apparently scientific studies the Alt-Right will quote as if gospel and which they will turn their selective scepticism to.

To wit:

What is all this about? It is the old and thoroughly debunked canard that vaccines cause autism. The idea is rooted in two coincidences: an increase in the numbers of people diagnosed with autism (primarily due to better clinical descriptions of the autism spectrum and increased awareness among doctors and the public) and the fact that autism symptoms are often identified at an age close to when early childhood vaccinations occur. Campaigners against vaccinations have been looking for a more substantial way of linking the two and one generic culprit has been ‘toxins’ in vaccines – i.e. various additives used in the manufacture of vaccines. For a long time the supposed guilty party was mercury, particularly in the form of thiomersal – a preservative used in some vaccines. However, studies linking the two were famously debunked and many vaccines didn’t use thiomersal or other mercury compounds anyway.

Of late the antivaxxers have been pointing their fingers at a different metal: aluminium – which is just like the metal aluminum but more British. ‘Aluminium adjuvants’ are additives to vaccines that use aluminium. Adjuvants are any substances added to vaccines whose role is to provoke an immune response (see here for a better explanation). Tiny amounts of aluminium are added intentionally because the body’s immune system will react to the aluminium and it is that principle (which is central to the whole idea of vaccines) that has vaccination critics concerned.

Back to the study quoted. Vox Day is quoting from The Daily Mail:

BUT….the Mail article is little more than a cut and paste from here:

Which is an article by a “Chris Exley” who mainly writes alarming articles about the terrible things aluminium might do to you. Exley is quoting a study from Keele University which is available here:

And that study was conducted by three people including… Professor Chris Exley. Who, coincidentally enough, is on the editorial board of the journal the study is published in:

It is a long chain and yet oddly this is a rare case where the populist half-baked version of the study is almost directly from the scientist involved.

Now I don’t know much about Professor Exley’s field, so I can’t really comment on the validity of the methods used. The study involved detecting aluminium in a very small number of samples of brain tissue from dead people who at some point in their lives had been diagnosed with an Autism Spectrum Disorder. There’s not much in the way of comparisons in the paper and I get the (perhaps mistaken) impression that the method is relatively new. The paper correctly concedes that “A limitation of our study is the small number of cases that were available to study and the limited availability of tissue.”

But take a critical look at the next step in the reasoning. Exley hedges what he says but Vox follows the dog whistle:

“So, the obvious question this raises is: how did so much aluminum get into the brain tissue in the first place? And the obvious answer is: from being injected with vaccines containing aluminum.” (Vox Day)

Of course a moment’s thought reveals that cannot be the answer. Most people do not have a diagnosed Autism Spectrum Disorder but most people are vaccinated. For Exley’s hypothesis to be correct there would need to be some additional factor, which Exley does describe in his media article:

“Perhaps there is something within the genetic make-up of specific individuals which predisposes them to accumulate and retain aluminium in their brain, as is similarly suggested for individuals with genetically passed-on Alzheimer’s disease.”

Well perhaps there is but Exley’s study doesn’t show that. More to the point, if this IS true then vaccines and aluminium adjuvants are irrelevant – we encounter far more aluminium in our diets than we do from the tiny amounts we might get from vaccinations. Exley has zero reason to point at vaccines; indeed his speculation would imply that vaccines CANNOT be the main reason for larger amounts of aluminium in his samples because necessarily bigger sources are more likely.

Exley appears to be trying to join two different health-scare bandwagons together: general concerns about aluminium in stuff (see his other posts) and antivaxxerism.

Is the study itself flawed? As I said, I don’t know but the connection the paper makes to vaccines has zero substance and no evidence from the study itself. That in itself should have raised red flags with reviewers.

In the past, I’d have gone to Science Blogs for some extra background on something like this but that venerable home of blogs has been wound down.

Luckily ‘Orac’ of Respectful Insolence has set up their own blog here and has a deep dive into Exley’s paper here:

Yup, it is as dodgy as somebody dodging things in a dodgy dodge. Orac points out the dubious funding source:

“The second time, I noted that he’s one of a group of scientists funded by the Child Medical Safety Research Institute (CMSRI), which is a group funded by Claire and Al Dwoskin, who are as rabidly antivaccine as anyone I’ve seen, including even Mike Adams. Among that group of antivaccine “scientists” funded by CMSRI? Anthony Mawson, Christopher Shaw, Lucija Tomljenovic, and Yehuda Shoenfeld, antivaccine crank “scientists” all. And guess what? This study was funded by CMSRI, too. Fair’s fair. If antivaxers can go wild when a study is funded by a pharmaceutical company and reject it out of hand, I can point out that a study funded by an antivaccine “foundation” is deserving of more scrutiny and skepticism.”

And it just gets worse from there. No controls, some tiny sample jiggery-pokery with the numbers and so on. Best read directly.



It is only a tiny step from pointless science to pseudoscience and I’m thinking…it’s a rainy Sunday and my head hurts…

After my previous post on this topic, it occurred to me that I should check the profile of some other websites. I’d already identified that Vox Day’s blog was disproportionately Goat-Wolf-Rabbit. What about Monster Hunter Nation?


A clear Tiger-Goat-Cow blog. Cats do quite well at MHI in terms of raw numbers but not when compared against their general frequency.

Moving away from the right, how about File770?


Mike is running a Cat-Tiger-Goat blog it seems. Now note that the search method includes comments, so it may be the readers that have a thing about cats (this has been independently confirmed).

What do all three blogs have in common? GOATS.

[ETA – Rocket Stack Rank is interesting because the animals mentioned would be more determined by their incidence in short fiction. Overall low frequencies and RSR has no presence on the otter or goose dimensions. Wolf-Rabbit-Cat blog – “Cat” strongly assisted by reviews of the works of Cat Rambo 🙂

Goat has a presence but is just shy of the top 3.]


Today in Pointless Statistics

Yesterday, I was speculating about how the far-right may have a fear of rabbits. I’ve no means of ascertaining that but I did wonder if rabbits got mentioned more than you would expect.

Disproportionate Lagomorphic Referencing in Ideologically Extreme Propaganda

By C.Felapton, M.Robot 2017


It has been postulated that the alt-right talks about rabbits a lot. Our research unit examined this hypothesis empirically using highly advanced data-mining techniques.

Using a sample of common animal words, the frequency of use of those words was established and then compared with word frequency in an established corpus of English words. It was established that at least one member of the alt-right talks about rabbits disproportionately.


A weblog site produced by a notable “alt-right” writer was identified by a process of his being the obvious one to have a look at (a blogger who uses the pseudonym of “Vox Day”). A set of 14 common animal nouns was identified: cat, chicken, cow, dog, elephant, goat, goose, mouse, otter, rabbit, sheep, tiger, wolf.

For comparison purposes, a corpus of English words was identified to establish standard frequencies for each word. The selected corpus was the BYU-BNC.

The British National Corpus (BNC) was originally created by Oxford University Press in the 1980s – early 1990s, and it contains 100 million words of text from a wide range of genres (e.g. spoken, fiction, magazines, newspapers, and academic).

Using Google’s site specific search function, the target website was searched using each animal word in turn as the search term. An example search query being “mouse site::”

The number of “hits” per search term was recorded.


The most common animal name used from the sample was “dog”. However, given the very high frequency of “dog” in English, this result is unremarkable. The ratio of the blog frequency versus the corpus frequency was calculated. The mean ratio for the sample was 0.728 (to 3 s.f.) [blog freq/BNC freq].

The most disproportionately under-mentioned animal was “mouse”. The most disproportionately over-mentioned animal was “goat”. While the frequencies of “rabbit” and “wolf” differed from each other in both the blog and the corpus, both words were over-mentioned in a similar ratio (1.20 for rabbit and 1.21 for wolf).

Full results are shown in Appendix A.
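For the methodologically curious, the ratio step can be sketched as below. The paper only says the blog frequency was compared with the corpus frequency, so normalising each count to its share of the sampled words is an assumption on our part, and the counts in the example are invented for illustration, not the figures from Appendix A.

```python
def frequency_ratios(blog_hits, corpus_freq):
    """Ratio of each word's share of blog hits to its share of corpus counts.

    A ratio above 1 means the word is over-mentioned on the blog relative
    to the corpus; below 1 means under-mentioned.
    """
    blog_total = sum(blog_hits.values())
    corpus_total = sum(corpus_freq.values())
    ratios = {}
    for word, hits in blog_hits.items():
        blog_share = hits / blog_total
        corpus_share = corpus_freq[word] / corpus_total
        ratios[word] = blog_share / corpus_share
    return ratios


# Invented counts, purely to exercise the function.
blog_hits = {"dog": 400, "rabbit": 150, "mouse": 50}
corpus_freq = {"dog": 500, "rabbit": 100, "mouse": 400}

ratios = frequency_ratios(blog_hits, corpus_freq)
most_over = max(ratios, key=ratios.get)
most_under = min(ratios, key=ratios.get)
```

Note that "dog" can have the most raw hits and still not be the most over-mentioned word, which is the point made in the results.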


It was agreed by the research team that this had been a pointless exercise that provided no valuable insights and which was methodologically flawed due to its arbitrary choice of words, blog and corpus. Meat Robot complained about having a cold a lot and suggested that a day spent re-watching Rogue One: A Star Wars Story would be a better plan. “You’re not the boss of me,” said Camestros but had to concede that it was impossible to exist as an incorporeal being.

A cat refused to comment on the result and no other animals were consulted.

Appendix A: Full results

The table shows the full results in ascending order of ratio.

Animal Blog Freq BNC Freq Ratio

Spotting Fakery?

I previously pointed to an article on people manipulating Amazon rankings for their books; today there is a bigger brouhaha over whether somebody has manipulated the New York Times bestseller list. The method used (if true) isn’t new, and political books have been prone to this approach before, i.e. buy lots of the book from the right bookshops and head up the rankings.

One thing new to me from those articles was Fakespot. It claims to be a site that will analyse reviews on sites like Amazon and Yelp and then rate the reviews in terms of how “fake” they seem to be. The mechanism looks at reviewers and review content, looks for relations with other reviews, and also rates reviewers who only ever give positive reviews lower. Now, I don’t know if their methods are sound or reliable, so take the rest of this with a pinch of salt for the time being.
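To make one of those signals concrete, here is a toy sketch of the "only ever gives positive reviews" check. This is emphatically not the site's actual algorithm, which I haven't seen; the function names, the 4-star threshold and the sample histories are all my own assumptions.

```python
def always_positive(history, threshold=4):
    """True if a reviewer has a rating history and every rating is positive."""
    return bool(history) and all(stars >= threshold for stars in history)


def suspicious_share(reviewer_histories, threshold=4):
    """Fraction of a book's reviewers who have only ever given positive ratings."""
    flagged = sum(1 for h in reviewer_histories if always_positive(h, threshold))
    return flagged / len(reviewer_histories)


# Invented star-rating histories, one list per reviewer of a single book.
histories = [[5, 5, 5], [5, 2, 4], [4, 5], [1, 3]]
share = suspicious_share(histories)
```

A high share would only be a hint that something is off; enthusiastic but genuine fans would trip the same check.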

Time to plug some things into their machine, but what? Steve J No-Relation Wright has very bravely volunteered to start reading Vox Day’s epic fantasy book because it was available for $0, so why not see what Fakespot has to say about “Throne of Bones”?


Ouch… but to some extent, we already know that the comment section of Vox’s blog is full of willing volunteers ready to do sycophantic stuff and/or trolling/griefing at Vox’s request. Arguably those are genuine reviews, just that they are hard to distinguish from click-farm fakery. Think of it as a kind of Turing Test, which his right-wing minions repeatedly fail by acting like… well, minions.

How reliable is this? There’s no easy way to tell. As a side-by-side experiment I put in Castalia’s attempt at a spoiler campaign versus the mainstream SF book they were trying to spoil:

Ironically, the reviews that Vox complains about probably improve the Fakespot rating of the reviews – i.e. many negative reviews from people will make the rating of the quality of the reviews better.

[A note of caution: the site doesn’t re-analyse automatically so the analysis you get may be out of date. The initial ratings for those two books were different but changed when I clicked the option to re-analyse]

I also don’t see a way in general for Fakespot to identify fake NEGATIVE reviews – i.e. to show that the poor ratings of a book aren’t genuine. The basic report seems to assume that fake reviews are for the purpose of the seller artificially boosting a book rather than somebody maliciously trying to make a book look bad.