A bad survey about the ‘Intellectual Dark Web’

This is an edited version of three Twitter rants from yesterday. It started as an off-the-cuff reaction and I was too far into it before realising it should have been a blog post rather than tweets.

Steven Pinker tweeted out a very weird bit of science theatre created by Michael Shermer.

Pinker has enough critical thinking skills that he should be looking at it with hefty scepticism…but obviously isn’t. It’s pretend science: play-acting at science to refute what is obvious while ignoring the core issues.

The “survey” by Michael Shermer (which should be a red flag in itself) was sent to 34 notable people associated with the label “Intellectual Dark Web” and asked where they stand on a number of issues. The survey was anonymous, so the views identified in the survey can’t be matched to the individuals asked. https://www.skeptic.com/reading_room/preliminary-empirical-study-shedding-light-on-intellectual-dark-web/

Each and every one of the people surveyed is a public figure who has made multiple public statements about politics and social issues. I don’t need an anonymous survey to find out what Andy Ngo or Sam Harris thinks, I can go and read what they say. And it is what they SAY that matters and what defines the IDW term, not what they might privately think. If Sam Harris thinks he has warm & fuzzy liberal beliefs that’s nice, but the whole point of the “dark web” label was the contrarian issues he promotes. Maybe Ben Shapiro secretly believes Global Warming is real and climate change is caused by humans. I don’t know, but what matters is that he propagandises the opposite. If an anonymous survey of the 34 “Intellectual Dark” Webbers reveals that their underlying views are more centrist and mainstream, then that is not evidence that the public perception of their public positions is wrong. Rather it confirms a key point about the IDW.

The fundamental issue with the disparate group lumped together as the Intellectual Dark Web is that they are DISINGENUOUS about their politics. It’s not news that Jordan Peterson thinks of himself as moderate and reasonable. We knew that already. It doesn’t change that he (and Harris & Shapiro & Ngo & Quillette) frame and enable a perspective that bolsters the far right. The whole “we are the reasonable ones” is part of the schtick of the IDW. That they’ll boost that in an anonymous survey is, frankly, wank.

Let’s be sceptical, as I’m sure Dr Pinker and Shermer would want us to be. Let’s take one conclusion Pinker raises from the survey: the members of the IDW are “concerned w climate”. Let’s look at the survey. The survey agrees: “67% strongly agreed that global warming is caused by human actions (no one strongly disagreed)”. So there you go! Hoorah! No, no, let us be sceptical first. If this was GENUINELY true, would it not be easily observed?

To the empiricism-mobile! Here’s the output of the Quillette Climate tag https://quillette.com/tag/climate/ zoiks! A hefty TWO articles, one concern trolling Greta Thunberg and the other saying people shouldn’t be mean to capitalism. Yes, Quillette is just one source but it is one that connects Steven Pinker on the one hand (who we can observe genuinely does advocate for action on Global Warming) with Andy Ngo on the other hand (who genuinely does have connections with the alt-right and violent far right groups) via Claire Lehmann (Quillette’s founder, fan of Pinker and one time boss of Ngo).

Yes, Steven Pinker himself has a better record on the issue of global warming, but the question he raised was to look collectively at the IDW and their media organs. Broadly this is not a group trying to do very much about helping with the issue. And wow, think of the actual good the IDW could achieve given their actual audience. Whatever they may think of themselves, collectively they do have the ear of many on the right – exactly where climate change denial and bad science on the topic is endemic. You’d think these outspoken people might be busy being outspoken on a potential planet-wide disaster.

It gets worse. The actual sample was only 18, not 34 people. Nearly half of the 34 didn’t answer. So when the survey says “67%” (the percentage favouring gun control, and also the percentage who believe global warming is real) it actually means “12 people”. That’s actually both more plausible and more wretched. Even if we accept that 12 of those IDWs think climate change is real, it says almost nothing about the group. Any one member of the original 34 people is a hefty 3% of the population being sampled and hence missing any one of them can have a large impact on the results. This is particularly true given that we already know that the label of “Intellectual Dark Web” is being attached to a group with a very broad range of views on many topics.

Shermer is assuming non-response to the survey is random across the traits being surveyed (i.e. that the 18 is a random sample of the 34). There is no reason to believe that, and really anybody who wants to seriously call themselves a sceptic should dismiss any general conclusion from the survey without substantial additional supporting evidence.

Indeed there’s good reason to assume that the 18 who responded are not a good random sample of the 34, just on the nature of the numbers. It is very hard with small numbers in a survey for the sample to be representative because one person makes a big difference. Shermer hides that by quoting percentages rather than raw totals, but with small numbers percentages hide how few people he’s talking about. It’s not invalid to look at proportions with small sample sizes, sometimes that is all you have, but there’s a point where “12 out of 18” is more informative than “67%”.

We can illustrate the issue with the women who were surveyed. Of the 34 named people in the survey associated with the “Intellectual Dark Web”, 8 (24%) are women. In the survey sample of 18, 3 (17%) are women. So are the IDW 17% women (generalising from the survey) or 24%? Obviously 24% is the correct figure, but 17% is the equivalent of the kind of survey conclusions Shermer presents. In fact any one woman listed is 13% of the IDW women, so one more woman answering makes a huge difference to the sub-sample of women. Any one person is 6% of the whole sample of 18 people!
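To make the arithmetic concrete, here is a minimal Python sketch (the counts are the ones quoted above) showing how much one extra respondent moves these small-sample percentages:

```python
# Small-sample percentages swing a lot when one person is added or removed.
# Counts as quoted above: 34 named people, 8 of them women; 18 respondents, 3 of them women.
def pct(part, whole):
    return 100 * part / whole

print(f"Women among the 34 named people:      {pct(8, 34):.0f}%")   # ~24%
print(f"Women among the 18 respondents:       {pct(3, 18):.0f}%")   # ~17%
print(f"...if one more woman had responded:   {pct(4, 19):.0f}%")   # jumps to ~21%
print(f"One respondent's share of the sample: {pct(1, 18):.0f}%")   # ~6%
```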

Circling back to the 67% claim. Again, assuming everybody who responded is being honest (which I doubt), the survey actually found that 12 of the 34 people who were asked believed in gun control and the same number believed that global warming was real (which I’ll add isn’t saying much, some prominent sceptics will say global warming is real, just as many anti-vaccination campaigners will say they support vaccinations – it is the ‘but’ that follows where the issues lie). That might mean 67% or thereabouts of the 34 believe in gun control, but a safer conclusion is that no less than 35% do (12/34) and no more than 82% (28/34). Given how granular this data is, hoping the estimate is in the middle isn’t supported.
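And a quick sketch of that bounding argument, assuming the only hard facts are that 12 of the 18 respondents strongly agreed and that 16 people did not respond at all:

```python
# Bounds on the "67%" figure once non-response is taken into account.
asked, responded, strongly_agreed = 34, 18, 12
non_respondents = asked - responded  # 16 people we know nothing about

lower = strongly_agreed / asked                      # assume every non-respondent disagrees
upper = (strongly_agreed + non_respondents) / asked  # assume every non-respondent agrees

print(f"Reported: {strongly_agreed / responded:.0%} of respondents")           # 67%
print(f"Of the 34 asked: somewhere between {lower:.0%} and {upper:.0%} agree")  # 35% and 82%
```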

This is why I call it theatre. It is the wrong methodology applied badly and it illustrates methodological snobbery. Synthesising the complex views of a small group of people is exactly where qualitative methods work better. It is a domain where you need to put on your humanities hat and apply those humanities skills. Shermer is using sciencey flim-flam by presenting a pointlessly anonymous survey and presenting the results as percentages as if they were proportions of the whole group.

Don’t get me wrong, I absolutely LOVE applying basic quantitative methods to things and places where they don’t always make sense. It’s very much my hobby, but even on this less than 100% serious blog I’d throw more caveats at better numbers than Shermer is using.

Loved Books: The Mismeasure of Man by Stephen Jay Gould

Stephen Jay Gould is a voice that is missed in today’s world. Smart, compassionate and analytical but also with a deft capacity to write about complex ideas in an engaging way. In The Mismeasure of Man Gould stepped out of his main field of paleontology and looked at the history of attempts to measure intelligence and the racist assumptions that have run through those attempts. This is the 1981 edition, which doesn’t have the chapters on The Bell Curve, but it is still a worthy read.

Is it perfect? No, but then a popular account of a broad area of research necessarily simplifies and skips over some details. As a gateway into understanding the issues there is no better book that I’m aware of.

Ersatz Culture’s Gender Graphs

Ersatz Culture has been graphing all the awards (well, lots of them, maybe not all of them) in terms of gender, and doing so very systematically.

https://sf.ersatzculture.com/gender-award-charts/

There are a host of different patterns in those graphs – note these are my observations not those of Ersatz Culture. Some awards are more volatile than others and, of course, some awards are very recent. Overall, there has been the shift already noted, from:

  1. Mainly men
  2. More men than women but many women
  3. Mainly women

The nearest graph to one that splits neatly into these phases is the Nebula Award for Short Story https://sf.ersatzculture.com/gender-award-charts/index-nebula-sfwa.html#nebula-award-short-story but as with any narrative overlaid on data, take it as the speculation it is.

There are few examples of an award bouncing around a 50/50 split. The Arthur C Clarke award though seems to have less of a trend and more of a noisy wobble around a 70/30ish split. https://sf.ersatzculture.com/gender-award-charts/index-british.html#arthur-c-clarke-award-best-science-fiction-novel

Young Adult awards have been more favourable to women. Fantasy awards have tended to be more favourable to women also. Any shift in a generic award towards YA or fantasy therefore might also lead to a shift towards women.

New writer awards (the former Campbell Award, Locus Best First Novel) have often had a better split (not always a good split) than other awards in the same year. That is interesting, as they might be a leading indicator of future award demographics.

Some comparison data on gender: Amazon

More as a data grabbing exercise than anything, I tabulated the Amazon Best Seller list for Science Fiction and Fantasy: https://www.amazon.com/Best-Sellers-Books-Science-Fiction-Fantasy/zgbs/books/25/ref=zg_bs_pg_1?_encoding=UTF8&pg=1

This data is a snapshot and right now the list is naturally dominated by Margaret Atwood’s sequel to The Handmaid’s Tale, so the list contains a lot of different versions of both books (print version, audio version, Kindle version etc). It’s also very Amazon, with some popular-in-Kindle-Unlimited works further down the ranks.

I took the top 100 listed and then did a few things to the data. Firstly, I deleted multiple versions of a work, which will add a bit of bias to the data by understating the impact of the biggest sellers. I then classified authors based on name, pronouns, and bios as male, female, non-binary or both (in the case of dual authors). I didn’t identify any authors for the non-binary category. One author name was a joint authorship of a man & woman and was counted as “both”. That took the initial 100 rows down to 84 rows.

I then duplicated that data set and in the second version I deleted multiple works by an author, leaving only the highest ranked work from the Amazon list. This was done so a single author wasn’t double counted (or n-tuple counted in the case of J.K. Rowling), but the process reduces the apparent success of authors like Rowling or Stephen King. That took the number of rows down to 55.
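For anyone wanting to replicate the tidy-up, here is a rough sketch of the two de-duplication passes. The rows are made-up stand-ins for the Amazon list, and the edition-matching rule is a crude substitute for the manual tidy-up actually done:

```python
from collections import Counter

# Made-up stand-ins for the Amazon rows: (rank, title, author, author_gender).
rows = [
    (1, "The Testaments", "Margaret Atwood", "female"),
    (2, "The Testaments (Kindle)", "Margaret Atwood", "female"),
    (3, "The Handmaid's Tale", "Margaret Atwood", "female"),
    (4, "Harry Potter 1", "J.K. Rowling", "female"),
    (5, "Harry Potter 2", "J.K. Rowling", "female"),
    (6, "The Institute", "Stephen King", "male"),
]

# Pass 1 ("All Works"): keep only the highest-ranked edition of each distinct work.
seen_works, all_works = set(), []
for rank, title, author, gender in sorted(rows):
    work = title.split(" (")[0]  # crude edition-collapsing rule, for the sketch only
    if work not in seen_works:
        seen_works.add(work)
        all_works.append((rank, work, author, gender))

# Pass 2 ("Top Work"): keep only the highest-ranked remaining work per author.
seen_authors, top_work = set(), []
for rank, work, author, gender in all_works:
    if author not in seen_authors:
        seen_authors.add(author)
        top_work.append((rank, work, author, gender))

def gender_split(data):
    counts = Counter(gender for *_, gender in data)
    total = sum(counts.values())
    return {g: f"{n / total:.0%}" for g, n in counts.items()}

print(gender_split(all_works))  # proportions by work
print(gender_split(top_work))   # proportions by unique author
```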

The results are delightfully ambiguous with enough contrary results to please multiple readings.

Gender | All Works | Top Work | All Works Top 50 | Top Work Top 50
Female | 56%       | 49%      | 61%              | 54%
Male   | 43%       | 49%      | 39%              | 46%
Both   | 1%        | 2%       | 0%               | 0%
  • All Works: counts by author gender of the 85 books in the SFF Amazon bestsellers.
  • Top Work: counts by author gender of the 55 books by unique authors in the SFF Amazon bestsellers.
  • All Works Top 50: counts by author gender of books ranked 50 or better out of the 85 (36 books).
  • Top Work Top 50: counts by author gender of books ranked 50 or better out of the 55 (24 books).

Looking at just works ranked 25 or better results in a figure more consistent between the two sets of data.

Gender | All Works Top 25 | Top Work Top 25
Female | 59%              | 56%
Male   | 41%              | 44%

Make of this what you will 🙂

A bit more on Dragons and probabilities etc

I had some weird conversations yesterday about Dragon Award stats. One was a brilliant take-down of my figure that 10 men out of 10 had won Dragon Awards from 2016 in the two headline categories. Aha! Four years and two categories is only EIGHT! Yeah, but it really is ten men. James S A Corey is actually two people and, even harder to believe, apparently John Ringo and Larry Correia are different people. Mind you…if I only count Larry Correia once (because he is the same person whichever year he’s in) then it is back to 8 again…You’ll note that however we count it the answer comes out the same: 100% have gone to men in the two headline categories.

The discussion does raise a relevant point about why statistics is hard. Even a basic stat like a count of how many out of how many requires engaging your brain and thinking carefully about what you are counting. It was suggested that I should have said 10 men out of 8 awards…which I guess makes it clearer what was being counted but is horrible arithmetically. It looks like “10 out of 8” i.e. 125%, which is nonsense because we are dividing two different things and creating a derived unit of men per award.
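To show where the two counts come from, here is a toy sketch separating “award slots” from “people credited”. The winner list is illustrative only, not the actual Dragon Award data:

```python
# Illustrative only: one entry per award slot, with joint authorships and repeat winners.
slots = [
    ["Author A"],              # year 1, category 1
    ["Author B", "Author C"],  # year 1, category 2 (joint authorship = two people)
    ["Author A"],              # year 2, category 1 (repeat winner)
    ["Author D"],              # year 2, category 2
]

award_slots = len(slots)                                                # 4
people_credited = sum(len(winners) for winners in slots)                # 5 (Author A counted twice)
distinct_people = len({name for winners in slots for name in winners})  # 4

print(f"{award_slots} award slots, {people_credited} people credited, "
      f"{distinct_people} distinct people")
```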

I’ll point people back to this post https://camestrosfelapton.wordpress.com/2019/08/10/dragon-award-by-gender/ and this post https://camestrosfelapton.wordpress.com/2019/08/11/more-dragon-stats/ where I talked in more detail about what I counted and how.

To round off that previous gender post here is an equivalent graph of winners by gender in the book category:

Like the graph of finalists in the previous post, I’m using counts by gender, which reduces the gender disparity by only counting two joint authors of the same gender as 1 but two joint authors of different genders as 1 each per gender. The same caveats about gender as a binary classification apply as with the earlier post.
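A small sketch of that counting rule, with made-up works, in case the description reads ambiguously:

```python
from collections import Counter

# The rule described above: for each winning work, each gender present among its
# authors is counted once, however many authors share that gender.
works = [
    ["male", "male"],    # two male co-authors -> counts as 1 male
    ["male", "female"],  # mixed co-authors    -> counts as 1 male and 1 female
    ["female"],          # single author       -> counts as 1 female
]

counts = Counter()
for author_genders in works:
    counts.update(set(author_genders))  # de-duplicate genders within a single work

print(counts)  # Counter({'male': 2, 'female': 2})
```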

Worst year was 2017 which was also peak Rabid Puppy influence.

A couple of related conceptual questions have come up. I was asked elsewhere what the chance was of so many authors on Brad’s list winning. A different question with the same kind of issue was asked by James Pyles – basically, what was the chance of N.K. Jemisin winning a Hugo three times in a row?

Neither question is something that can easily be answered and they sort of miss the point of the kind of comparisons against chance you might do with gender. With Brad’s list these were people who were plausible winners; the outcome wasn’t surprising. There’s no expectation that the result of an award is a random event when looking at individuals – the same is true with Jemisin. We could say, well, there’s 7 billion people on earth and one winner so the chance is 1/7 billion, and the chance of winning three times is (1/7 billion)^3, and then conclude that everything is impossible, but the comparison is silly.

Comparing with chance is there to test a kind of hypothesis: specifically whether the result is plausibly the result of chance. If the probability is tiny then we can reject that it happened by chance. We already know that somebody winning a Dragon or a Hugo isn’t by chance because names aren’t picked out of a hat.

So why compare gender of winners to chance events if we know winning isn’t a chance event? Good question. Because we are testing another level of hypothesis. With gender, the hypothesis could be stated as ‘gender is an irrelevant variable with regard to winning award X’.

Consider this. Imagine if all Dragon (or Hugo) winners were born on a Tuesday. That would be remarkable. Day of the week surely isn’t connected to whether you win an award or not! We might reasonably expect only one-seventh of winners to be born on a Tuesday. We might do extra research to see if, across all people, day-of-the-week of birth is evenly distributed. We might fine-tune that further and consider only English speakers or only Americans etc. The point being that if day-of-the-week departed from chance then we would reject the idea that day-of-the-week is irrelevant.
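As a rough sketch of what “comparing with chance” means here, assuming (and it is only an assumption) that birth days of the week are uniform and that winners are independent on that variable:

```python
# Probability that, by chance alone, all n winners were born on a Tuesday,
# assuming one-in-seven birth days and independence between winners.
def prob_all_tuesday(n, p_tuesday=1 / 7):
    return p_tuesday ** n

for n in (5, 10, 20):
    print(f"{n} winners all born on a Tuesday: {prob_all_tuesday(n):.2e}")
# Even at 10 winners the probability is under 1 in 280 million, so we would
# reject 'day of the week is irrelevant' rather than put it down to luck.
```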

If we did find that, it wouldn’t tell us why or how day-of-the-week was relevant. One response I’ve seen to producing gender stats is people saying that they don’t pay attention to an author’s gender when voting. Even if we ignore subconscious influences and take that at face value, all that does is remove one possible cause of a gender disparity; it doesn’t make the gender disparity go away.

Another response is that looking at gender stats is ‘politics’. Well, yes, it is but it is relevant even if we otherwise lived in a gender neutral utopia. Again, imagine if Tuesday-born people won far more sci-fi awards than other people — that would be fascinating even though we don’t live in a world of Tuesday-privilege.

More Hugo Graphs, Fanzine & Ramblings

Nicholas Whyte has an insightful look at the 2019 Hugo stats here: https://nwhyte.livejournal.com/3244665.html

The biggest issue raised is that final votes for Best Fanzine came perilously close to less than 25% of the total votes. [stats are now on the Hugo history pages here http://www.thehugoawards.org/wp-content/uploads/2019/08/2019-Hugo-Statistics.pdf ] Whyte says:

“We were surprisingly close to not giving a Best Fanzine award in both 2019 Hugos and 1944 Retro Hugos this year. The total first preference votes for Best Fanzine finalists other than No Award in both cases was 26.9% of the total number of votes cast overall (833/3097 and 224/834).”

Eeek! Consider that this year we’ve had worries about the nature of Best Fan Writer, eligibility issues with Best Fan Artist, and now Best Fanzine looks a bit endangered. Fan categories are part of the soul of the Hugo Awards!

There are two different kinds of response to Hugo issues. One is to respond structurally: change, add or remove categories; play with eligibility rules; change voting methods etc. The other is to respond behaviourally: change how we make decisions as voters. In the second case, a good example is the range of sites that came into being to help people find things to nominate in the Hugo Awards.

The Hugo voting community is big enough that a structural response makes sense but it is also small enough that change can be effected by persuading people to think differently about how they vote. One of the most positive examples of the latter is the Lady Business Hugo Spreadsheet of Doom. http://bit.ly/hugoaward2019 <-2019 version.

I decided to have a bit of a look at figures I could derive from that sheet and compare them with the Hugo stats. To do that I just counted up the numbers of nominations in each category and then added nomination & final vote stats for those categories from this year’s Hugo stats. I will confess to a bit of sloppy counting: sometimes there is one header row in a category and sometimes there are two or three, so sometimes my counts are out by 1 or 2.

What did I find? Well, on average the number of works listed per category on the Hugo Spreadsheet of Doom (HSD from now on) was about 20% of the total number of works nominated. I haven’t done a side-by-side comparison with the long list but I think the HSD is a good early indicator of the level of interest in a category. I’ll come back to this.

Firstly some general correlations. Nomination votes correlate with final vote totals.

Whether that works causally I don’t know, i.e. if we all encouraged each other to nominate things in Fanzine (anything – not a campaign for a particular fanzine) would that lead to an increase in final votes for Fanzine? Maybe.

Now let’s look at nomination counts. The more things listed in the HSD the more nominees there are. That’s probably not causal — they’ll both be related to a hidden variable that we could call “category interest”.

There’s one point doing a lot of work on that graph though. Short Story gets huge numbers of suggestions, way more than other categories.

Let’s connect some dots. Do the number of nominees correlate with the size of the votes? That sounds plausible but let’s see:

Very roughly, yes but it isn’t a tight relationship. I decided to cut out the intervening figures and just look at HSD counts versus final votes.
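For anyone wanting to reproduce this kind of check, here is a minimal sketch of the correlation calculation. The per-category numbers below are placeholders; the real ones come from the spreadsheet and the Hugo statistics PDF:

```python
import statistics

# Placeholder per-category figures: (HSD suggestion count, final vote total).
data = [(40, 700), (55, 900), (120, 1500), (30, 500), (80, 1100), (65, 850)]

hsd_counts = [h for h, _ in data]
final_votes = [v for _, v in data]

r = statistics.correlation(hsd_counts, final_votes)  # Pearson's r (Python 3.10+)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```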

Unfortunately Short Story is such an outlier that the relationship gets obscured. I decided to remove Short Story and Novel as categories as they are clearly special.

It’s not nothing, and considering how many steps away a very broad list of suggestions is from vote totals on a small set of finalists, it’s a fair bit of something. There are three categories which fall well below the line of best fit on the right-hand side of the graph. Interestingly they are the points for Lodestar, Fan Writer and Art Book. Two of those categories are new(ish) and I know I personally added a lot of names to Fan Writer as part of my project to gather fan writer names.

Cherry picking even further by removing Lodestar, Fanwriter and Art Book, the relationship looks tighter but take this with substantial amounts of salt.

So, here’s what I conclude. Obviously just adding names to an eligibility spreadsheet won’t increase final votes. However, encouraging early interest in nominations (which we can measure by how many entries appear on an eligibility spreadsheet) may well have a positive impact on final votes.

Promoting interest in possible picks for Best Fanzine over the following months up to the close of 2020 Hugo nominations will, I strongly suspect, lead to an increase in final votes for Fanzine.

How many finalists? Crunching continued…

This is a follow up to the earlier post. Read that post first for background and the data I’m looking at.

I’ve looked at 2018 Hugo data for both stages:

  • The nomination stage by EPH
  • The final voting stage by IRV

My impression was that there are some changes in the ranking between the two but not so many as to cast doubt on the nomination process itself nor so few changes as to make the final voting stage redundant. It looks like things are pretty much in a sweet spot:

  • final winners are often the top finalists — which implies there’s not a mismatch between how people nominate and how they vote (or between the people voting at each stage etc)
  • low ranked finalists often do better in the final voting — which implies that there is a lot of value in a two stage process.

To show that here is a graph of how the rankings compare between EPH stage 1 and IRV stage 2 of the Hugo voting process:

The width of a blob indicates the frequency of that pair of ranks. For example there were 9 cases of 1st rank EPH coming 1st in the final stage and 10 cases of 4th ranked finalist coming 3rd in the final stage. I’m not sure if a simple linear regression is appropriate with this data but Excel tells me that the first stage voting accounts for about 25% of the variance in the second stage ranks.

However, can we look at this data and say how long the finalist list should be? Are there ENOUGH finalists? Should there be a list of 7 or 8? Putting administrative and practical limits aside I think we can examine this question with the data.

Obviously, I’m only looking at one year, so any conclusions are tentative and limited. I could look further but recent data is weird due to Puppy activities and there have been rule changes since. So, I’m sticking with 2018 (also I’m lazy).

One graph I drew was to look at the distribution of the differences in rank between the two stages.

Again we can see that no change (zero on the x-axis) is common but that bigger changes in rank happen. Unfortunately, we really can’t take this as being true of every ranking. Obviously rank 6 finalists can only either stay the same or go upwards.
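That distribution is simple enough to tally directly. A sketch with made-up (EPH rank, final rank) pairs, since the real ones come from the 2018 Hugo statistics document:

```python
from collections import Counter

# Made-up (EPH rank, final rank) pairs for illustration.
pairs = [(1, 1), (2, 1), (2, 3), (3, 2), (4, 3), (5, 5), (6, 4), (6, 6)]

rank_changes = Counter(final - eph for eph, final in pairs)
for change in sorted(rank_changes):
    print(f"rank change {change:+d}: {rank_changes[change]} finalist(s)")
# Note the built-in asymmetry: a 6th-ranked finalist can only improve (negative
# change) or stay put, never drop, so the distribution is bounded.
```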

A different way of thinking about the issue would be to consider what would happen with different numbers of finalists. For example, what if in 2018 there was only 1 finalist per category? Yes, that’s silly, but we can work out that of the 15 categories I looked at, 9 would have the same winner as actually happened and 6 wouldn’t. 1 finalist would contain 60% of the actual winners (a sketch of this tally follows the list below).

  • 1 finalist: 9 or 60% of winners
  • 2 finalists: 12 or 80% of winners
  • 3 finalists: 14 or 93% of winners
  • 4 finalists: 14 or 93% of winners (i.e. no extra winners)
  • 5 finalists: 15 or 100% of winners
  • 6 finalists: 15 or 100% of winners
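Here is that tally as a short sketch. The per-category ranks are reconstructed from the counts above (nine eventual winners were ranked 1st at the nomination stage, three 2nd, two 3rd, none 4th and one 5th):

```python
# Nomination-stage (EPH) rank of the eventual winner in each of the 15 categories,
# reconstructed from the counts listed above.
winner_eph_ranks = [1] * 9 + [2] * 3 + [3] * 2 + [5]

for shortlist_size in range(1, 7):
    captured = sum(1 for rank in winner_eph_ranks if rank <= shortlist_size)
    print(f"{shortlist_size} finalist(s): {captured}/15 winners "
          f"({captured / len(winner_eph_ranks):.0%})")
```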

So for most categories 3 finalists would just about do. Adding finalists after 3 brings only small gains, but 2018 still needed 5 finalists to capture all the eventual winners.

Now, obviously, if we added more finalists people’s choices and the voting would change but we can see from the trend that the gains trail off quickly after 3 finalists.

So is five enough? Five clearly works, but that’s actually an argument for having six finalists if you want to be confident you’ve got all the plausible contenders. As we definitely got one fifth-ranked finalist winning a category (Rebecca Roanhorse in the Campbell Award) there’s maybe a 6% chance of a rank 5 finalist winning (one winner out of 15).

Add in the possibility of one finalist being in some way dodgy or having cheated etc. and 6 is a safe contingency. Does the same argument not work for 7 or 8 finalists? No, because we can see that the gains trail off rapidly after 3 finalists. Five is probably enough; six is almost certainly enough.