Some comparison data on gender: Amazon

More as a data grabbing exercise than anything, I tabulated the Amazon Best Seller list for Science Fiction and Fantasy: https://www.amazon.com/Best-Sellers-Books-Science-Fiction-Fantasy/zgbs/books/25/ref=zg_bs_pg_1?_encoding=UTF8&pg=1

This data is a snapshot, and right now the list is naturally dominated by Margaret Atwood's sequel to The Handmaid's Tale, so the list contains a lot of different versions of both books (print version, audio version, Kindle version etc.). It's also very Amazon, with some popular-in-Kindle-Unlimited works further down the ranks.

I took the top 100 listed and then did a few things to the data. Firstly, I deleted multiple versions of a work, which will add a bit of bias to the data by understating the impact of the biggest sellers. I then classified authors based on name, pronouns, and bios as male, female, non-binary or both (in the case of dual authors). I didn't identify any authors for the non-binary category. One author name was a joint authorship of a man and a woman and was counted as "both". That took the initial 100 rows down to 85 rows.

I then duplicated that data set and in the second version I deleted multiple works by an author, leaving only the highest ranked work from the Amazon list. This was done so a single author wasn't double counted (or n-tuple counted in the case of J.K. Rowling), but the process understates the success of authors like Rowling or Stephen King. That took the number of rows down to 55.
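
For anyone who wants to replicate the tidy-up, here's a minimal sketch of the two passes in pandas. It assumes a hypothetical CSV with rank, title, author and gender columns (the gender coding was done by hand, as described above), so treat it as an outline rather than the exact process I used.

```python
# Minimal sketch of the two clean-up passes described above.
# Assumes a hypothetical file with columns: rank, title, author, gender
# (gender hand-coded as "female", "male" or "both") - not my actual file.
import pandas as pd

raw = pd.read_csv("amazon_sff_top100.csv")  # hypothetical file name

# Pass 1: drop duplicate editions of the same work, keeping the best rank.
works = raw.sort_values("rank").drop_duplicates(subset=["title", "author"])

# Pass 2: keep only each author's highest-ranked work.
top_work = works.drop_duplicates(subset=["author"])

# Gender shares for each data set, as percentages
# (the "All Works" and "Top Work" columns in the table below).
print(works["gender"].value_counts(normalize=True).mul(100).round())
print(top_work["gender"].value_counts(normalize=True).mul(100).round())
```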

The results are delightfully ambiguous with enough contrary results to please multiple readings.

Gender | All Works | Top Work | All Works Top 50 | Top Work Top 50
Female | 56%       | 49%      | 61%              | 54%
Male   | 43%       | 49%      | 39%              | 46%
Both   | 1%        | 2%       | 0%               | 0%
  • All Works: counts by author gender of the 85 books in the SFF Amazon bestsellers.
  • Top Work: counts by author gender of the 55 books by unique authors in the SFF Amazon bestsellers.
  • All Works Top 50: counts by author gender of works ranked 50 or better out of the 85 (36 books).
  • Top Work Top 50: counts by author gender of works ranked 50 or better out of the 55 (24 books).

Looking at just works ranked 25 or better gives figures that are more consistent between the two sets of data.

Gender | All Works Top 25 | Top Work Top 25
Female | 59%              | 56%
Male   | 41%              | 44%

Make of this what you will 🙂

A bit more on Dragons and probabilities etc

I had some weird conversations yesterday about Dragon Award stats. One was a brilliant takedown of my figure that 10 men out of 10 had won Dragon Awards from 2016 in the two headline categories. Aha! Four years and two categories is only EIGHT! Yeah, but it really is ten men. James S. A. Corey is actually two people and, even harder to believe, apparently John Ringo and Larry Correia are different people. Mind you, if I only count Larry Correia once (because he is the same person whichever year he's in) then it is back to 8 again. You'll note that however we count it the answer comes out the same: 100% have gone to men in the two headline categories.

The discussion does raise a relevant point about why statistics is hard. Even a basic stat like a count of how many out of how many requires engaging your brain and thinking carefully about what you are counting. It was suggested that I should have said 10 men out of 8 awards… which I guess makes it clearer what was being counted but is horrible arithmetically. It looks like "10 out of 8", i.e. 125%, which is nonsense because we are dividing two different things and creating a derived unit of men per award.
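
To make the counting point concrete, here's a toy sketch. The entries are placeholders rather than the actual Dragon Award results, but it shows how one set of results yields three different counts: award slots, winning credits, and distinct people.

```python
# Toy data: placeholders, NOT the actual Dragon Award winners.
awards = [
    ("2016 Category A", ["Author A"]),
    ("2016 Category B", ["Author B"]),
    ("2017 Category A", ["Author C", "Author D"]),  # a two-person pen name
    ("2017 Category B", ["Author A"]),              # a repeat winner
]

slots = len(awards)                                        # 4 award slots
credits = [name for _, names in awards for name in names]  # 5 winning credits
people = set(credits)                                      # 4 distinct people

print(slots, len(credits), len(people))  # 4 5 4
```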

I’ll point people back to this post https://camestrosfelapton.wordpress.com/2019/08/10/dragon-award-by-gender/ and this post https://camestrosfelapton.wordpress.com/2019/08/11/more-dragon-stats/ where I talked in more detail about what I counted and how.

To round off that previous gender post here is an equivalent graph of winners by gender in the book category:

Like the graph in the previous post of finalists, I'm using counts by gender, which reduces the apparent gender disparity: two joint authors of the same gender count as 1, but two joint authors of different genders count as 1 for each gender. The same caveats about treating gender as a binary classification apply as with the earlier post.

The worst year was 2017, which was also peak Rabid Puppy influence.

A couple of related conceptual questions have come up. I was asked elsewhere what the chance was of so many authors on Brad's list winning. A different question with the same kind of issue was asked by James Pyles: basically, what was the chance of N.K. Jemisin winning a Hugo three times in a row?

Neither question can easily be answered, and they sort of miss the point of the kind of comparisons against chance you might do with gender. With the Brad list, these were people who were plausible winners; the outcome wasn't surprising. There's no expectation that the result of an award is a random event when looking at individuals, and the same is true with Jemisin. We could say, well, there are 7 billion people on Earth and one winner, so the chance is 1/7 billion and the chance of winning three times is (1/7 billion)^3, and conclude that everything is impossible, but the comparison is silly.

Comparing with chance is there to test a kind of hypothesis: specifically whether the result is plausibly the result of chance. If the probability is tiny then we can reject that it happened by chance. We already know that somebody winning a Dragon or a Hugo isn’t by chance because names aren’t picked out of a hat.

So why compare the gender of winners to chance events if we know winning isn't a chance event? Good question. Because we are testing another level of hypothesis. With gender, the hypothesis could be stated as 'gender is an irrelevant variable with regard to winning award X'.

Consider this. Imagine if all Dragon (or Hugo) winners were born on a Tuesday. That would be remarkable. Day of the week surely isn't connected to whether you win an award or not! We might reasonably expect only one-seventh of winners to be born on a Tuesday. We might do extra research to see whether day-of-the-week of birth really is evenly distributed across all people. We might fine-tune that further and consider only English speakers or only Americans etc. The point being that if day-of-the-week departed from chance then we would reject the idea that day-of-the-week is irrelevant.
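
As a sketch of how that test works, here's the calculation for an invented example: suppose 24 winners had all been born on a Tuesday. Under the null hypothesis (day of birth is irrelevant) each winner independently has about a 1/7 chance of a Tuesday birth, so we can ask how surprising 24 out of 24 would be. The numbers are illustrative, not real award data.

```python
# Hypothetical example: 24 winners, all born on a Tuesday.
# Null hypothesis: day of birth is irrelevant, so P(Tuesday) ~ 1/7 each.
from scipy.stats import binomtest

result = binomtest(k=24, n=24, p=1/7, alternative="greater")
print(result.pvalue)  # (1/7)**24 - vanishingly small, so reject "day is irrelevant"
```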

If we did find that, it wouldn't tell us why or how day-of-the-week was relevant. One response I've seen to producing gender stats is people saying that they don't pay attention to authors' gender when voting. Even if we ignore subconscious influences and take that at face value, all that does is remove one possible cause of a gender disparity; it doesn't make the gender disparity go away.

Another response is that looking at gender stats is 'politics'. Well, yes, it is, but it would be relevant even if we otherwise lived in a gender-neutral utopia. Again, imagine if Tuesday-born people won far more sci-fi awards than other people: that would be fascinating even though we don't live in a world of Tuesday-privilege.

More Hugo Graphs, Fanzine & Ramblings

Nicholas Whyte has an insightful look at the 2019 Hugo stats here: https://nwhyte.livejournal.com/3244665.html

The biggest issue raised is that final votes for Best Fanzine came perilously close to falling below 25% of the total votes. [The stats are now on the Hugo history pages here: http://www.thehugoawards.org/wp-content/uploads/2019/08/2019-Hugo-Statistics.pdf ] Whyte says:

“We were surprisingly close to not giving a Best Fanzine award in both 2019 Hugos and 1944 Retro Hugos this year. The total first preference votes for Best Fanzine finalists other than No Award in both cases was 26.9% of the total number of votes cast overall (833/3097 and 224/834).”

Eeek! Consider that this year we've had worries about the nature of Best Fan Writer, eligibility issues with Best Fan Artist, and now Best Fanzine looks a bit endangered. Fan categories are part of the soul of the Hugo Awards!

There are two different kinds of response to Hugo issues. One is to respond structurally: change, add or remove categories; play with eligibility rules; change voting methods etc. The other is to respond behaviourally: change how we make decisions as voters. In the second case, a good example is the range of sites that came into being to help people find things to nominate in the Hugo Awards.

The Hugo voting community is big enough that a structural response makes sense but it is also small enough that change can be effected by persuading people to think differently about how they vote. One of the most positive examples of the latter is the Lady Business Hugo Spreadsheet of Doom. http://bit.ly/hugoaward2019 <-2019 version.

I decided to have a bit of a look at figures I could derive from that sheet and compare them with the Hugo stats. To do that I just counted up the number of nominations in each category and then added nomination and final vote stats for those categories from this year's Hugo stats. I will confess to a bit of sloppy counting: sometimes there is one header row in a category and sometimes there are two or three, so sometimes my counts are out by 1 or 2.

What did I find? Well, on average the number of works listed per category on the Hugo Spreadsheet of Doom (HSD from now on) was about 20% of the total number of works nominated. I haven’t done a side-by-side comparison with the long list but I think the HSD is a good early indicator of the level of interest in a category. I’ll come back to this.

Firstly, some general correlations: nomination votes correlate with final vote totals.

Whether that works causally I don't know, i.e. if we all encouraged each other to nominate things in Fanzine (anything – not a campaign for a particular fanzine) would that lead to an increase in final votes for Fanzine? Maybe.
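
For what it's worth, the correlation itself is a one-liner once the per-category totals are transcribed from the Hugo statistics PDF. The totals below are placeholders, not the real 2019 figures; the point is just the shape of the calculation.

```python
# Placeholder per-category totals (not the real 2019 Hugo figures).
from scipy.stats import pearsonr

nomination_votes = [1200, 950, 400, 310, 280]
final_votes = [3000, 2400, 1100, 900, 820]

r, p_value = pearsonr(nomination_votes, final_votes)
print(f"r = {r:.2f}, p = {p_value:.3f}")
```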

Now let’s look at nomination counts. The more things listed in the HSD the more nominees there are. That’s probably not causal — they’ll both be related to a hidden variable that we could call “category interest”.

There's one point doing a lot of work on that graph though. Short Story gets huge numbers of suggestions, way more than other categories.

Let's connect some dots. Does the number of nominees correlate with the size of the vote totals? That sounds plausible, but let's see:

Very roughly, yes, but it isn't a tight relationship. I decided to cut out the intervening figures and just look at HSD counts versus final votes.

Unfortunately, Short Story is such an outlier that the relationship gets obscured. I decided to remove Short Story and Novel, as those categories are clearly special.

It's not nothing, and considering how many steps away a very broad list of suggestions is from vote totals on a small set of finalists, it's a fair bit of something. There are three categories which fall well below the line of best fit on the right-hand side of the graph. Interestingly, they are the points for Lodestar, Fan Writer and Art Book. Two of those categories are new(ish), and I know I personally added a lot of names to Fan Writer as part of a project to gather names for that category.

Cherry picking even further by removing Lodestar, Fan Writer and Art Book, the relationship looks tighter, but take this with substantial amounts of salt.

So, here's what I conclude. Obviously just adding names to an eligibility spreadsheet won't increase final votes. However, encouraging early interest in nominations (which we can measure by how many entries appear on an eligibility spreadsheet) may well have a positive impact on final votes.

Promoting interest in possible picks for Best Fanzine over the following months up to the close of 2020 Hugo nominations will, I strongly suspect, lead to an increase in final votes for Fanzine.

How many finalists? Crunching continued…

This is a follow-up to the earlier post. Read that post first for background and the data I'm looking at.

I’ve looked at 2018 Hugo data for both stages:

  • The nomination stage by EPH
  • The final voting stage by IRV

My impression was that there are some changes in the ranking between the two but not so many as to cast doubt on the nomination process itself nor so few changes as to make the final voting stage redundant. It looks like things are pretty much in a sweet spot:

  • final winners are often the top finalists — which implies there’s not a mismatch between how people nominate and how they vote (or between the people voting at each stage etc)
  • low ranked finalists often do better in the final voting — which implies that there is a lot of value in a two stage process.

To show that, here is a graph of how the rankings compare between stage 1 (EPH) and stage 2 (IRV) of the Hugo voting process:

The width of a blob indicates the frequency of that pair of ranks. For example, there were 9 cases of the 1st-ranked EPH finalist coming 1st in the final stage and 10 cases of the 4th-ranked finalist coming 3rd in the final stage. I'm not sure if a simple linear regression is appropriate with this data, but Excel tells me that the first-stage voting accounts for about 25% of the variance in the second-stage ranks.
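
In case anyone wants to check the Excel figure: with a single predictor, the R² of a linear regression is just the square of the correlation between the paired ranks. A sketch, with placeholder rank pairs standing in for the real set of roughly 90 (EPH, IRV) pairs:

```python
import numpy as np

# Placeholder (EPH rank, final rank) pairs; the real data set has ~90 of them.
eph = np.array([1, 2, 3, 4, 5, 6, 1, 3, 2])
irv = np.array([1, 5, 4, 3, 2, 6, 2, 1, 3])

r = np.corrcoef(eph, irv)[0, 1]
print(f"R^2 = {r ** 2:.2f}")  # about 0.25 on the real 2018 data, per Excel
```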

However, can we look at this data and say how long the finalist list should be? Are there ENOUGH finalists? Should there be a list of 7 or 8? Putting administrative and practical limits aside I think we can examine this question with the data.

Obviously, I’m only looking at one year, so any conclusions are tentative and limited. I could look further but recent data is weird due to Puppy activities and there have been rule changes since. So, I’m sticking with 2018 (also I’m lazy).

One graph I drew was to look at the distribution of the differences in rank between the two stages.

Again we can see that no change (zero on the x-axis) is common but that bigger changes in rank happen. Unfortunately, we really can't take this as being true of every ranking: obviously rank 6 finalists can only either stay the same or go upwards.

A different way of thinking about the issue would be to consider what would happen with different numbers of finalists. For example, what if in 2018 there was only 1 finalist per category? Yes, that's silly, but we can work out that of the 15 categories I looked at, 9 would have the same winner as what actually happened and 6 wouldn't. 1 finalist would capture 60% of the actual winners (there's a sketch of this tally in code just after the list below).

  • 1 finalist: 9 or 60% of winners
  • 2 finalists: 12 or 80% of winners
  • 3 finalists: 14 or 93% of winners
  • 4 finalists: 14 or 93% of winners (i.e. no extra winners)
  • 5 finalists: 15 or 100% of winners
  • 6 finalists: 15 or 100% of winners
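
The tally above is simple enough to do by hand, but here's the sketch for anyone who wants to see the bookkeeping. The ranks come straight from the results described above: 9 winners entered the final ranked 1st, 3 ranked 2nd, 2 ranked 3rd and 1 (the Campbell) ranked 5th.

```python
# EPH (nomination-stage) ranks of the 15 eventual 2018 winners, per the text above.
winner_ranks = [1] * 9 + [2] * 3 + [3] * 2 + [5] * 1

for k in range(1, 7):
    captured = sum(1 for rank in winner_ranks if rank <= k)
    print(f"{k} finalists: {captured} of 15 winners ({captured / 15:.0%})")
```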

So for most categories 3 finalists would just about do. Adding finalists after 3 brings only small gains, but 2018 still needed 5 finalists to capture all the eventual winners.

Now, obviously, if we added more finalists people’s choices and the voting would change but we can see from the trend that the gains trail off quickly after 3 finalists.

So is five enough? Five clearly works, but that's actually an argument for having six finalists if you want to be confident you've got all the plausible contenders. As we definitely got one fifth-ranked finalist winning a category (Rebecca Roanhorse in the Campbell Award) there's maybe a 6% chance of a rank-5 finalist winning (one winner out of 15).

Add in the possibility of one finalist being in some way dodgy or having cheated etc. and 6 is a safe contingency. Does the same argument not work for 7 or 8 finalists? No, because we can see that the gains trail off rapidly after 3 finalists. Five is probably enough; six is almost certainly enough.

Crunching reform or rollback

There is an on-going discussion at File770 on the 5/6 Hugo nomination rule:

While the Sad and Rabid Puppies slates were filling up most of the slots on the 2015 and 2016 Hugo ballots, majorities at the Worldcon business meetings passed and ratified several rules changes that made it much more difficult for that to keep on happening. The success of these majorities has tended to overshadow how many fans did not want any changes made – no matter how often Vox Day dictated what made the ballot – or else did not want these particular changes made. And there are business meeting regulars who evidently feel now is the time to start turning back the clock.
Here’s a matched set of proposals to end the “5 and 6” part of the Hugo nomination reforms. If you are going to the Dublin 2019 business meeting, you will have to decide whether the claims made about convenience and efficiency warrant undoing the protective rules put on the books just a few years ago.

http://file770.com/reform-or-rollback/

The proposal states that:

“The losers will be those who had placed sixth in recent years. There is only one case of a sixth-placed finalist at nominations stage going on to win the Hugo in the last three years (the rather odd situation of Best Fan Artist in 2017, where two finalists were disqualified). On the other hand, a reduced pool of finalists increases the cachet of being among that number.”

I have some doubts about this point. Firstly, 2017 and 2018 aren't a lot to go on, and 2017 still had some residual Rabid Puppy action and hence isn't a great example for 6th places. We really only have 2018 as a 'regular' year of the two big voting reforms, EPH and 5/6.

I won't rehash all the arguments from the File770 discussion (at least not yet) but I did want to look at the specific issue of how likely it is that a 6th-place nominee might win the Hugo in their category.

Obviously, there are zero examples of this from 2018 but it would be wrong to infer that the answer is therefore zero chance. Instead, I decided to look at how ranks change between the EPH nomination stage and the instant run-off voting (IRV) final stage.

To do that I looked at the nomination rank (EPH) and final rank (IRV) of Hugo and Campbell nominees from 2018. I discarded categories in which a nomination had been declined because I felt they might have weird impacts. Here's an example of the Novel data:

IRV | EPH | Dif | Mag | Finalist              | Category
1   | 1   | 0   | 0   | The Stone Sky         | Novel
5   | 2   | -3  | 3   | Raven Stratagem       | Novel
4   | 3   | -1  | 1   | Six Wakes             | Novel
3   | 4   | 1   | 1   | Provenance            | Novel
2   | 5   | 3   | 3   | The Collapsing Empire | Novel
6   | 6   | 0   | 0   | New York 2140         | Novel

In the example: the IRV column shows the rank of the work through the elimination process; EPH shows the nomination rank; Dif is EPH minus IRV (negative means the work was less popular in the second stage); Mag is the magnitude of the change regardless of direction.

The average difference has to come to zero (everything balances out) but the mean of the magnitudes comes to 1.27, i.e. on average finalists shift about one place from the first round to the second round. Of the 90 finalists listed, 25 had no change, 34 changed by 1 (the modal change), 19 by 2, 7 by 3, 4 by 4 and only 1 by 5. That last change was a drop from 1st to 6th rather than a rise, but it does demonstrate the scale of possible change.
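
For anyone checking my arithmetic, here's a sketch of the Dif/Mag bookkeeping applied to the Novel table above, plus the mean-magnitude calculation from the shift counts just quoted:

```python
# (IRV final rank, EPH nomination rank) pairs from the Novel table above.
novel = {
    "The Stone Sky": (1, 1), "Raven Stratagem": (5, 2), "Six Wakes": (4, 3),
    "Provenance": (3, 4), "The Collapsing Empire": (2, 5), "New York 2140": (6, 6),
}
for title, (irv, eph) in novel.items():
    dif = eph - irv  # negative = the work fell back in the final stage
    print(title, dif, abs(dif))

# Magnitude counts across all 90 finalists, as quoted above.
counts = {0: 25, 1: 34, 2: 19, 3: 7, 4: 4, 5: 1}
mean_mag = sum(mag * n for mag, n in counts.items()) / sum(counts.values())
print(round(mean_mag, 2))  # 1.27
```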

How about 6th placers in general? The mean magnitude of the shift for those ranked 6th in nominations was 1.2, but that was also the average of the difference (i.e. with direction). Of course, if you are in 6th place you can't get a negative change in your rank in the second stage because you can't go any lower (assuming you don't finish below No Award, which I didn't model).

Of the 15 6th placers I looked at, 5 didn’t shift at all, 4 shifted up by 1, 5 shifted by 2 and 1 shifted by 4 (Sheila Williams in Best Editor Short).

I'll put all the numbers after the fold, but I think the figures point to it being unlikely in any given year that a 6th placer will go on to win in the second round, yet not so unlikely that we won't see it happen every so often.

Data in EPH rank descending order after the fold.


Richard Dawkins saying poorly thought through reactionary things again

Oh dear:

And…

Alternatively we could not do anything like that because it is an appalling idea.

There are at least three levels of confused thinking here. The first is that, in the past, attempts to ensure people were intellectually 'qualified' to vote have been attempts to disenfranchise specific ethnic groups. When coupled with restricted access to education, and with tests wittingly and unwittingly full of the biases of the more powerful ethnic group, such tests would simply be a way of creating a kind of apartheid electoral system.

OK, but what if somehow only people who could really understand the issues of the day could vote? Wouldn't that be better? Isn't it because of stupid people that we have Trump and Brexit? No, or at least not 'stupid' as the term is usually used. Voting for Trump or falling for Nigel Farage's propaganda are certainly daft things to do, but a terrible secret of the world is that these are the kinds of 'stupid' things otherwise intelligent people do. There are connections between levels of education and political preference but they are neither simple nor straightforward. There is evidence of an 'educational gradient' in how people voted in the UK on Brexit, but that gradient does not account for other regional variations (e.g. Scotland). It's also important to remember that any educational gradient represents people with quite different economic interests as well. Nor was the gradient as smooth as it might sound:

“So, based on the above, the Leave vote was not more popular among the low skilled, but rather among individuals with intermediate levels of education (A-Levels and GSCE high grades), especially when their socio-economic position was perceived to be declining and/or to be stagnant. “

https://blogs.lse.ac.uk/politicsandpolicy/brexit-and-the-squeezed-middle/

Blaming the UK's current Brexit confusion on stupidity may be cathartic but it provides zero insight into a way forward. Further, it ignores that the architects of the political chaos are products of reputedly the best education you can get in Britain. Boris Johnson is manifestly a buffoon, but he is a buffoon with a good degree in classics from Oxford. The Boris Johnsons of this world would waltz past Dawkins's test.

US politics also has a complex relationship with educational attainment. Conservative views peak at mid-ranges of education (e.g. https://www.pewresearch.org/fact-tank/2016/09/15/educational-divide-in-vote-preferences-on-track-to-be-wider-than-in-recent-elections/ ). People with college degrees and more advanced higher education are currently more likely to vote Democrat, but in the past (e.g. the 1990s) this was less so. The growing (indeed, reversed) education divide doesn't account for differences among ethnic groups or between genders. Other divides (e.g. urban versus rural) may work causally in the other direction (i.e. different economic demands make decisions about higher education a different choice in rural versus urban contexts, while the underlying politics rests on other urban versus rural differences).

Even if we imagine a Dawkins dystopia in which you had to have a university degree to vote (a much more substantial hurdle than the demands of either the UK or US citizenship tests), the proposal falls into the political fallacy of technocracy as an alternative to democracy. By 'fallacy' I don't mean that competence or technical understanding or evidence-based policy are bad ideas or things we don't want to see in government, but rather that it is a reasoning error to judge democracy in principle as a process by which technically competent policy is formed.

Democracy serves to provide consent from the governed to the government. That's its purpose. It provides a moral and practical basis on which there can be any kind of government that is even vaguely just. Logically, a vote doesn't determine whether something is true or not (except in trivial cases on questions about 'what will people vote for'). Consequently, it is always easy to attack democracy by setting it up AS IF that's what voting is supposed to achieve. A referendum can't determine what the smartest course of action is, but then that's not what a referendum or an election is supposed to do. Instead, asking people to vote is a way of trying to establish broad social agreement on what a country will do.

Without that kind of broad social agreement a country has only two options: disunity or authoritarianism. Restricting the franchise along any axis will lead to overt authoritarianism. Paternalistic ‘benevolent’ authoritarianism is still a system that depends on brutality.

The shorter version: democracy is about consent of the governed, not about how smart voters are. The political divides we currently have wouldn't be solved by a test that a high school graduate could pass. A nation in which only college graduates could vote would be a shitty one, and politically unstable. Well-educated people can and do advance bad 'stupid' political ideas. Come to think of it, there's a great example here: Richard Dawkins is very well educated and here he is putting forward a stupid idea.

Captain Marvel versus the Trolls

Multiple news sources are covering that the new (and as yet unseen) Captain Marvel movie is being review-bombed by right wing trolls. The amount of coverage of this has itself increased just in the past few hours but this link seems to be one of the first articles on it: https://comicbook.com/marvel/2019/02/19/captain-marvel-rotten-tomatoes-fake-reviews-sabotage/

I'd actually thought about writing about how the alt-right campaign against the film had started to warm up the other day, after seeing our old pal Vox Day jump on the bandwagon (archive link)… but didn't, because I'm lazy and/or got distracted. What I can offer instead of an amazingly insightful prediction that obnoxious misogynists are about to be obnoxiously misogynistic is some graphs!

I grabbed the review data from Rotten Tomatoes so that I can show graphically the influx of reviews. I would have liked to show another film for comparison, but it's hard to get a like-for-like. The nearest equivalent with a similar release date and no pre-screening reviews yet is Disney's live-action version of Dumbo. That has only one page of user reviews/comments so far, as opposed to Captain Marvel's six pages, but I don't think it is a like-for-like in terms of organic interest.

Here’s the first graph for Captain Marvel. It’s a running total of comments over time. It’s a longgggg time axis because the first comment is from 2015! Rotten Tomatoes (and similar sites) create entries for movies that have been announced even before production begins.
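
(For the curious: the running total is trivial to build once the comment dates are scraped. A minimal sketch, assuming the reviews are saved to a CSV with one row per comment and a date column; the file name and column name are my own invention.)

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file: one row per Rotten Tomatoes comment, with a "date" column.
reviews = pd.read_csv("captain_marvel_reviews.csv", parse_dates=["date"])
reviews = reviews.sort_values("date")
reviews["running_total"] = range(1, len(reviews) + 1)

plt.plot(reviews["date"], reviews["running_total"])
plt.ylabel("Cumulative comments")
plt.title("Captain Marvel: running total of user comments")
plt.show()
```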

Interest (mainly positive but some negative) starts picking up from last July, and subsequent trailers lead to more comments (again, some positive and some negative). Some of the coverage of this troll attack has focused on the absurdity of people rating a film that hasn't been seen yet, but at this point it is technically Rotten Tomatoes allowing people to say whether they are "Not interested" or "Want to see it". Some of the comments are literally spam and some of the earlier comments are anti-Disney etc.

The next graph zooms in to the last few months:

There's a spike of comments in February. Obviously some of that is an inevitable increase as the release date gets closer, but the more overt hate comments really ramp up. The worst include comments about the lead actress (Brie Larson) being hit by a bus. The length of the comments also increases, in the form of what are best called rants:

“Why Marvel decided to cast a very vocal racist and sexist aimed at white males, I’ll never know. If Robert Downey Jr. started saying that he didn’t care about the opinions of 40 year old white chicks and he doesn’t want to be interviewed by a white woman as its not inclusive enough, people would lose their minds. His career would be over, branded a racist and sexist, attacked in the media and his legacy tarnished. As a white male, I will not be supporting this or any other movie that stars Brie Larson. They say that Captain Marvel will be the new face of the MCU? As the villain because she certainly isn’t a her-o. “

How many is it though? Well, one comment anticipating somebody dying in a bus accident is one too many, but for a sense of scale it's about 14 comments over the past 10 days that are of the 'arrghh SJWs! Feminazi!' style of crap. It's not a huge number, and the spike shown above is inflated by other people querying why there are so many anti comments for a film nobody has seen yet.

It’s a reasonable assumption that this is just the start though.