Category: Statistics

Even More Hugo Wisdening

I’ve never been a fan of cricket, but my family growing up were, and there were numerous copies of Wisden in the house – which, for those who don’t know of it, is best described here: https://en.wikipedia.org/wiki/Wisden_Cricketers%27_Almanack. I guess some in the house hoped that I might find it intriguing, and I could see the appeal, but I resisted.

These days we’ve got something better! All the fun of tables of dry numbers PLUS science fiction books! I don’t have a round-up of other takes on the numbers yet, though.

Normally Brandon Kempner at Chaos Horizon has posted something by now but there’s not been a post there since February. I hope he is OK.

Greg Hullender of Rocket Stack Rank is actually in Helsinki – and having a fun time I hope – so probably won’t post anything yet.

In the comments JJ gave links to three rich sources of data:

The first one is great for seeing EPH in action.


The Black SFF Writer Survey Report

This is an interesting read from FIYAH Literary Magazine: http://www.fiyahlitmag.com/bsfreport/. I’ll let the report speak for itself – I’m still digesting it – but I’d like to pick up a point they make in the introduction:

“A final note: We know that some usual suspects will attempt to invalidate what we’ve captured by claiming that our analysis lacks rigor, or our methodology was faulty. This is a smokescreen that these individuals use to hide the fact that they are against making the speculative fiction publishing space inclusive and respectful to black writers–all writers, really–and their work. Using assumed (and faulty) scientific expertise to attack the experiences of marginalized people is not a new tactic, and one that is frequently used by these groups in an attempt to maintain the oppressive systems that they believe should solely benefit them. They will never admit that fact so we are making it plain here.”

Strongly worded but a reasonable response given some of the muddleheaded reactions we saw to the Fireside report.

This is not to say that the report is somehow methodologically perfect or has flawless data or answers all questions. Rather, the point is that gathering a complete data picture of an area of study takes time and multiple studies, and is necessarily an iterative process of collecting incomplete data which then informs new surveys and new studies. There is a bootstrap element to all statistical study, e.g. how do you know whether your sample is representative without first having statistical data about the population you are sampling, which you can’t get without first doing a representative sample of the population you want to sample? The answer is that *perfection* is unobtainable but *good-enough* is both obtainable and part of an iterative process of gaining knowledge.

So does the report have limitations? Yes, obviously – the writers aren’t omniscient. The question is: does it improve our understanding?

Survey results! Freeped by squirrels

[SurveyMonkey results chart]

After 77 votes, some of which were rigged, the surprise result was “Maybe its is squirrels who do all the real work around here. Just saying” – which isn’t even grammatically correct and wasn’t even an option initially.

Freeped by squirrels.

Again.

[Also: nice graph option there from Survey Monkey. The proportionally divided bar graph is a nice alternative to the pie-chart and is arguably easier to read.]

Margins of error

I suspect most people who read this blog know all this already but I’ve met the same misunderstanding at work recently and also in the context of the opinion polls around the POTUS election. So here is a simplified explanation.

Imagine I have a great big jar of jelly beans, which are the favoured confectionery of probability explanations. There are exactly 500 red jelly beans and 500 blue jelly beans and nothing else – no Jill Stein jelly beans or exotic Evan McMulberry flavours. A jelly bean pollster doesn’t know this, though. The pollster wants to estimate the proportion of red and blue jelly beans in the jar BUT is only allowed to look at some of the jelly beans.

The pollster grabs a handful of jelly beans from the jar and looks at the relative proportion of jelly beans. Naturally, I don’t want the pollster to do this very often because they’ll put their germ-ridden hands all over my beautiful jelly beans. So the pollster only has this handful to look at. They have to make a key assumption – that the jelly beans were well mixed, so that their handful is a random pick of the jelly beans in the jar.

The pollster looks at the proportion of red to blue jelly beans. Let’s say they have 5 red and 8 blue jelly beans. The pollster says that the proportion of red to blue is 38% to 62% BUT they also report a margin of error that is quite large. They can’t be sure this figure is right because they know they may have been unlucky. With only 13 jelly beans in their handful, it isn’t wholly impossible that they could pick out nothing but blue jelly beans even if the true proportion was 50-50 – unlikely, but it could happen purely by chance.
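As an aside (my own quick sketch, not anything from a pollster’s toolkit), you can see how wide that margin of error is for a 13-bean handful by simulating lots of well-mixed grabs from a genuinely 50-50 jar:

    import random

    RED, BLUE = 500, 500      # the true contents of the jar: a 50-50 split
    HANDFUL = 13              # the pollster's sample size
    TRIALS = 100_000          # number of simulated handfuls

    jar = ["red"] * RED + ["blue"] * BLUE
    all_blue = 0
    red_shares = []

    for _ in range(TRIALS):
        handful = random.sample(jar, HANDFUL)   # a well-mixed (random) grab
        reds = handful.count("red")
        red_shares.append(reds / HANDFUL)
        if reds == 0:
            all_blue += 1

    red_shares.sort()
    low = red_shares[int(0.025 * TRIALS)]       # 2.5th percentile of the red share
    high = red_shares[int(0.975 * TRIALS)]      # 97.5th percentile
    print(f"All-blue handfuls: {all_blue / TRIALS:.3%}")    # rare but not impossible
    print(f"95% of handfuls had a red share between {low:.0%} and {high:.0%}")

Even with a perfectly mixed jar, the red share in a 13-bean handful routinely swings from the low twenties to the high seventies – which is exactly the uncertainty the margin of error is there to capture.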

Margins of error address only this aspect of errors in polling – that the proportion in the sample was to some extent an ‘unlucky’ pick. Both the reported figure and the margin of error assume that the picking was done correctly – in our jelly bean example, that the beans were well mixed together.

Now it so happens that I didn’t mix the jelly beans well (although the pollster can’t tell)*. There are actually MORE red towards the top and fewer red towards the bottom of the jar. So the pollster’s assumption was wrong. A clever pollster might try to find ways to deal with this methodologically (e.g. by grabbing beans from both the top and the bottom) but the principle still applies: the reported estimate and the margin of error assume that the sampling methodology was valid. The margin of error doesn’t (and can’t) account for the probability of what in common parlance would be called an ‘error’ (i.e. a mistake).

The Right’s War on Statistics

‘Zero Hedge’ is in a flap about poll ‘oversampling’ here http://www.zerohedge.com/news/2016-10-23/new-podesta-email-exposes-dem-playbook-rigging-polls-through-oversamples

Gasp!

It even includes a hacked email from John Podesta which discusses ways of ensuring that the Democrats’ own polling oversamples minority groups. Again, gasp!

Except… well, oversampling a smaller demographic group is the right thing to do. When I say ‘right’, I don’t mean for opinion polls but for collecting statistics on a population in general.

Say you have a representative sample of a population consisting of a thousand people. Now, of that thousand people, you are particularly interested in a subgroup that represents 1% of the population. If your sample is exactly proportionate, then it should have 10 people belonging to that subgroup. Unfortunately, 10 is a shitty sample: if you are unlucky enough to get 2 odd people with unusual views, they then form 20% of your sub-sample.

Sample size is a dark art, but the easiest issue to understand is that magnitude matters. A good sample size is less about the proportion of the whole population in your sample and more about the raw number of people. More is better, but ‘more’ is subject to diminishing returns.

Oversampling means you can get a better picture of the subgroup. However, because you end up with more of group X than you proportionally should have, their responses are then weighted down accordingly when looking at the statistics overall.
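A toy illustration (the numbers are entirely made up, and this isn’t any particular pollster’s method): oversample a 1% group so you get a usable sub-sample, then weight its answers back down to 1% when computing the overall figure.

    # Hypothetical poll: group X is 1% of the population, but we deliberately
    # sampled 100 of them alongside 900 other respondents.
    sample = {
        # group: (respondents, share answering "yes")
        "group_x": (100, 0.70),
        "everyone_else": (900, 0.40),
    }
    population_share = {"group_x": 0.01, "everyone_else": 0.99}

    # Naive topline: treats group X as 10% of the public (its share of the sample).
    naive = sum(n * yes for n, yes in sample.values()) / sum(n for n, _ in sample.values())

    # Weighted topline: each group's result counts in proportion to its real share.
    weighted = sum(population_share[g] * yes for g, (_, yes) in sample.items())

    print(f"naive topline:    {naive:.1%}")     # 43.0% - skewed by the oversample
    print(f"weighted topline: {weighted:.1%}")  # 40.3% - what the poll would report

The oversample buys you a decent read on group X (100 people instead of 10) without distorting the headline number, because the weighting puts the group back at its real 1% share.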

Are polls manipulated? Well, if by ‘manipulated’ you mean ‘use statistics’, then yes.

The EPH Analysis

An analysis of proposed new Hugo voting rules is out. It’s disappointing to some but I think it validates the change to EPH.

The story so far:

In response to the Sad Puppy/Rabid Puppy slates in the 2015 Hugo Awards, a voting system called E Pluribus Hugo was proposed and passed at the 2015 Worldcon Business meeting. The system uses a process of weightings and elimination rounds to make the nomination results more proportional without changing the basic mechanics of how people nominate things.

Much thought and tinkering was put into EPH but what it lacked was real data. EPH should make the list of finalists more proportional to the underlying groupings of voters. However, that meant that the impact of EPH couldn’t really be known without knowing to what extent Hugo voters clustered around choices anyway. Without slates, do Hugo voters form natural groupings (perhaps along sub-genres or sub-fandoms) or are they just a noisy mess of stuff? Without real data there is no way of knowing.

While EPH was passed at the 2015 Worldcon Business meeting, it requires ratification this year to come into effect. As part of that process an analysis of the 2015 and 2014 nomination ballots has been done and the results are just out…

What it all means…

I don’t know. No, that isn’t a useful reaction. OK, I’ll try again.

Below is a list of possible talking points, reactions and counter-arguments. I made them up. They don’t necessarily reflect actual people’s views (I’ll say when one does). Bold represents a possible reaction (not mine) and non-bold is my response.

I’m also a hostage to fortune because more results are coming – post the Hugo ceremony, data on the 2016 nominations will come out and who knows what that will show.

For a different take try Nicholas Whyte http://nwhyte.livejournal.com/2707679.html?utm_source=twsharing&utm_medium=social

2015 results show that EPH doesn’t fix the slate problem!

No one thing can fix that problem. However, in most categories, at least one additional non-slated work made it onto the ballot with EPH. That means that, probably, instead of No Award winning several categories in 2015, a worthy finalist would have won.

EPH+No Award together produce a strong disincentive to puppy-style slates. Slate voting will produce legitimate votes and so is bound to have some impact. The combination of EPH and No Award means that a slate will find it hard to sweep a category and win a Hugo.

EPH doesn’t stop those slate-inclined who just want to get to be a finalist and don’t care about winning!

True, but that was a given. Get enough votes and you get to be a finalist. EPH does demonstrably reduce the chance of that succeeding for a slate of nominees but it doesn’t do anything about a single nominee. Again, get enough votes and you get to be a finalist. The only guaranteed way of stopping that is to create a wholly different kind of award.

There is a non-puppy related change in 2015 Best Graphic Story!

That is interesting. With EPH, instead of Sex Criminals getting nominated, Schlock Mercenary gets to be a finalist.

Sex Criminals got 60 noms in total and Schlock Mercenary got 51. However, Sex Criminals must have been more clustered with other nominees (such as Saga?) and hence lost out a bit to Schlock Mercenary.

With only one slate nominee, this was an interesting category. I liked Sex Criminals, but I think this is a positive demonstration of EPH. It should result in more variety of nominees without slates.

They didn’t include Best Dramatic Presentation!

The reason the report gives is this:

In testing, it was identified that the results in two categories (Dramatic Presentation, Long and Short Form) were usually producing results with many nominators submitting matching entries to other nominators. This was more due to the smaller pool to nominate from compared to other categories than any external coordination of nominating ballots. As such, we decided to produce results with these categories excluded, as changes in the dramatic presentation categories aren’t as useful for gauging if EPH is acting as appropriate where desired as the other categories would be.

That seems silly to me. There are lots of reasons to expect more organic coordination of ballots in these categories, and seeing how EPH works in that circumstance is useful as a way of comparison.

I hope they change their minds at some point.

A single coordinated minority of less than 20% would still average controlling over 80% of the ballot!

Aside from the exclamation mark, that is a direct quote from the report. This appears to be true, but holding a slate to only 80% of the ballot – rather than a clean sweep – is enough to kill Puppy-style slates without having No Award win multiple categories.

Killing the incentive to use the 2015 Puppy slate tactic is what EPH needs to do. It will do that.

EPH+ would be better!

Probably yes, but I don’t know if other side effects (see below) would be worse.

2015 Puppy-style slates are last year’s problem. EPH doesn’t deal with THIS year’s problem!

True. However, the structural weakness of the Hugo voting system exists regardless and the cat is out of the bag. Others can try to game the Hugo Awards in the same way and perhaps more covertly.

As for the griefing style tactics of Vox Day, I think that needs a qualitatively different approach but that is an argument for another day.

EPH knocks out a potential winner in 2014!

There are few changes to finalists with the 2014 data. I think that confirms that, without slates, EPH will tend to deliver similar results to the current system. However, what isn’t guaranteed is that the results will be exactly the same.

In 2014 three results are notable.

  • Firstly, Best Editor Short Form has a swap of finalists in the last spot – Sheila Williams (86) swaps with Bryan Thomas Schmidt (80).
  • Pro Artist also has some swaps, partly because in 2014 a tie for fifth place meant 6 nominees. Essentially, four artists with 50, 49, 49 and 48 nominations end up with a different ordering under EPH. The EPH ranking ends up as 49, 48, 49, 50 and that looks fine to me because I have that special kind of innumeracy that results from being overly numerate.
  • Fancast has the most understandable change but also the most problematic. In 2014 this was a three-way tie for last finalist at 35 nominations each. EPH breaks the tie and resolves the issue with a single nominee. Unfortunately, one of those three (SF Signal Podcast) won and would have been eliminated by EPH.

The thing is, these are all very close votes with small numbers of voters. Anything different about 2014 would probably have resulted in different outcomes. For the Fancast result, an internet outage or a sick cat could have ended up with a different result. The slightest error in collating the data could have produced different results.

Put another way: Hugo voters did not have a clear consensus of which of these people/works should have been nominated. These cases are not good arguments against EPH.

Yes, but, but EPH+ might make that problem worse!

I’ve really no idea. I guess it might broaden what we might think of as a marginal tie and lead to more notable discrepancies between the raw number of nominating ballots a work received and whether it grabs that last spot among the finalists. I don’t know.

The current system doesn’t avoid this issue; it really just hides it. For some categories, there are finalists who we really can’t say are substantially more nominated than others. The differences are small enough to be down to happenstance. And yes, some of those may actually end up being winners.

I think the answer is that the number of finalists needs to be more flexible than just 5. However, it is unclear how to set rules for when to expand the number of finalists beyond an exact tie.

Where nominator coordination is not present, there are still significant numbers of changes not only to the long list, but to ballots where it’s not generally considered for anything untoward to have happened. Items removed from the 2014 ballot included a winner of the Hugo. Had EPH been in place, they would not have been on the ballot.

That is a direct quote from Dave McCarty’s conclusion on the report. Sorry, but that is a flawed counterfactual. If we could somehow rewind the tape back to early 2014 and re-run the 2014 nomination ballot again, how likely is it that we’d have ended up with that exact tie that occurred? EPH changed the result because it broke a tie and the other places where there were changes were also spots with very close votes.

Almost ANY change would have meant that something slightly different would have happened! For SF Signal not to have been a finalist would have required only ONE nominator’s vote to be different.

The changes to the Ballot and Long list are not easily verified and for people reviewing the detailed results at the end the only way to check that the process is working correctly would require access to secret nomination data and significant time.

That’s Dave McCarty again. Well, ANY verification of results needs access to ballots. Given Dave McC is worried about the 2014 Fancast result shifting by possibly one vote, to verify the CURRENT process would require checking that ballots had been classified correctly and counted correctly.

Assuming the underlying ballot data is correct (i.e. everybody’s nominations have been correctly collated) and in a machine-readable form (e.g. a text file or spreadsheet), the EPH check takes seconds. Don’t trust the EPH program you are using? Use a different one and see if you get the same results. EPH is not hard to code: I made an Excel version that only uses standard Excel formulas and NO extra code at all.
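For anyone who’d rather see it in code than in Excel, here is a stripped-down sketch of the EPH tally (my own simplification – the official rules handle ties and edge cases more carefully than this, and the ballots A to G below are made up purely to show the shape of the data):

    from collections import Counter

    def eph_finalists(ballots, num_finalists=5):
        """Simplified E Pluribus Hugo tally.

        ballots: a list of sets, one set of nominee names per nominating ballot.
        Returns the nominees left standing once only num_finalists remain.
        """
        remaining = set().union(*ballots)
        while len(remaining) > num_finalists:
            points = Counter()   # each ballot's single point, split among its live picks
            counts = Counter()   # raw number of ballots naming each nominee
            for ballot in ballots:
                live = ballot & remaining
                if not live:
                    continue
                share = 1.0 / len(live)
                for nominee in live:
                    points[nominee] += share
                    counts[nominee] += 1
            # Selection: the two nominees with the fewest points face off...
            lowest_two = sorted(remaining, key=lambda n: points[n])[:2]
            # ...and the one named on fewer ballots is eliminated.
            eliminated = min(lowest_two, key=lambda n: counts[n])
            remaining.discard(eliminated)
        return remaining

    ballots = [{"A", "B"}, {"A", "C"}, {"B", "C", "D"}, {"E", "F"}, {"A", "D"}, {"G"}]
    print(eph_finalists(ballots, num_finalists=5))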

So, yes, cleaning the nomination data and getting it all tickety-boo takes time – without a doubt BUT if we wanted to verify that the results DON’T CHANGE under the CURRENT process YOU WOULD STILL NEED TO DO THAT.

 

The Puppy Axis Returns: Part 2 – Fireside and making sense of it all

In my earlier post, I remarked on how the Fireside report on the under-representation of black authors in published SFF short fiction generated an unusual degree of agreement among four major Sad/Rabid Puppy protagonists: Larry, Brad, John C Wright and The Dumpster Fire Who Walks Like a Man*.

In this post, I want to talk more about the Fireside report, its methodology and flaws and then look at Larry Correia’s “fisk”. I’ll focus on Larry because Brad Torgersen’s blog post is mainly rambling around the issues, while John C Wright and Vox are more open about the source of the animus.

First to the Fireside report. As they say right off:

The methodology is flawed, as it’s based in self-reported data whenever possible, but such data was not always findable or clear.

They also point out:

…we don’t have access to submission-rate data concerning race and ethnicity either overall or by individual magazine…

Other issues/objections that could be raised include national variation. For example, Andromeda Spaceways Inflight Magazine is one of the ‘zines included. It is an Australian magazine with (I think) mainly Australian contributors. Different country, different dynamics of race, ethnicity and self-identification, and different population proportions. [Note: that isn’t meant as a justification for the ‘zine having zero in the study, it is purely an observation of the difficulties Fireside faced in collecting this data.]

Additionally, caution needs to be applied at the level of individual ‘zines. For a ‘zine with fewer stories in a year, a single story by one black author can make the difference between zero representation and a reasonable proportion (assuming a roughly 13% black population): a ‘zine publishing eight stories in a year goes from 0% to 12.5% on the strength of one story.

What is notable is that the report is up-front about the issues in its approach and doesn’t attempt to hide that there is a substantial degree of uncertainty around the findings. They aren’t claiming some indisputable proof but they are pointing out an obvious red flag that people should pay attention to.

Having said all of that: zoiks! The resulting number of stories published by black authors across this broad spread of magazines is very low. For interest I tagged ‘zines in the Fireside data that were in the Semiprozine directory (n=20). The proportion of stories by black authors works out much the same as for the total – about 1.9%.

Now maybe getting better data on author self-identification might result in a different picture, and the study can’t tell us any specific ‘why’ of the under-representation. Yet we can speculate. A good study (and I think this one is good) is not necessarily a flawless one but rather one that helps us generate new hypotheses which allow us to find better data. For example, we can now ask about some of the ‘why’ behind the results:

  • Is it stories not being accepted? If so, why?
  • Is it stories not being submitted? If so, why aren’t they?
  • Is it the study looking in the wrong places? If so, where should it have looked?
  • Is it all the various methodological errors all creating a misleading bias in the data? If so, how come? And does that really seem likely?
  • Is it just that there are lots of black authors being published but the author’s ethnicity isn’t particularly visible?

We also have my favourite Franciscan friar to help us out: William of Ockham. He gently reminds us not to over-complicate our hypotheses. We have, as a given, a known institutionalised bias against black people in Western societies that has existed for a very long time and which exists both as overt racism and as more subtle forms of discrimination. Finding that a group which has been historically under-represented is currently under-represented does not require elaborate explanations. That doesn’t mean we all declare the case closed and never look for better data, it just means that we already have a highly plausible explanation that fits very well with known facts.

And a study is not just about discovering facts and forming hypotheses. It should also inform what action we take. And when weighing up an action, it is worth considering what its downsides would be. Let’s have a look at what ‘zine editors can do in response:

  • Actively try and publish more stories by black authors.

And the down side of that response is:

  • Some extra effort expended but otherwise no obvious down side.

Now what I can’t help noticing is that of the various questions we could ask to get better data, none of them really impact much on that basic response. Looking beyond that response – for example, into how the SFF community can help foster talent in diverse communities – would be helped by better data but, again, we don’t ACTUALLY need better data to make a better start. So I’ll add a more strategic response to this report:

  • Actively try and foster SFF talent and writing in diverse communities.

And the down side of that is:

  • None.

The upside EVEN IF THE WHOLE FIRESIDE REPORT WAS WRONG would be:

  • More good SFF writing and more SFF fans.

PS. I was going to get into Larry’s fisk in this post but time has moved on, so I’ll save that for a part 3.

*[Copyright: Philip Sandifer]