Are the Hugos getting more clique-ridden – follow up 1

This is a follow-up to my previous look at whether the Hugos are more clique-ridden, in which I tried to use data on nominees with multiple nominations. Does that make sense? If not, read the last episode or take a deep breath, as it gets worse from here. Note these posts are all ‘thinking out loud’ – comments and critiques are more than welcome. Some useful comments from Yellowcake and Influxus on the last post on this topic will feed into the next follow-up.

A counter-argument I was considering was that while the multiple-nomination issue didn’t seem to be getting worse according to my analysis, PERHAPS I was ignoring a key issue. Sure, some people (e.g. Heinlein) got a lot of nominations, but this was over a long career. The problem is people like [insert hate figure], who is a relative newcomer but has already racked up X nominations.

Hmm. How to test that? Well, here is my plan. Get the same data I used before – number of nominations and average (mean) year of nomination – but now add the standard deviation of the year of nomination! Nominees with long careers will have a bigger standard deviation. Nominees with shorter careers will be nominated over a shorter time period and hence have a small standard deviation. Standard deviation in this case is also measured in years.

But… lots of people have only 1 or 2 nominations. Consequently the data would be swamped by the people with few nominations. Solution! Filter out the people with fewer than 3 nominations. Now I’m accounting for all three aspects: time period, nomination period, and 3 or more nominations.

Computer! Compute!
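For anyone who wants to follow along, the calculation described above can be sketched in a few lines of Python. The nomination data below is a tiny made-up sample purely for illustration, not the real Hugo dataset:

```python
# Sketch of the per-nominee summary: count, mean year, and standard
# deviation of nomination years, keeping only nominees with 3+ nominations.
# The (name, year) pairs are invented examples, not real Hugo data.
from statistics import mean, pstdev
from collections import defaultdict

nominations = [
    ("Long Career", 1957), ("Long Career", 1960),
    ("Long Career", 1962), ("Long Career", 1967),
    ("Newcomer", 2013), ("Newcomer", 2014), ("Newcomer", 2015),
    ("One-off", 1975),
]

years_by_nominee = defaultdict(list)
for name, year in nominations:
    years_by_nominee[name].append(year)

# Filter out nominees with fewer than 3 nominations, then summarise.
summary = {
    name: (len(years), mean(years), pstdev(years))
    for name, years in years_by_nominee.items()
    if len(years) >= 3
}

for name, (n, m, sd) in sorted(summary.items()):
    print(f"{name}: n={n}, mean year={m:.1f}, stdev={sd:.2f} years")
```

As expected, the long-career nominee gets a larger standard deviation (in years) than the newcomer whose nominations are bunched together, and the one-off nominee drops out of the filtered data entirely.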


7 thoughts on “Are the Hugos getting more clique-ridden – follow up 1”

  1. Two problems:
    – Your endpoint in the 2010 year range has a high chance of having a low stdev, because it will contain people who were not around in the 80s and who thus have little chance of getting a large stdev. Authors with long careers who were recently nominated might in theory give you a high stdev in the 2010s; in practice they won’t, because their mean will not fall in the 2010s.
    – There is a similar issue at the other end, in that the Hugos were not there at the start, and some people did their best work before the Hugos were invented. So again, you have some people whose careers ended in the 60s, who will be disproportionately represented in the 50 to 60 period and have low stdevs.
    – A straight line between two points close to the x-axis (a) has no significant slope and (b) will have a poor R^2, because conditions in the middle of the observation period are different.
    – More statistically speaking, your data points are heteroscedastic for reasons inherent in your methodology, and hence – again – your Fisher’s z-test doesn’t work.

    Problem two:
    – Not all stdevs are equal. Outlier nominations raise the stdev disproportionately and shift the mean to a year in which the author in question likely had no nominations at all, at which point it becomes obviously nonsensical for the author to be represented by that mean. You’d need to differentiate “long career” from “outlier nomination” somehow. (This brings us back to the problem of measuring “career” in terms of Hugo nominations: there’s a substantial difference between someone who wrote 3 great books and got on the ballot 3 times, and someone who wrote good books every year and frequently made 6th to 10th place in nominations but only made the ballot in weak years.) [This also brings us to the problem of treating authors as Nutty Nuggets: some only produce stellar work once, some are overlooked, and a single person who is popular with the Hugo voters (say Mike Resnick) and keeps producing good work throws the whole sample into disarray.]


  2. I considered the issue of the endpoint, but that should make it appear as if a low stdev was more likely in later years (i.e. make it seem more likely that recent years had people with lots of nominations close together). So I was making it easier to find that there was a problem (which there isn’t).
    At the other end, I forgot to mention that I started in 1960.

    The Fisher z-test was what I had to hand.


    1. I have a rather pragmatic approach to statistics, but IMO you are at the point where the Fisher test obscures rather than clarifies. You can’t know whether it’s trustworthy anymore, so the p-value tells you nothing.
      My suggestion would be to use a moving-average filter or loess regression lines and see whether there is a linear trend there. My guess is that you’ll find an inverted-U-shaped relationship, which you could then test for with a regression with a quadratic term. However, at the end of that you need to be absolutely certain that your endpoints are not biased by your methodology, and per the above I don’t think you can claim that.
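For readers unfamiliar with the quadratic-term idea, here is a rough sketch of the test the commenter describes: fit stdev against mean year with a squared term, and check whether the curvature coefficient is negative (an inverted U). The data below is entirely synthetic, generated to have that shape, just to show the mechanics:

```python
# Sketch of testing for an inverted-U trend with a quadratic fit.
# The stdev values are synthetic (peak mid-period plus noise), NOT real data.
import numpy as np

rng = np.random.default_rng(0)
mean_year = np.linspace(1960, 2015, 60)
# Synthetic stdevs that peak around the middle of the period.
stdev = 8 - 0.005 * (mean_year - 1987.5) ** 2 + rng.normal(0, 0.5, 60)

# Fit stdev = a*year^2 + b*year + c; polyfit returns highest degree first.
a, b, c = np.polyfit(mean_year, stdev, 2)
print(f"quadratic coefficient a = {a:.4f}")  # negative suggests inverted U
```

In a real analysis you would of course look at the standard error of `a` (e.g. via `statsmodels`), not just its sign, and heed the commenter’s warning about endpoint bias before trusting any of it.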


  3. Outlier nominations – considered, but I haven’t fully investigated. I classified each nomination by half decade (e.g. a nomination in 1962 is in the 1960 half decade, a win in 1967 is in the 1965 half decade, and so on) so I could look for people with odd patterns – Philip K. Dick, for example, with only 2 nominations over 10 years apart. Still, “1969” works out as a goodish representation of the middlish of when he was in “play” for a Hugo.
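The half-decade bucketing described there is a one-liner; a minimal sketch, assuming the buckets are labelled by their starting year:

```python
# Bucket a nomination year into its half decade, labelled by start year:
# 1962 -> 1960, 1967 -> 1965.
def half_decade(year: int) -> int:
    return year - (year % 5)

print(half_decade(1962))  # 1960
print(half_decade(1967))  # 1965
```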


  4. Yes – I can’t deal with author quality or author luck (good or bad). By author luck, I mean what a given work had to compete with in a given year. If you are lucky, the field is bad that year just because of what got published, eligibility, etc.

    Overall quality, and identifying nominees who got more nominations than they should have (based on the quality of their work) – I’d need a figure for quality. I had a sort of plan to use Google Ngrams to quantify, essentially, how written-about a work was X years after winning, but there is an upper limit on the number of words in a unit it will check (cutting out some long titles), and short titles like “Dune” will obviously throw up things other than books. I could search on author, but it’s hard to say what that gives me.


Comments are closed.