ngramming across the universe

…only going forward because we cannot find reverse.

I’ve gone a bit n-gram crazy today – as can happen. I thought I could test the question of whether the Hugo Awards have lost relevance or importance by graphing the term “Hugo Award” over time with the n-gram viewer. Now there is a little extra trick you can do which is to graph a term in one corpus and in another at the same time by using special syntax. I’m going to use the English 2012 corpus (books written in English digitized up to 2012 – although it only shows up to 2008) but also the English Fiction 2012 corpus (fiction books in English).

The syntax is Hugo Award:eng_2012,Hugo Award:eng_fiction_2012


Which is a graph with some sort of story behind it.

A couple of things. The data is normalized over time, so figures represent percentages of text from that year. As English 2012 is the bigger corpus, the term “Hugo Award” will make up a smaller percentage of it than of the fiction-only corpus.
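To make the normalization point concrete, here is a rough sketch of the arithmetic. Every number in it is invented purely for illustration (none of them are real Google counts), and `relative_frequency` is just a hypothetical helper name.

```python
def relative_frequency(phrase_count, total_count):
    """Share of a year's text taken up by a phrase, as a percentage."""
    return 100.0 * phrase_count / total_count


# Hypothetical counts for a single year (made up for this example).
general_mentions = 300            # "Hugo Award" in all English books
fiction_mentions = 120            # "Hugo Award" in fiction only
general_total = 4_000_000_000     # all two-word phrases, all English books
fiction_total = 400_000_000       # all two-word phrases, fiction only

general_pct = relative_frequency(general_mentions, general_total)
fiction_pct = relative_frequency(fiction_mentions, fiction_total)

# Fewer mentions in fiction, but a larger share of a much smaller corpus.
print(f"general: {general_pct:.7f}%  fiction: {fiction_pct:.7f}%")
```

So the two lines on the graph sit at different heights for reasons of corpus size alone; it is the shape of each line that is worth comparing.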

However, what the graph does show is that the term “Hugo Award” has a general upward trend across books in general but quite a different pattern when looked at purely in fiction.

Why? Well I personally have no idea but I assume it relates to the extent to which published fiction (including anthologies and magazines) will have included the term “Hugo Award”. That seems to have peaked from 1978 to 1982.

The Seven Cardinal Sins – a puppy summary

I haven’t reviewed everything that was nominated but I have read everything and read multiple reviews. I thought this was a good time to look back at what was wrong with the nominated works (not including best dramatic categories, editorial categories, fan categories or artists).

The Catholic catechism traditionally identifies seven capital sins: pride, avarice (greed), envy, wrath, lust, gluttony, and sloth (or acedia i.e. neglect).

In terms of the Puppy campaigns these traditional sins aren’t a great match in their entirety. Lust in particular doesn’t make much of an appearance – if anything the overall attitude to sex has been almost puritanical. The one I would pick out is sloth. I think there is some obvious evidence of laziness in the compilation of the slates. They appear rushed and contain obvious omissions – Soft Casualty by Michael Z Williamson, Heinlein’s biography, The Three Body Problem (which became a top pick of Vox Day, leader of the Rabid Puppies). Traditionally this sin also includes acedia, a sin that covers many modern issues, including things we would not regard as a vice (such as depression) but also things we still do, such as neglect or mindless compliance.

Lazy curating of the slate, mindless compliance with lock step voting and email campaigns, neglectful edits, and a general unwillingness to explain, review or persuade. Sloth and avarice seem to be the cardinal sins of the Puppy campaigns (no, that doesn’t mean I’m saying everybody who has ever read a Puppy nominated author is greedy and lazy).

With that in mind what are the major ‘sins’ of the nominated works?

  • Poor editing
  • Lack of cohesion
  • …parts of incomplete works…
  • Appearance by virtue of knowing Brad Torgersen
  • Shown up by substantially superior works in the same category
  • Stories of over-blown self importance
  • Irrelevance

Poor editing: all of the John C Wright nominations but in particular One Bright Star. The Science is Never Settled needed substantial re-working for it to be a decent (i.e. 18 year old at school) essay.

Lack of cohesion: Again John C Wright’s Transhuman and Subhuman, Roberts’s The Science is Never Settled, Championship B’Tok and Big Boys Don’t Cry all tended to wander off topic and lacked clarity.

…parts of incomplete works…: With a current lack of a ‘saga’ category perhaps Skin Game can be forgiven but Flow and Journey Man in the Stone House and Championship B’Tok all had the fragmentary feel of an extracted chapter from a novel. None stood alone well.

Appearance by virtue of knowing Brad Torgersen: Numerous works but most obviously Wisdom from My Internet. I don’t know if Brad T is friends with the person who makes Zombie Nation but there is no obvious reason why it was nominated.

Shown up by substantially superior works in the same category: Zombie Nation was the only puppy nominee for Best Graphic Story among a set of commercially and critically successful other nominees that showed depth and talent.

Stories of over-blown self importance: Turncoat, Parliament of Beast and Birds, One Bright Star. Pompous pomposity.

Irrelevance: Best related work included a collection of unfunny Facebook offensiveness and a half baked essay on the nature of science – neither had any more than a tenuous relation to SF/F. Parliament of Beast and Birds was a religious fable – but that arguably scrapes into fantasy.

Excel Pluribus Hugo

[Note: I’m very much not an expert on this proposal – this was the easiest way for me to make sense of it at a practical level. I may well have misunderstood aspects of the process]

Yay! I think I have finally removed all the kinks from my Excel version of the proposed new nomination system for the Hugo awards known as “E Pluribus Hugo” or EPH (and of course look here and interesting comments here).

The proposal basically boils down to people nominating what they like in a simple manner i.e. submit five (or fewer) things you want nominated in a category but then adding a more complex way of tallying the votes. Each work you nominated is weighted by the number of things you nominated – so if you nominate 5 things each one is getting 1/5 (0.2) of a vote from you, nominate four things and each thing gets 0.25 of a vote from you.

Nominated works then go through an elimination process. First they would find the two nominees with the lowest weighted score, then they would compare the total number of raw nominations (not weighted) each of those two works got. The one with the fewest raw nominations is eliminated. The clever bit is that with that work gone, anybody who nominated it has now got a slightly more strongly weighted vote for everything else they nominated.

In theory it would mean a slate or block vote would improve its chances of getting one work on the final ballot but drastically reduce its chances of getting multiple works on the ballot.
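For anyone who prefers code to cells, the tallying as I have described it above can be sketched in a few lines of Python. This is only my reading of the process and not the official algorithm: the function names, the number of final slots and the tie-break (alphabetical order when two works are level on everything) are all assumptions made for this example, and the ballots are invented.

```python
from collections import defaultdict


def pick_loser(ballots):
    """Run one elimination round and return the work to drop."""
    points = defaultdict(float)   # weighted score
    raw = defaultdict(int)        # raw (unweighted) nominations
    for ballot in ballots:
        for work in ballot:
            points[work] += 1.0 / len(ballot)  # 5 works -> 0.2 each, 4 -> 0.25
            raw[work] += 1
    # The two works with the lowest weighted score...
    lowest_two = sorted(points, key=lambda w: (points[w], w))[:2]
    # ...and of those, the one with the fewest raw nominations goes.
    # Assumption: remaining ties are broken alphabetically.
    return min(lowest_two, key=lambda w: (raw[w], points[w], w))


def eph_finalists(ballots, slots=5):
    """Eliminate works one at a time until only `slots` remain."""
    ballots = [set(b) for b in ballots]
    while True:
        remaining = set().union(*ballots)
        if len(remaining) <= slots:
            return sorted(remaining)
        loser = pick_loser(ballots)
        for ballot in ballots:
            ballot.discard(loser)  # survivors on that ballot now weigh more


# Invented example: a four-voter slate versus eight independent nominators.
example_ballots = (
    [["S1", "S2"]] * 4   # slate voters nominate the same two works
    + [["X"]] * 5
    + [["Y"]] * 3
)
finalists = eph_finalists(example_ballots, slots=3)
print(finalists)
```

On raw counts alone the top three would be X (5), S1 (4) and S2 (4), so the slate takes two of the three places. Under the weighted process S1 and S2 start on 2 points each and are the bottom two, so one of them is eliminated and Y stays in.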

Now I wanted to model it in a transparent way – hence a spreadsheet rather than a program. With an Excel spreadsheet you see a static snapshot of all the stages in one go. I didn’t want to use any VBA or Pivot Tables either and I wanted to set up a complete round so that I could then copy and paste one round after another to make a complete process without editing formulas as I went.

Continue reading “Excel Pluribus Hugo”


So I’ve been down a few dead-ends of late in my number crunching past Hugo winners.

I’ve looked for obvious signs of bias and also for cliques and found not much to write home about. The last two issues are whether the Hugo Awards (or other awards such as the Nebulas) have gone to unworthy winners or alternatively, to works that are too literary. This is something of a heads-you-win-tails-I-lose proposition, as demonstrating worthiness would tend to involve showing independent recognition of a writer’s skill beyond SF/F awards – thus proving the too literary complaint.

Either way I have been looking for a way into this so that there is actual evidence to discuss. So far not much luck.

One promising lead was the Google N-Gram viewer. When Google digitized huge numbers of books they gained a massive corpus of texts that allows for systematic analysis. One kind of analysis is a count of n-grams, i.e. ordered sequences of words or characters. As the Google book metadata includes the year of publication, that allows for trends in topics to be graphed. For example this graph shows trends for William Gibson’s 1989 Hugo nominee Mona Lisa Overdrive.

mldrive See it properly here.
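As a rough illustration of what sits behind a graph like that, here is a minimal sketch that counts a word n-gram per year and divides by the total number of n-grams for that year. The tiny “books” are invented toy strings, not Google data, and the function names are my own.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return every run of n consecutive words in the text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_share(books, phrase):
    """Fraction of each year's n-grams that match the phrase."""
    n = len(phrase.split())
    hits, totals = Counter(), Counter()
    for year, text in books:
        grams = word_ngrams(text, n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Toy corpus of (year, text) pairs, made up for this example.
toy_books = [
    (1988, "mona lisa overdrive is the third sprawl novel"),
    (1989, "readers nominated mona lisa overdrive for a hugo"),
    (1989, "a review of mona lisa overdrive"),
]
shares = yearly_share(toy_books, "Mona Lisa Overdrive")
print(shares)
```

The real viewer does the same kind of division over millions of books, which is why its y-axis is a percentage rather than a raw count.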

Continue reading “Ngramming”

Why Science is Never Settled – a review of part two of the essay

My review of Part 1 of Why Science is Never Settled by Ted Roberts can be found here.

Part 2 is focused more on the fallibility of science. Like Part 1 it lacks focus or connection with a unifying argument. In some ways it acts more like an appendix to Part 1, with a look at various different issues in depth.

Part 2 is split into several major sections:

  1. Scientists are human, too – which looks at human failings and spats in science but primarily concentrates on the pressure on academic scientists to ‘Publish or Perish’
  2. Lies, Damned Lies, and Statistics! – which looks at statistical analysis and failures by scientists when conducting statistical tests.
  3. The vaccine controversy – a case study on the Andrew Wakefield affair (which I discussed in my first review)
  4. The problems of peer-review – a look at issues with peer review with links to some notable cases.
  5. It’s a process, not a conclusion – which ironically acts as a sort of conclusion to the whole essay but oddly isn’t the last section.
  6. Internet memes and the love of science. – which is basically just some complaints about the Facebook site (the title of the site isn’t “freaking”). You can safely skip that bit.

I’ll go through the sections in turn to varying degrees of detail.

Continue reading “Why Science is Never Settled – a review of part two of the essay”