ngramming across the universe

…only going forward because we cannot find reverse.

I’ve gone a bit n-gram crazy today – as can happen. I thought I could test the question of whether the Hugo Awards have lost relevance or importance by graphing the term “Hugo Award” over time with the n-gram viewer. Now there is a little extra trick you can do which is to graph a term in one corpus and in another at the same time by using special syntax. I’m going to use the English 2012 corpus (books written in English digitized up to 2012 – although it only shows up to 2008) but also the English Fiction 2012 corpus (fiction books in English).

The syntax is Hugo Award:eng_2012,Hugo Award:eng_fiction_2012


Which is a graph with some sort of story behind it.

A couple of things. The data is normalized by year, so the figures represent the percentage of all text from that year. As English 2012 is the bigger corpus, the term “Hugo Award” will make up a smaller percentage of it than of the fiction-only corpus.
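To make the normalization point concrete, here is a minimal sketch. The counts are invented purely for illustration (they are not real n-gram data) and `relative_frequency` is a hypothetical helper, not part of any Google tool.

```python
def relative_frequency(term_count, total_count):
    """Share of a year's text taken up by the term, as a percentage."""
    return 100.0 * term_count / total_count

# Hypothetical counts for a single year. NOT real n-gram data.
fiction_mentions = 40          # "Hugo Award" occurrences in the fiction corpus
all_books_mentions = 60        # "Hugo Award" occurrences in the general corpus
fiction_total = 1_000_000      # all 2-grams in the fiction corpus that year
all_books_total = 20_000_000   # all 2-grams in the general corpus that year

pct_fiction = relative_frequency(fiction_mentions, fiction_total)
pct_all = relative_frequency(all_books_mentions, all_books_total)
print(f"fiction: {pct_fiction:.4f}%  all books: {pct_all:.4f}%")
```

Even with more raw mentions, the term is a much smaller slice of the bigger corpus, which is why the two lines sit at such different heights on the graph.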

However what the graph does show is that the term “Hugo Award” has a general upward trend when looked at in books in general but quite a different pattern when looked at purely in fiction.

Why? Well I personally have no idea but I assume it relates to the extent to which published fiction (including anthologies and magazines) will have included the term “Hugo Award”. That seems to have peaked from 1978 to 1982.

The Seven Cardinal Sins – a puppy summary

I haven’t reviewed everything that was nominated but I have read everything and read multiple reviews. I thought this was a good time to look back at what was wrong with the nominated works (not including best dramatic categories, editorial categories, fan categories or artists).

The Catholic catechism traditionally identifies seven capital sins: pride, avarice (greed), envy, wrath, lust, gluttony, and sloth (or acedia i.e. neglect).

In terms of the Puppy campaigns these traditional sins aren’t a great match in their entirety. Lust in particular doesn’t make much of an appearance – if anything the overall attitude to sex has been almost puritanical. The one I would pick out is sloth. I think there is some obvious evidence of laziness in the compilation of the slates. They appear rushed and contain obvious omissions – Soft Casualty by Michael Z Williamson, Heinlein’s biography, The Three Body Problem (which became a top pick of Vox Day, leader of the Rabid Puppies). Traditionally this sin also includes acedia, a sin that covers many modern issues including things we would not regard as a vice (such as depression) but also things we still do, such as neglect or mindless compliance.

Lazy curating of the slate, mindless compliance with lock step voting and email campaigns, neglectful edits, and a general unwillingness to explain, review or persuade. Sloth and avarice seem to be the cardinal sins of the Puppy campaigns (no, that doesn’t mean I’m saying everybody who has ever read a Puppy nominated author is greedy and lazy).

With that in mind what are the major ‘sins’ of the nominated works?

  • Poor editing
  • Lack of cohesion
  • …parts of incomplete works…
  • Appearance by virtue of knowing Brad Torgersen
  • Shown up by substantially superior works in the same category
  • Stories of over-blown self importance
  • Irrelevance

Poor editing: all of the John C Wright nominations but in particular One Bright Star. Why Science is Never Settled needed substantial re-working for it to be a decent (i.e. 18 year old at school) essay.

Lack of cohesion: Again John C Wright’s Transhuman and Subhuman, Roberts’s Why Science is Never Settled, Championship B’Tok and Big Boys Don’t Cry all tended to wander off topic and lacked clarity.

…parts of incomplete works…: With a current lack of a ‘saga’ category perhaps Skin Game can be forgiven but Flow and Journey Man in the Stone House and Championship B’Tok all had the fragmentary feel of an extracted chapter from a novel. None stood alone well.

Appearance by virtue of knowing Brad Torgersen: Numerous works but most obviously Wisdom from My Internet. I don’t know if Brad T is friends with the person who makes Zombie Nation but there is no obvious reason why it was nominated.

Shown up by substantially superior works in the same category: Zombie Nation was the only puppy nominee for Best Graphic Story among a set of commercially and critically successful other nominees that showed depth and talent.

Stories of over-blown self importance: Turncoat, Parliament of Beasts and Birds, One Bright Star. Pompous pomposity.

Irrelevance: Best related work included a collection of unfunny Facebook offensiveness and a half baked essay on the nature of science – neither had any more than a tenuous relation to SF/F. Parliament of Beasts and Birds was a religious fable – but that arguably scrapes into fantasy.

On n-grams and Corpus America

The background is too hard to explain but somebody had to do this and I believe the task must lie with me.

Graphs of the n-grams of ‘Steve’ and ‘Stanley’ in the general Google Corpus 1800 to 2000


And using the American English corpus.


Excel Pluribus Hugo

[Note: I’m very much not an expert on this proposal – this was the easiest way for me to make sense of it at a practical level. I may well have misunderstood aspects of the process]

Yay! I think I have finally removed all the kinks from my Excel version of the proposed new nomination system for the Hugo awards known as “E Pluribus Hugo” or EPH (and of course look here and interesting comments here).

The proposal basically boils down to people nominating what they like in a simple manner i.e. submit five (or fewer) things you want nominated in a category but then adding a more complex way of tallying the votes. Each work you nominated is weighted by the number of things you nominated – so if you nominate 5 things each one is getting 1/5 (0.2) of a vote from you, nominate four things and each thing gets 0.25 of a vote from you.

Nominated works then go through an elimination process. First they would find the two nominees with the lowest weighted score, then they would compare the total number of raw nominations (not weighted) each of those two works got. The one with the fewest raw nominations is eliminated. The clever bit is that with that work gone, anybody who nominated it has now got a slightly more strongly weighted vote for everything else they nominated.

In theory it would mean a slate or bloc vote would improve its chances of getting one work on the final ballot but drastically reduce its chances of getting multiple works on the ballot.
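For anyone who prefers code to spreadsheets, the elimination loop as described above can be sketched roughly like this. It is a simplification built on assumptions of mine: the function name `eph_finalists`, stopping once `slots` works remain, and breaking ties by title are all illustrative choices and not the proposal’s actual rules.

```python
from collections import defaultdict

def eph_finalists(ballots, slots=5):
    """Simplified elimination loop following the description above.

    ballots: list of sets of work titles, one set per nominator.
    slots:   number of works left standing (an assumed stopping point).
    """
    remaining = set().union(*ballots)
    while len(remaining) > slots:
        points = defaultdict(float)  # weighted score
        raw = defaultdict(int)       # plain count of nominations
        for ballot in ballots:
            live = set(ballot) & remaining
            for work in live:
                # One vote per nominator, shared between their surviving picks.
                points[work] += 1.0 / len(live)
                raw[work] += 1
        # The two works with the lowest weighted score (ties by title: an assumption).
        lowest_two = sorted(remaining, key=lambda w: (points[w], w))[:2]
        # Of those two, eliminate the one with fewer raw nominations.
        loser = min(lowest_two, key=lambda w: (raw[w], points[w], w))
        remaining.discard(loser)
    return remaining

# Toy example: four slate voters against five independent nominators.
ballots = (
    [{"S1", "S2", "S3"}] * 4
    + [{"X"}] * 3
    + [{"Y"}] * 2
)
print(sorted(eph_finalists(ballots, slots=3)))  # ['S3', 'X', 'Y']
```

In this toy case a straight count of nominations would hand all three places to the slate (four nominations each), whereas the weighted elimination leaves it with one.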

Now I wanted to model it in a transparent way – hence a spreadsheet rather than a program. With an Excel spreadsheet you see a static snapshot of all the stages in one go. I didn’t want to use any VBA or Pivot Tables either and I wanted to set up a complete round so that I could then copy and paste one round after another to make a complete process without editing formulas as I went.


So I’ve been down a few dead-ends of late in my number crunching past Hugo winners.

I’ve looked for obvious signs of bias and also for cliques and found not much to write home about. The last two issues are whether the Hugo Awards (or other awards such as the Nebulas) have gone to unworthy winners or alternatively, to works that are too literary. This is something of a heads-you-win-tails-I-lose proposition, as demonstrating worthiness would tend to involve showing independent recognition of a writer’s skill beyond SF/F awards – thus proving the too literary complaint.

Either way I have been looking for a way into this so that there is actual evidence to discuss. So far not much luck.

One promising lead was the Google N-Gram viewer. When Google digitized huge numbers of books they gained a massive corpus of texts that allows for systematic analysis. One kind of analysis is a count of n-grams i.e. ordered sequences of n words. As the Google book metadata includes the year of publication, that allows for trends in topics to be graphed. For example this graph shows trends for William Gibson’s 1989 Hugo nominee Mona Lisa Overdrive.

See it properly here.
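As a rough illustration of what sits behind such a graph, here is a minimal sketch that counts word n-grams per year and turns them into a share of that year’s text. The tiny “corpus” is invented for the example, and `ngrams` and `yearly_share` are hypothetical helpers of mine, not Google’s code.

```python
from collections import Counter

def ngrams(text, n):
    """Return the runs of n consecutive words in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def yearly_share(books, phrase):
    """Map year -> fraction of that year's n-grams that match phrase."""
    n = len(phrase.split())
    hits, totals = Counter(), Counter()
    for year, text in books:
        grams = ngrams(text, n)
        totals[year] += len(grams)
        hits[year] += grams.count(phrase.lower())
    return {year: hits[year] / totals[year] for year in totals if totals[year]}

# Toy (year, text) pairs, made up for illustration only.
books = [
    (1988, "mona lisa overdrive is out now"),
    (1989, "nobody mentions it here at all"),
]
shares = yearly_share(books, "Mona Lisa Overdrive")
print(shares)
```

Plotting the resulting year-to-share mapping as a line gives the kind of trend chart the viewer produces, minus the smoothing.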


More Rabid nastiness

For other reasons I visited Vox Day’s blog today (the leader of the Rabid Puppies campaign). It is never a pleasant experience but it does help illustrate how particularly unpleasant this ‘side’ of the Hugo kerfuffle is.

Kary English was nominated by both the Rabid Puppy and the Sad Puppy campaigns for best short story. Surprisingly, despite the overall poor quality of the Puppy nominees, many non-Puppies who have read the story have quite liked it. It is probably the most broadly liked of all the Puppy stories and I agree it has many positive qualities.

Kary English has also distanced herself from the Puppy campaigns somewhat and in an extended comment at File 770 outlined her views.

Furface Tension 6/26

I also wish people like Brad, Larry and other SP notables would come out and say “Hey, this* isn’t what we intended or what we hoped would happen. We’re sorry the whole thing has become such a mess.” (*where “this” means locking up the ballot and shutting out other works)

I don’t consider myself a spokesperson for the SP, or even an SP notable, but I’ll say it. I never got involved in this with any idea that I’d even make the ballot, much less that VD would run his own campaign or that there would be a ballot sweep. If I’d known that, I wouldn’t have participated. To the extent that I’ve been part of that, even unknowingly, I apologize.

It seems I can’t say anything remotely in that vein without someone saying that if I truly thought that, I would withdraw. I’ve already given my reasons for not withdrawing, but I’ll mention again that a large part of it is not giving Vox Day the satisfaction.

All that stuff about nominating liberals just to watch them self-flagellate and see how fast they withdraw? I’m not his marionette, and I won’t dance to his tune. He set us up to be targets, just like he set up Irene Gallo. I’m not giving in to Vox Day.

This has provoked what can best be called a very sulky reaction from Vox Day in which he basically says he doesn’t care. I shan’t link to it because the comments that follow are extraordinarily nasty and vindictive. I will share this quote from Day’s post:

I think it’s interesting that she thinks I have given her any thought whatsoever. Kary, my dear, I don’t give a quantum of a damn what you do. Withdraw, don’t withdraw, retire to a nunnery, it makes absolutely no difference to me.

So Day announces his utter lack of care…

Following on from that Day has posted his picks for best story (which I will link to – I don’t think my little blog lends his much web credibility or googlyness)

  1. “Turncoat”, Steve Rzasa (Riding the Red Horse, Castalia House)
  2. “The Parliament of Beasts and Birds”, John C. Wright (The Book of Feasts & Seasons, Castalia House)
  3. “On A Spiritual Plain”, Lou Antonelli (Sci Phi Journal #2, 11-2014)
  4. “A Single Samurai”, Steven Diamond (The Baen Big Book of Monsters, Baen Books)

Notably Kary English’s story (as nominated by Vox Day’s own campaign) is now missing. I guess Day just forgot in his total lack of caring rather than spitefully deciding that English was now an ‘unperson’.

I still won’t be voting for Totaled above No Award for the reasons I outlined here. However I think the chance of her actually winning this category has increased substantially.

Why Science is Never Settled – a review of part two of the essay

My review of Part 1 of Why Science is Never Settled by Ted Roberts can be found here.

Part 2 is focused more on the fallibility of science. Like Part 1 it lacks focus or connection with a unifying argument. In some ways it acts more like an appendix to Part 1, with a look at various different issues in depth.

Part 2 is split into several major sections:

  1. Scientists are human, too – which looks at human failings and spats in science but primarily concentrates on the pressure on academic scientists to ‘Publish or Perish’
  2. Lies, Damned Lies, and Statistics! – which looks at statistical analysis and failures by scientists when conducting statistical tests.
  3. The vaccine controversy – a case study on the Andrew Wakefield affair (which I discussed in my first review)
  4. The problems of peer-review – a look at issues with peer review with links to some notable cases.
  5. It’s a process, not a conclusion – which ironically acts as a sort of conclusion to the whole essay but oddly isn’t the last section.
  6. Internet memes and the love of science. – which is basically just some complaints about the Facebook site (the title of the site isn’t “feaking”). You can safely skip that bit.

I’ll go through the sections in turn in varying degrees of detail.

Why Science is Never Settled – a review of part one of the essay

Reviewing two (here and here) of the Best Related Work Hugo nominees made me realize I had to do at least one more. Why Science is Never Settled by Ted Roberts is an essay on the scientific method. It isn’t science fiction and it isn’t appalling but it isn’t good. Unlike The Hot Equations it isn’t trying to apply science to science-fiction but unlike Wisdom from My Internet it is not just awful rubbish whose only resemblance to a book is pagination. Roberts has written about his views on the Puppy kerfuffle here.

The essay is in two parts. Part 1 discusses his general view of the scientific method and Part 2 discusses more particular issues. This review covers Part 1 only – partly because it became quite long and unwieldy and partly because the character of the piece changes. A review of Part 2 is here. In places I will refer to sections from Part 2 in this review.

Overall it is a weak essay but with some good to fair parts. The writer is a working scientist with obvious experience with statistical analysis, experimental method and peer review. He clearly is giving an informed insider’s view of science that gives an overview of the processes involved. It does give insights into the writer’s own thinking and it may have been better presented as a set of ruminations on the topic of science.

If we review it as an example of Best Related Work it is a definite technical fail. Its connection with science fiction is that the author writes some fiction and has had non-fiction published by Baen Books. Baen is a publisher of SF/F and is spoken of more favorably by the puppy campaign than Tor Books.

It is fairer to review it as an essay on science without reference to the Hugo Awards. I don’t have strong feelings about strictly policing award categories for taxonomic exactitude and so I’m putting aside, for the moment, the question of whether it counts. Instead I’ll consider it in terms of its content.

I’ve seen reviews elsewhere that have treated this essay kindly – giving it a passable rating as something you might give to somebody as an introduction to the methodology of science. I would suggest that would be unwise as its faults are many.

Overall it lacks focus: there is not a clear viewpoint that the author tries to establish. For much of it he seems to be dancing around various issues. There are coy references to some topics which suggest that the intended audience is a right wing one (e.g. the title echoes claims by political supporters of action on climate change that the science is settled i.e. that the debate should now be about policy rather than whether anthropogenic global warming is occurring). Having a right-of-center viewpoint is not in itself a problem but it is not a viewpoint that the author actually develops or discusses; he only vaguely alludes to it.

At this point it is best if I work through the essay in stages. I will use indented italics for quotes from the text because the ‘blockquote’ style provided is a bit hard to read for lengthy chunks of text.