ngramming across the universe

…only going forward because we cannot find reverse.

I’ve gone a bit n-gram crazy today – as can happen. I thought I could test whether the Hugo Awards have lost relevance or importance by graphing the term “Hugo Award” over time with the n-gram viewer. There is a little extra trick you can do: graph a term in one corpus and in another at the same time by using special syntax. I’m going to use the English 2012 corpus (books written in English digitized up to 2012 – although it only shows up to 2008) and also the English Fiction 2012 corpus (fiction books in English).

The syntax is Hugo Award:eng_2012,Hugo Award:eng_fiction_2012


The result is a graph with some sort of story behind it.
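If you want to try the same comparison with other terms, the term:corpus syntax can be put together with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the two corpus names are the ones used in the query above.

```python
def compare_across_corpora(term, corpora):
    """Join one term with several corpus suffixes into a single query string.

    Hypothetical helper, not part of any n-gram viewer API: it only builds
    the text you would type into the search box.
    """
    return ",".join(f"{term}:{corpus}" for corpus in corpora)


query = compare_across_corpora("Hugo Award", ["eng_2012", "eng_fiction_2012"])
print(query)  # Hugo Award:eng_2012,Hugo Award:eng_fiction_2012
```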

A couple of things. The data is normalized by year, so the figures represent the percentage of that year’s text made up by the term. Because English 2012 is a much bigger corpus, the term “Hugo Award” will be a smaller percentage there than in the fiction corpus alone.
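To make the normalization point concrete, here is a small sketch with invented numbers. The counts below are not real corpus figures. They only show why the same kind of term can be a smaller percentage of a bigger corpus even when it appears more often in absolute terms.

```python
def percent_of_text(matches, total):
    """Share of one year's text taken up by a term, as a percentage."""
    return 100.0 * matches / total


# Hypothetical counts for a single year (made up for illustration).
fiction_matches, fiction_total = 50, 1_000_000
all_matches, all_total = 80, 10_000_000  # all books: more matches, far more text

fiction_pct = percent_of_text(fiction_matches, fiction_total)
all_pct = percent_of_text(all_matches, all_total)

# More raw matches in the big corpus, but a smaller share of its text.
print(fiction_pct > all_pct)  # True
```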

However, what the graph does show is that the term “Hugo Award” has a general upward trend in books in general but quite a different pattern when looked at purely in fiction.

Why? Well, I personally have no idea, but I assume it relates to the extent to which published fiction (including anthologies and magazines) included the term “Hugo Award”. That seems to have peaked from 1978 to 1982.

One thought on “ngramming across the universe”

  1. Change the smoothing to 0, and zoom in to 1970-1990, and it looks very different [1]. The plateau is gone. Instead, there’s a small peak in 1981 and a huge one — a veritable Fist of God — in 1979.

    If you search Google Books for the phrase “Hugo Award”, you’ll see fewer than one hundred books per year. There were a few more books from 1979 than other years, but not *that* many. However, two of them are reference books that look misclassified: Nicholls’ “Science Fiction Encyclopedia” [2] and Magill’s “Survey of Science Fiction Literature” [3]. Both have editions whose subjects include “Fiction / Science Fiction / General”, which returns 160,000+ results, almost all fiction (at least as far as I was willing to look … ).

    Is there a way to specify a corpus in a Google Books search? Or is there a list of titles in the English Fiction 2012 corpus? You raised an intriguing question, and I’d like to know for sure.
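The smoothing effect described in the comment above can be sketched with a centered moving average. This assumes that a smoothing value of n averages each year with up to n years on either side, and that the graph in the post used a smoothing of 3. The yearly values are invented, not taken from the n-gram viewer, and the handling of the first and last years is a guess.

```python
def smooth(values, n):
    """Centered moving average with up to n neighbours on each side.

    Windows are simply cut short at the ends of the series (an assumption;
    the real viewer may treat the edges differently).
    """
    out = []
    for i in range(len(values)):
        lo = max(0, i - n)
        hi = min(len(values), i + n + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


years = list(range(1970, 1991))
raw = [1] * len(years)           # hypothetical flat baseline
raw[years.index(1979)] = 10      # one huge spike
raw[years.index(1981)] = 3       # one small peak

smoothed = smooth(raw, 3)
top = max(smoothed)
plateau_years = [y for y, v in zip(years, smoothed) if v == top]
print(plateau_years)  # [1978, 1979, 1980, 1981, 1982]
```

With smoothing set to 0 the function returns the raw series unchanged, so the spike and the small peak stay visible. With smoothing at 3, the two merge into a flat top in this made-up series.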




