Hugo Author Page Views

I gathered the Wikipedia pages of all the authors in my great big Hugo spreadsheet and used my page view gathering tool to add a page view figure to every author on that sheet with an English Wikipedia page. Most of the authors on this list of Hugo Finalists for Novel, Novella, Novelette and Short Story have a Wikipedia page, but all the caveats about this data apply. A good example of the issues is Frank Herbert, whose page views have increased because of interest in the new film version of Dune. That doesn’t make page views an utterly flawed figure; we just need to be clear that they measure current levels of attention and that this currency can change dramatically for individuals.

The other, more numerical, issue is the distribution. Authors who are currently getting a lot of Wiki-attention get it at a scale orders of magnitude greater than those who aren’t. That can make graphing the data tricky, and it also does bad things to measures of central tendency (aka averages).

This time I want to look at trends over time. I’m plotting the Hugo Award year against the median page views of the authors who were finalists in the story categories that year. To cope with the spread of values, I’m using a logarithmic scale for the vertical axis.

Hugo story finalist graphed by year and Wikipedia 30 day page views gathered 14/09/2020

The median is less affected by the smallest and largest values in each year. Also, in this case I’m treating authors without Wikipedia pages as missing data rather than as zero. The most famous authors don’t really influence the graph unless they were finalists alongside a whole bunch of other really famous people. I think 1964 is (currently) the peak year because of a combination of Heinlein, Anderson, Vonnegut, Norton, and Edgar Rice Burroughs. The outliers that year are Frank Herbert (because of the Dune movie) and Clifford D. Simak (a decent number of page views, just low for that year), plus Rick Raphael, who is treated as missing data because he doesn’t have an English Wikipedia page.
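A minimal sketch of the per-year aggregation, assuming the spreadsheet has been flattened into (year, author, views) rows with None standing for "no English Wikipedia page". The records and numbers are made up for illustration and are not from the real sheet. It also shows why the median suits this data: one very large value drags the mean up a long way but leaves the median alone.

```python
from statistics import mean, median

# Hypothetical records: (hugo_year, author, views_30d); None = no English Wikipedia page.
# Illustrative numbers only, not the real spreadsheet values.
records = [
    (1964, "Author A", 40_000),
    (1964, "Author B", 3_000),
    (1964, "Author C", 1_500),
    (1964, "Author D", None),  # no page: treated as missing, not as zero
    (2016, "Author E", 900),
    (2016, "Author F", None),
    (2016, "Author G", 2_500),
]

def yearly_summary(rows, agg=median):
    """Aggregate page views per year, skipping authors with no Wikipedia page."""
    by_year = {}
    for year, _author, views in rows:
        if views is None:
            continue  # missing data is dropped rather than counted as 0
        by_year.setdefault(year, []).append(views)
    return {year: agg(vals) for year, vals in sorted(by_year.items())}

print(yearly_summary(records))        # median per year
print(yearly_summary(records, mean))  # mean per year, pulled up by the one big value
```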

Arguably, there is a visible late-1990s/early-2000s dip of the kind that has been claimed anecdotally in discussions about the Hugo Awards. Whether that is an actual feature of those finalists, or whether they just fall into the gap between being too long ago to be notable now and not far enough back to be revisited as classics, remains an open question.

Intentionally, the graph ignores two important groups: the authors who are really, really notable currently (in terms of Wikipedia page views) and the authors who aren’t. I’ll deal with the first group by looking at the maximum values per year.

Hugo story finalist graphed by year and max values 30 day page views

I think that is very much a nothing-to-see-here sort of graph. Note that I’ve changed the maximum and minimum points on the vertical axis to fit the data in. Generally, the really high values are consistently high.

Hugo story finalist graphed by year and min values 30 day page views

The minimum value starts very noisy and then gets more stable. Remember that authors without Wikipedia pages are counted as missing rather than zero, so they don’t affect the values on this graph. I think the most recent years would look a bit noisier if we counted the missing authors as zero instead, because the most recent years naturally have more early-career writers who haven’t got Wikipedia pages yet.
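To make the missing-versus-zero choice concrete, here is a small sketch with hypothetical numbers (not the real data). The maximum is the same either way, but zero-filling drags the minimum to 0 in any year with at least one author lacking a page. A zero also can’t be placed on a logarithmic axis at all, which is another practical reason to treat those authors as missing.

```python
# Hypothetical data: year -> 30-day page views per finalist, None = no Wikipedia page.
views_by_year = {
    2000: [12_000, 800, 450, None],
    2019: [30_000, 2_000, None, None],
}

def extremes(values, missing_as_zero=False):
    """Return (min, max) page views, either dropping or zero-filling missing authors."""
    if missing_as_zero:
        vals = [0 if v is None else v for v in values]
    else:
        vals = [v for v in values if v is not None]
    return min(vals), max(vals)

for year, values in views_by_year.items():
    print(year, extremes(values), extremes(values, missing_as_zero=True))
```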

Lastly, here is the first graph again of the median value but this time only showing the value for the winners.

Hugo story winners graphed by year and median values 30 day page views

That looks like it’s trending down a bit, but note that this value will be more influenced by the shorter fiction finalists.

Which Hugo story finalists don’t have a Wikipedia page

My capacity to generate (rather than just make up) trivia increases every week. Today I get to tell you which Hugo Finalists in Novel, Novella, Novelette and Short Story do not currently have a Wikipedia page.

Finalist | First Year a Finalist
Anton Lee Baker | 1959
J. F. Bone | 1959
Rick Raphael | 1964
Hayden Howard | 1967
William Walling | 1975
Jeff Duntemann | 1981
Eric Vinicoff | 1985
W. R. Thompson | 1991
Nicholas A. DiChario | 1993
Bridget McKenna | 1994
Jan Jensen | 2000
Shane Tourtellotte | 2002
Pat Forde | 2003
Christopher Rowe | 2005
Gray Rinehart | 2015
Kary English | 2015
Rajnar Vajra | 2015
Steve Rzasa | 2015
Steven Diamond | 2015
Charles W. Shao | 2016
Cheah Kai Wai | 2016
Daniel Polansky | 2016
David VanDyke | 2016
Juan Tabo | 2016
S. Harris | 2016
S. R. Algernon | 2016
Stix Hiscock | 2017
K. M. Szpara | 2018
Simone Heller | 2019
Nibedita Sen | 2020
Siobhan Carroll | 2020
Hugo Story Finalists who do not have a Wikipedia Page

Of the 31 authors, ~42% are from the period 2015 to 2017. It’s as if something happened during that time, but it is too hard to infer what it was from the statistics[1].
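The ~42% figure can be checked directly from the first-year column of the table. This short sketch just transcribes that column and counts the entries from 2015 to 2017 inclusive.

```python
# First year as a finalist for each of the 31 authors in the table, in table order.
first_years = [
    1959, 1959, 1964, 1967, 1975, 1981, 1985, 1991, 1993, 1994,
    2000, 2002, 2003, 2005,
    2015, 2015, 2015, 2015, 2015,
    2016, 2016, 2016, 2016, 2016, 2016, 2016,
    2017, 2018, 2019, 2020, 2020,
]

in_window = sum(2015 <= year <= 2017 for year in first_years)
share = in_window / len(first_years)
print(len(first_years), in_window, f"{share:.0%}")  # 13 of 31 is about 42%
```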



[1] I’m joking

Who “won” the Puppy attention wars?

A good point people raised about yesterday’s post on Wikipedia page view metrics is that it captures a current state, but in many cases we are more interested in historical values. This is particularly true when we are looking at the impact of awards or events.

Luckily I don’t need to advance my web scraping tools any further to answer this, as Wikipedia actually has a tool for looking at and graphing this kind of data. Like most people I’ve used Wikipedia for many years now, but I only learned about this yesterday while looking for extra data (or maybe I learned earlier and forgot, which seems likely). The site is https://pageviews.toolforge.org and each article’s page information page has a link to it at the bottom under ‘external tools’.

It’s not really suitable for a data set of hundreds of pages but it is quite nice for comparing a small number of pages.
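For hundreds of pages, the same kind of monthly data can be requested per article from the Wikimedia REST pageviews API. This is a sketch, not the tool’s own code: the endpoint layout below is the public per-article route as I understand it, the User-Agent contact is a placeholder you should replace, and only the URL builder is exercised here (fetching needs network access).

```python
import json
import urllib.parse
import urllib.request

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def monthly_views_url(article, start="20150701", end="20200831",
                      project="en.wikipedia.org"):
    """Build a per-article monthly page view URL (all platforms, all agents)."""
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return f"{API}/{project}/all-access/all-agents/{title}/monthly/{start}/{end}"

def fetch_monthly_views(article, **kwargs):
    """Return {timestamp: views}; needs network access and a descriptive User-Agent."""
    req = urllib.request.Request(
        monthly_views_url(article, **kwargs),
        # Placeholder contact details: Wikimedia asks clients to identify themselves.
        headers={"User-Agent": "hugo-pageviews-sketch (contact: you@example.com)"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {item["timestamp"]: item["views"] for item in data["items"]}
```

Looping `fetch_monthly_views` over a list of article titles (for example "John Scalzi" or "N._K._Jemisin") would give one monthly series per author, which is the part the web tool makes awkward at scale.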

Just to see how it works, and to play with settings until I got a visually interesting graph, I decided to see if I could spot the impact of the Hugo Awards on five relevant pages. The data it will graph only goes back to 2015, so this takes the impact of SP3 (Sad Puppies 3) as a starting point. I’ve chosen to look at John Scalzi, N. K. Jemisin, Chuck Tingle, Vox Day and Larry Correia.

https://pageviews.toolforge.org/?project=en.wikipedia.org&platform=all-access&agent=all-agents&redirects=0&start=2015-07&end=2020-08&pages=Vox_Day|John_Scalzi|Larry_Correia|Chuck_Tingle|N._K._Jemisin

I added a background colour and labels. The data shows monthly totals and, because of the size of some spikes, it is plotted on a logarithmic scale. Be mindful that points are further apart vertically, in terms of actual magnitude, than they appear.
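A quick illustration of that caveat: on a base-10 log axis, equal vertical gaps mean equal ratios, not equal differences. The values here are round illustrative numbers, not figures from the graph.

```python
import math

def log_gap(a, b):
    """Vertical distance between two values on a base-10 log axis, in decades."""
    return math.log10(b) - math.log10(a)

# Both pairs are one decade apart, so they look equally far apart on the graph...
print(log_gap(1_000, 10_000), log_gap(10_000, 100_000))
# ...but the gaps in actual page views differ by a factor of ten.
print(10_000 - 1_000, 100_000 - 10_000)
```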

I think the impact of N.K. Jemisin’s second and third Best Novel wins is undeniable. There is a smaller spike for the first win but each subsequent win leads to more interest. I don’t know why Chuck Tingle had a big spike in interest in January 2017.

I’ve added a little red arrow around July 2019. That was when there was a big flurry among some Baen authors about Wikipedia deleting their articles: https://camestrosfelapton.wordpress.com/2019/07/29/just-a-tiny-bit-more-on-wikipedia/

Anyway, to answer my own question: talent beat tantrums in the battle for attention.

Authors: which ones get looked up?

A perennial question around award nominees is just how significant the authors being honoured are. It’s a tricky question, particularly as there is no good data about book sales. Amazon ranks are mysterious, and Goodreads data may be a reflection of a particular community.

I’m currently taking a few baby steps into web scraping data and I was playing with Wikipedia. Every Wikipedia article has a corresponding information page with some basic metadata about the article. For example, here is the info page for the article on the writer Zen Cho (https://en.wikipedia.org/w/index.php?title=Zen_Cho&action=info). On that page is a field called “Page views in the past 30 days” that gives the stated figure. As a first attempt at automating some data collection, it’s a relatively easy piece of data to get.
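A minimal sketch of that scrape, not my actual tool. It builds the info-page URL in the same format as the Zen Cho example and pulls out the first number after the “Page views in the past 30 days” label. The exact HTML around that field is an assumption, so the parser crudely strips all tags and searches the remaining text rather than relying on specific element names; it is tested on a made-up snippet.

```python
import re
import urllib.parse

LABEL = "Page views in the past 30 days"

def info_page_url(title):
    """Build the action=info URL for an English Wikipedia article."""
    query = urllib.parse.urlencode({"title": title.replace(" ", "_"), "action": "info"})
    return f"https://en.wikipedia.org/w/index.php?{query}"

def parse_30_day_views(html):
    """Return the number following the 30-day page view label, or None if absent."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, fine for one field
    match = re.search(re.escape(LABEL) + r"\s*([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else None

# Made-up fragment shaped like a table row; real markup may differ.
sample = '<tr><td>Page views in the past 30 days</td><td><a href="#">12,345</a></td></tr>'
print(info_page_url("Zen Cho"), parse_30_day_views(sample))
```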

So, I put together a list of authors from my Hugo Award and Dragon Award lists, going back a few years (I think to 2013). Not all of them have Wikipedia pages, partly because some are early in their careers but also because Wikipedia does a poor job of representing authors who aren’t traditionally published. Putting the ‘not Wiki-notable’ authors aside left me with 163 names. With a flash of an algorithm, I had a spreadsheet of authors ranked by the current popularity of their Wikipedia page.

Obviously this is very changeable data. A new story, a tragedy, a scandal or a recent success might change the number of page views significantly from month to month. However, I think it’s fairly useful data nonetheless.

So what does the top 10 look like?

Rank | Author | Page views (past 30 days)
1 | Stephen King | 216,776
2 | Margaret Atwood | 75,427
3 | Brandon Sanderson | 72,265
4 | Terry Pratchett | 55,591
5 | Rick Riordan | 43,484
6 | N. K. Jemisin | 34,756
7 | Cixin Liu | 32,372
8 | Sarah J. Maas | 21,852
9 | Ian McEwan | 20,468
10 | Neal Stephenson | 20,058

The rest of the top 30 look like this:

Rank | Author | Page views (past 30 days)
11 | Robert Jordan | 19,169
12 | Ted Chiang | 17,635
13 | Owen King | 16,041
14 | Jim Butcher | 15,493
15 | James S. A. Corey | 15,109
16 | Stephen Chbosky | 14,490
17 | Leigh Bardugo | 13,787
18 | China Miéville | 13,580
19 | Andy Weir | 13,057
20 | Harry Turtledove | 11,452
21 | Cory Doctorow | 11,362
22 | Jeff VanderMeer | 11,243
23 | John Scalzi | 10,796
24 | Chuck Tingle | 10,763
25 | Ben Aaronovitch | 10,493
26 | Brent Weeks | 10,271
27 | Ken Liu | 9,003
28 | Tamsyn Muir | 9,002
29 | Alastair Reynolds | 8,951
30 | Kim Stanley Robinson | 8,879

There’s a big Zipf-like distribution going on, with numbers that decline quickly by rank. John Scalzi has Chuck Tingle levels of fame on this metric.
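One rough way to check the Zipf-like claim is to fit a straight line to log(views) against log(rank). A pure Zipf distribution, where views fall off as 1/rank, gives a slope of about -1. This sketch uses the top 10 figures from the table and an ordinary least-squares fit; on these ten points the slope comes out close to -1. Ten points is a small sample, so treat it as a sanity check rather than a proper fit.

```python
import math

# Top 10 from the table: (rank, 30-day page views).
top10 = [
    (1, 216_776), (2, 75_427), (3, 72_265), (4, 55_591), (5, 43_484),
    (6, 34_756), (7, 32_372), (8, 21_852), (9, 20_468), (10, 20_058),
]

def loglog_slope(pairs):
    """Least-squares slope of log10(views) against log10(rank)."""
    xs = [math.log10(rank) for rank, _ in pairs]
    ys = [math.log10(views) for _, views in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

print(round(loglog_slope(top10), 2))
```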

OK, so I know people want to know where some of our favourite antagonists are, so here are some of the notable names from the Debarkle years.

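Rank | Author | Page views (past 30 days)
40 | Vox Day | 5,271
45 | Larry Correia | 4,455
60 | John Ringo | 2,878
81 | John C. Wright | 1,251
111 | Brad R. Torgersen | 560
123 | Sarah A. Hoyt | 407
140 | L. Jagi Lamplighter | 229
152 | Dave Freer | 102
153 | Lou Antonelli | 101
156 | Brian Niemeier | 81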

Day probably gets a lot more views because people look him up due to his obnoxious politics. Larry Correia is in a respectable spot in the 40s. He is just below Martha Wells, who has 4,576 page views, which is essentially the same number given how these figures might change from day to day. John Ringo is just above Chuck Wendig and Rebecca Roanhorse (2,806 and 2,786). John C. Wright is sandwiched between Tade Thompson and Sarah Gailey.

You can see the full list here https://docs.google.com/spreadsheets/d/14uQsQNxKyPQtxybu4OxsFrdRRl_v-tdW0fN0oblgFw4/edit?usp=sharing

Let me know if you find any errors.

2020 Dragon Awards

Well, I can say what I like about the Dragon Awards, but their livestreamed award announcement beat the Hugo Awards’ ceremony in terms of efficiency and general presentation.

The winners (I missed the games categories) are:

  1. Best Science Fiction Novel: The Last Emperox by John Scalzi
  2. Best Fantasy Novel (Including Paranormal): The Starless Sea by Erin Morgenstern
  3. Best Young Adult / Middle Grade Novel: Finch Merlin and the Fount of Youth by Bella Forrest
  4. Best Military Science Fiction or Fantasy Novel: Savage Wars by Jason Anspach & Nick Cole
  5. Best Alternate History Novel: Witchy Kingdom by D. J. Butler
  6. Best Media Tie-In Novel: Firefly – The Ghost Machine by James Lovegrove
  7. Best Horror Novel: The Twisted Ones by T. Kingfisher
  8. Best Comic Book: Avengers by Jason Aaron, Ed McGuinness
  9. Best Graphic Novel: Battlestar Galactica Counterstrike by John Jackson Miller, Daniel HDR
  10. Best Science Fiction or Fantasy TV Series: The Mandalorian – Disney+
  11. Best Science Fiction or Fantasy Movie: Star Wars: The Rise of Skywalker by J. J. Abrams

Also, Siobhan Carroll won the Eugie Award for “He Can Creep”, which was a personal favourite.

The last one for the time being

This is the fan categories, but without the intermediate nodes for category or year. In other words, the edges of the graph join names together directly, and each edge represents two names sharing a category in a year. It makes a big bow tie.

It is less than accurate, though, because of an alphabetical bias. While all finalists appear, because of a column limit when I was processing the data the lists only went ten deep. That means finalists further down the alphabet in years with lots of named finalists don’t get as many connections as they should have.
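A sketch of the direct-connection graph and of how a ten-deep cap produces that alphabetical bias. The data structure is an assumption (a mapping from (year, category) to the names listed), and the twelve finalist names and the year are hypothetical. Every pair of names in the same year and category gets an edge; with the cap, only the first ten names alphabetically are linked, so later names lose that year’s connections.

```python
from itertools import combinations

def direct_edges(finalists, cap=None):
    """Join every pair of names that shared a (year, category); if `cap` is given,
    keep only the first `cap` names alphabetically (mimicking a column limit)."""
    edges = set()
    for names in finalists.values():
        listed = sorted(names)
        if cap is not None:
            listed = listed[:cap]
        edges.update(combinations(listed, 2))  # sorted pairs, so duplicates merge
    return edges

def degree(edges, name):
    """Number of direct connections a name has."""
    return sum(name in edge for edge in edges)

# Hypothetical year/category with twelve named finalists.
finalists = {(2016, "Fancast"): [f"Finalist {i:02d}" for i in range(1, 13)]}

full = direct_edges(finalists)
capped = direct_edges(finalists, cap=10)
print(len(full), len(capped))
print(degree(full, "Finalist 12"), degree(capped, "Finalist 12"))
```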

A more readable version is available here as a PDF.

The file name was short for fan categories with direct connections. However, “Fan Cats Direct” sounds like an interesting business.

They made me do this

I blame James Davis Nicoll who forced me to do this.

Hugo Fan Categories (Artist, Fanzine, Fan Writer and Fancast) by year.

The growth in the size of years after 2012 comes firstly from the addition of Fancast, but also from more group finalists (often in Fancast). It looks like some huge city on a bay with bridges out to islands (the Mike Glyer Bridge joining the historic CBD to 2016/18).

Here they are organised by award category.

Hugo Mode

There’s a mob of network data scientists with flaming pitchforks hammering at the doors of Felapton Towers in a vain attempt to drag these tools out of my hands and try me for crimes against having the faintest idea of what I’m doing. In the meantime, this blog is all graphs all the time, until I run out of things to stuff into Gephi and see what happens.

In this case, what happened was more useful than I had imagined. I thought mapping connections between the four award categories I have collected in my big-hugo-spreadsheet (Novel, Novella, Novelette, Short Story) would be a bit dull. However, the graph has done a very nice job of sorting authors into nine semi-neat clusters.

ETA zoomable PDF below:

The four big outer broccoli-like fronds show authors whose work has only been nominated in a single category. Let’s call them the specialists. There is a second ring of four groups, each joining an adjacent pair: Novelette & Novella, Novella & Novel, Novel & Short, Short & Novelette. Nice. Then the sorting hat gives up and lumps everybody else in the middle.

Now, a good data diagram should raise questions, and this one does. There are two pairings we can’t see easily because they sit on the diagonals of the quadrilateral: Novella & Short, and Novelette & Novel.
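The clusters can be reproduced without a graph layout by grouping each author by the set of story categories they have been a finalist in. In this sketch the ring order Novel → Novella → Novelette → Short Story → Novel is taken from the adjacent pairs described above, so two-category authors split into adjacent or diagonal pairs. The nomination records are illustrative (one per category) and not the full spreadsheet, and the function names are hypothetical.

```python
from collections import defaultdict

# Order around the quadrilateral: neighbours in this cycle are the "adjacent" pairs.
ORDER = ["Novel", "Novella", "Novelette", "Short Story"]

def cluster(categories):
    """Label an author by the set of story categories they have been a finalist in."""
    cats = set(categories)
    if len(cats) == 1:
        return "specialist"
    if len(cats) == 3:
        return "hat-trick"
    if len(cats) == 4:
        return "grand-slam"
    i, j = sorted(ORDER.index(c) for c in cats)
    return "adjacent pair" if (j - i) in (1, 3) else "diagonal pair"

def cluster_authors(nominations):
    """Group (author, category) records by author, then label each author."""
    cats_by_author = defaultdict(set)
    for author, category in nominations:
        cats_by_author[author].add(category)
    return {author: cluster(cats) for author, cats in cats_by_author.items()}

# Illustrative records consistent with the lists in this post.
nominations = [
    ("Ken Liu", "Short Story"), ("Ken Liu", "Novella"),
    ("William Gibson", "Novel"), ("William Gibson", "Novelette"),
    ("Connie Willis", "Novel"), ("Connie Willis", "Novella"),
    ("Connie Willis", "Novelette"), ("Connie Willis", "Short Story"),
]
print(cluster_authors(nominations))
```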

Authors who have only been finalists in the Novella & Short categories are:

  • Amal El-Mohtar
  • Andy Duncan
  • Gregory Benford
  • Jack McDevitt
  • Joanna Russ
  • John C. Wright
  • Keith Laumer
  • Ken Liu
  • Kij Johnson
  • P. Djèlí Clark
  • Rivers Solomon
  • Spider Robinson
  • Ted Reynolds

Authors who have only been finalists in the Novel & Novelette categories are:

  • Andre Norton
  • Charlie Jane Anders
  • David Gerrold
  • Murray Leinster
  • Paolo Bacigalupi
  • Philip K. Dick
  • Piers Anthony
  • Tom Reamy
  • Walter M. Miller, Jr.
  • William Gibson
  • Yoon Ha Lee

The rest of that central cloud are the hat-trick authors (3 categories) and what I guess we might call the grand-slam authors (4 categories). These are both large groups.

The grand-slam category is interesting. It consists of 25 authors and is quintessentially Hugo Awardish. Given the very male-dominated decades of the Hugos, I was glad to see that the group has many women in it: 8 out of 25, so still an under-representation, but better than I expected.

  • Algis Budrys
  • Clifford D. Simak
  • Connie Willis
  • David Brin
  • Fritz Leiber
  • George R. R. Martin
  • Gordon R. Dickson
  • Greg Bear
  • Joan D. Vinge
  • John Varley
  • Kate Wilhelm
  • Kim Stanley Robinson
  • Larry Niven
  • Mary Robinette Kowal
  • Maureen F. McHugh
  • Michael Bishop
  • Michael Swanwick
  • Nancy Kress
  • Orson Scott Card
  • Poul Anderson
  • Robert Silverberg
  • Roger Zelazny
  • Samuel R. Delany
  • Ursula K. Le Guin
  • Vonda N. McIntyre

Debarkle Conversations

One more of these network explorations. I tried a bit of data mining on the Puppy Kerfuffle Timeline. The idea was to pick out entries where people talked about other people, or were talked about together: for example, John Scalzi talking about Vox Day, or vice versa. A few stray non-people (or groups of people) got into the mix as well. Also Santa Claus?

The graph is undirected, i.e., it doesn’t distinguish between talking and being talked about. Also, this is very much NOT about allegiances or other connections; a line joining someone to a group is more likely to indicate a critic than an ally.
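A minimal sketch of that kind of co-mention extraction, assuming a hand-made list of names to look for and plain-text timeline entries. The name list and the three entries are made up for illustration, and the matching is naive full-name substring search (a surname on its own would be missed). Each pair is stored in sorted order, so “A about B” and “B about A” count towards the same undirected edge.

```python
from collections import Counter
from itertools import combinations

# Hypothetical name list; the real one would be built from the timeline itself.
PEOPLE = ["John Scalzi", "Vox Day", "Larry Correia", "Brad R. Torgersen"]

def co_mentions(entries, people=PEOPLE):
    """Count undirected pairs of people mentioned in the same timeline entry."""
    weights = Counter()
    for text in entries:
        present = sorted(p for p in people if p in text)
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1  # sorted pair: direction is deliberately ignored
    return weights

# Made-up example entries, not quotes from the timeline.
entries = [
    "John Scalzi writes about Vox Day.",
    "Vox Day writes about John Scalzi.",
    "Larry Correia posts.",
]
print(co_mentions(entries))
```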