Using Goodreads Data

The other week I posted some comments at Mad Genius about this post http://madgeniusclub.com/2016/02/15/on-teachers-kids-and-hugos/ – unfortunately most of the responses were not exactly shining examples of how to have a rational discussion. However, one chap did decide to engage on a level above name-calling, which was nice.

The seed of the sub-sub-discussion was this comment I made:

camestrosfelapton
February 15, 2016 at 2:56 pm
Arrgh hit ‘send’ accidentally
Yes, an argument could be made, but
1. Sad Puppies 3 was called a slate by Brad T
2. Previous Sad Puppy campaigns had been called stacking campaigns
3. The Rabid Puppy campaign was clearly functioning as a slate (yes, yes I know that the Sads don’t control what Vox does etc)
4. The quality of the works nominated was highly variable, strongly indicating that some people were voting along a slate rather than judging works by their quality (e.g. MZW’s excellent ‘Soft Casualty’ didn’t get nominated but his execrable Wisdom From… did)

And there was an OK-ish discussion about point 4, despite various people trying to derail it. Further on, Dave Freer joined in:

davefreer
February 15, 2016 at 5:54 pm
Only if backed up by independent empirical data. Your taste is not everyone’s. This is a fan award – a measure of popularity, which can be measured independently by sales. So one can fairly say Dune was a quality nomination. A novel which after a couple of years has gone out of print is not.

To which I replied:

camestrosfelapton
February 15, 2016 at 6:24 pm
OK, but neither past Hugos nor any of the Sad Puppy picks have been simply the best-selling SF/F. So while we would expect some relation with popularity, there is at least some other latent trait in play.

At which point Dave appears to have lost his train of thought and responded to something else altogether:

davefreer
February 15, 2016 at 11:20 pm
Jim Butcher, Larry Correia Aren’t ‘bestselling’!? That’s probably the most bizarre thing you’ve yet managed to say, and you’ve come up with some doozies.

Well, I pointed out his error, but Dave had started deleting my posts. I guess he didn’t like somebody making the obvious correction. Never mind. Maybe the ‘simply’ bit confused him, i.e. they may have been best-selling, but that wasn’t all that they were. Did I pitch that too subtly? Maybe.

Another commenter joined in to say:

kamas716
February 16, 2016 at 12:07 am
I’ll look into any sort of correlation there over the next couple of days. I don’t think I’ll get it all together tonight. Maybe by Saturday?

And eventually they did: https://westfargomusings.wordpress.com/2016/02/21/preliminary-analysis-part-1/

Good for them. Turning to the data is always a smart move.

He starts with:

I indicated to Camestrosfelapton that I would look into his claims that recent Hugo Nominations by the Sad Puppies were not up to snuff

Hmmm. OK, not quite what I said. I don’t know if he is trying to address the garbled version from Dave Freer or if he read what I’d said and misunderstood. To clarify: my point was that the stuff nominated by the Sad Puppy/Rabid Puppy slates was highly VARIABLE in quality. That was one of the pieces of evidence that points to slate voting: some of it was OK to good (Totaled) and some of it was execrable sh!t (Wisdom from… I really can’t be bothered typing its name anymore).

So kamas716 has collated Goodreads ratings for Hugo Best Novel winners and nominees going back in time, which is an interesting data set. From that he looked at the mean rating and found that The Dark Between the Stars was OK and that Skin Game was “actually the highest rated novel ever nominated for a Hugo”.

Now, what question are we trying to answer with this data? The question is about the quality of the books, and we are using popularity as a PROXY for quality because we can’t quantify quality directly. How good a match is it? Well, as his data shows, it looks OK, but consider that Skin Game is the highest-rated novel EVER nominated for a Hugo (he went back to 1953). That is a good sanity check on the data: is Skin Game really the best SFF novel written since 1952? Better than Dune, The Man in the High Castle, The Left Hand of Darkness etc.? Does that sound plausible? Yeah, probably not. That doesn’t mean a popularity rating is useless as a proxy, but it does suggest it has limits: it might correlate with quality, but not perfectly, and not to the point that we can decide whether Skin Game is actually better than The Dark Between the Stars just from a Goodreads rating. (As it happens, I think Skin Game is a LOT better than DBTS, but that is beside the point.)

Unfortunately, even if it were a good proxy, kamas716 didn’t apply it to the point I was making: the Puppy nominees were highly VARIABLE in quality. For that he would need to compare across categories and look at the range or some other measure of spread, not just the mean. Still, all interesting stuff.
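
A minimal sketch of what comparing spread, rather than just the mean, could look like. The two groups and their ratings below are made-up placeholders chosen to share the same mean (they are NOT kamas716’s figures); the point is only that the range and standard deviation pick up a difference the mean hides.

```python
from statistics import mean, pstdev

# Made-up placeholder ratings (NOT real Goodreads data), chosen so that
# both groups have the same mean but very different spreads.
ratings_by_group = {
    "consistent group": [3.9, 4.0, 4.0, 3.9],
    "variable group": [4.6, 3.3, 4.5, 3.4],
}

def spread_summary(values):
    """Return the mean, range and population standard deviation of a list."""
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "stdev": pstdev(values),
    }

for group, values in ratings_by_group.items():
    s = spread_summary(values)
    print(f"{group}: mean={s['mean']:.2f} "
          f"range={s['range']:.2f} stdev={s['stdev']:.2f}")
```

With real data you would group the nominees however the argument requires (say, slate picks versus everything else) and compare the spread figures, not only the means.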


20 thoughts on “Using Goodreads Data”

  1. Actually, you shouldn’t expect the winner to be more popular than the nominees. This is called “norming out” a parameter. It’s the same reason why if you look at top-performing linemen in the NFL, performance isn’t correlated with weight. The ones with average weight are the highest-performing.

    This happens because weight is so important for linemen that a lineman doesn’t get into the NFL at all without being pretty heavy. Of those who actually got in, other factors determine performance because the average weight for an NFL lineman is already optimal. Thus the fact that weight is of critical importance is completely lost if you only study men who are already linemen. (You’d need to study the whole male population.)

    Likewise, one could argue that a book shouldn’t win a Hugo unless it was widely known and liked. But Goodreads scores are awarded by readers based on a few seconds of thought. (And there’s some gaming involved too.) Hugos are awarded by readers who gave the matter a great deal of thought and who compared the books to one another. It’s not a surprise that the Goodreads rating is normed out.

    Looking at his chart, eight Hugo winners are above his average for all nominees and seven are below it. That’s exactly what we would expect to see. Thinking this is a significant result is an easy mistake; I’ve seen a lot of people make the same mistake over the years.
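
    As a rough check on why 8 above and 7 below is unremarkable: under a simple null model in which each of the 15 winners is equally likely to fall above or below the nominee average (an assumption; a mean is not a median, so this is only approximate), the probability of at least 8 landing above is exactly one half. A minimal sketch using a binomial tail, where `prob_at_least` is an illustrative helper name:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Binomial tail: probability of k or more successes in n trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 15 winners in the chart, 8 of them above the nominee average.
print(prob_at_least(8, 15))  # 0.5
```

    For comparison, `prob_at_least(12, 15)` is about 0.018, so under this model something like 12 of 15 on one side would be needed before a one-sided 5% test flagged anything.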

    If the Hugo Award winners really were greatly at odds with public tastes, they should be significantly less popular than average across ALL SFF books in any given year. Given the numbers in the chart, that seems unlikely.

    Another complicating factor is that across many years, the Hugo winners of the past are likely to steadily accumulate positive votes simply because they are winners. You’d need to somehow measure the popularity of each book BEFORE it won the award. Otherwise it will always look like the past winners were much more popular than the present ones. (This would help identify things that probably shouldn’t have won the Hugo in the first place, though.) At a guess, I’d expect really old things to do a bit worse, then I’d expect a plateau, followed by a gradual decline to the present. It’d be interesting to see what it really looks like.

    I’m a little concerned that he reported simple (unweighted) means and not weighted averages. If one book got 100 votes averaging 4.0 while another got just 1 vote of 3.0, it’s really wrong to say they averaged 3.5.
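
    A minimal sketch of the difference, using the two hypothetical books from the example above; `simple_mean` and `weighted_mean` are illustrative helper names, not part of any library. Weighting by vote count gives the same answer as pooling all 101 individual ratings.

```python
def simple_mean(averages):
    """Unweighted mean: every book counts equally, however few votes it has."""
    return sum(averages) / len(averages)

def weighted_mean(averages, vote_counts):
    """Mean of per-book averages, weighted by each book's number of votes."""
    total_votes = sum(vote_counts)
    return sum(a * c for a, c in zip(averages, vote_counts)) / total_votes

averages = [4.0, 3.0]    # per-book average ratings
vote_counts = [100, 1]   # number of ratings behind each average

print(simple_mean(averages))                           # 3.5
print(round(weighted_mean(averages, vote_counts), 2))  # 3.99
```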

    Finally, I think the Amazon numbers might be more informative simply because there are generally a lot more Amazon voters than Goodreads voters.

        Amusingly, he’s making points about the reliability and trends of Amazon data similar to ones I tried to make to Freer last year.

  2. Hmm, ratings of books published before Goodreads became A Thing don’t really convince me: that’s people rating the classics they remember, which has a bias built in. I think their point would be better served by using sales as a popular measure rather than ratings. I did see a good analysis of the number of *total* Goodreads ratings against total sales showing a strong correlation, which was quite an interesting result, but I can’t re-find it right now 😦
    I think the Hugos land somewhere between “popular opinion” and “critical opinion” (scare quotes because those are such waffly concepts). You could probably frame the SP as wanting to drag the Hugo closer to popular.

  3. I must be too optimistic. I’m always shocked at how quickly the comments there turn ragey. It’s like 0 to 100 mph attack mode default in the first reply. Doesn’t seem too healthy, really.

    I skimmed that thread and the reading comprehension level was … not impressive. It reminded me of a conversation I once had with a student who had used a word incorrectly, when we were going over the essay in my office. I kept explaining that it was not what the word meant, so it had to be rephrased in order to make sense. No matter how many times and ways I tried to explain why one cannot just randomly use words, the student kept coming back with “yes, but I used it to mean xx” and “yes, but I really liked how those words sounded together so that’s why I used it that way.”

    The deal with using best-seller status as a measure of either popularity or quality is that the data is not reliable. It doesn’t take into account all sorts of things like:
    -how many people checked it out from a library or borrowed from a friend
    -how many library copies were bought and never read
    -how many people were given it as a gift, unwanted and never read
    -how many people bought it, hated it, threw it away
    -copies in translation
    -copies bought from second-hand bookstores
    -scandals like politicians gaming the NYT best-seller list by buying their own titles in bulk
    -how many people bought it to make a point (** looks at shelf, sees Rushdie’s Satanic Verses sitting there still unread**)
    -the judgement of posterity

    It’s like those stupid impact factors and citation counts that have been springing up across academia. They don’t control for things like: citation because it’s always cited as a classic, citation because you are explaining how wrong it is, citation in student essays or in work read for class, missed cites because of various name changes (esp. for women’s works), transliterated name problems, or just plain citations missed. I like numbers, but not those numbers.

    Apparently my wisdom on the internet takes more than 140 characters. Must work on that if I want awards (I don’t).

  4. I like Satan and I like verses, so recommendation duly noted.

    Just read through the Orhan Pamuk corpus (“Istanbul” is magical). Currently obsessed with accounts of climbing Everest while on the treadmill, and Philbrick’s “In the Heart of the Sea: The Tragedy of the Whaleship Essex” while in the bathtub. No joke.

      1. It wouldn’t mean much, but we would know that half of the set will be above and half below (and that the value would be fairly stable despite the power-law aspect). So if a subset (defined independently) was consistently above or below the median, then that would be a thing of sorts.

        I haven’t read what he’s done yet though. Is it sales or rank? I see that *sales* would follow a power law but does rank?

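
        A minimal sketch of the median point, assuming (hypothetically, not from real sales data) that sales follow a power law with p = 1.1 across 1,000 books: the median splits the set exactly in half however skewed the distribution is, while far fewer than half of the books sit above the mean.

```python
from statistics import mean, median

# Hypothetical power-law sales (an assumption, not real data):
# the book at rank n sells in proportion to n ** -p.
p = 1.1
sales = [1_000_000 * n ** -p for n in range(1, 1001)]

m = median(sales)
avg = mean(sales)

# Count how many books sit strictly above each summary statistic.
above_median = sum(s > m for s in sales)
above_mean = sum(s > avg for s in sales)

print(f"above median: {above_median} of {len(sales)}")  # exactly half
print(f"above mean:   {above_mean} of {len(sales)}")    # far fewer than half
```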

      2. Yeah, unit sales (not dollar sales) follow a power-law distribution. That is, the nth best-selling book has sales ((n+1)/n)^p times those of the (n+1)th best-selling book, where p is a number near 1. I’d say assume p = 1.1 and pick an arbitrary value for total sales (since we’re only concerned with relative positions).

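
        A minimal sketch of that model, assuming sales proportional to rank^-p with p = 1.1 and an arbitrary total of 1,000,000 copies (both placeholder values, as suggested above); `relative_sales` is a hypothetical helper name. Because sales at rank n scale as n^-p, the ratio between consecutive ranks comes out as ((n+1)/n)^p.

```python
def relative_sales(num_books, p=1.1, total=1_000_000):
    """Split an arbitrary total across ranks 1..num_books so that sales at
    rank n are proportional to n ** -p (a power law)."""
    weights = [n ** -p for n in range(1, num_books + 1)]
    scale = total / sum(weights)
    return [w * scale for w in weights]

sales = relative_sales(100)

# Index 0 is rank 1, so sales[n - 1] is rank n and sales[n] is rank n + 1.
n = 3
ratio = sales[n - 1] / sales[n]
print(ratio, ((n + 1) / n) ** 1.1)  # the two numbers agree (up to rounding)
```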

  5. A measure he should consider using, if it’s not too difficult, would be total number of comments on Amazon. That’s a reasonable proxy for sales, and Amazon has publicly stated that sales and comments (good and bad together) are highly correlated.

  6. Gah. The Skin Game Goodreads rating argument always annoys me. It’s the *fifteenth* book in the series. The population reading it is already quite invested in the writing and has a clear bias for it. As challenging as “quality” metrics are, Goodreads is nowhere near a reasonable indicator.

  7. @ Snowcrash, I agree with you. Skin Game is unique in that it is 15th in a series. Even so, Skin Game ranked 3rd in voting for Fantasy books in the 2014 Goodreads Choice Awards. Deborah Harkness ranked 1st with more than double Butcher’s votes. So given the puppy argument, Harkness’s “intermarriage is great” and “so is diversity” fantasy romance novel should be first on their slate. Where is it?

    Let’s go to the “Science Fiction” category. This is a less popular category on Goodreads than Fantasy; the times, they are a-changin’, as Bob has said. Within the SF category, the clear favorite in 2014 was “The Martian”. It wasn’t eligible, but its author was eligible for the Campbell. Not only did the puppies not slate Weir, but they knocked him off the ballot. And after “The Martian”, the next favorite was “Lock In” by Scalzi, whom they also knocked off the ballot.

    If you compare Goodreads votes for Scalzi and Correia, Larry gets creamed.

    So where is the Puppy Argument?

    It seems to me the breaking point for tier 1 in Goodreads is about 30K votes, judging by the 2015 Goodreads Choice Awards. Here is the ranking by total votes:
    Rank / Title / Author / Votes / Category / Place
    01 / The Girl on the Train / Paula Hawkins / 106K / Mystery / 1st
    02 / The Nightingale / Kristin Hannah / 57K / Historical Fiction / 1st
    03 / Red Queen / Victoria Aveyard / 47K / Debut Goodreads Author / 1st
    04 / The Sword of Summer / Rick Riordan / 43K / Children’s / 1st
    05 / Queen of Shadows / Sarah J. Maas / 36K / YA Fantasy / 1st
    06 / Confess / Colleen Hoover / 35K / Romance / 1st
    07 / Trigger Warning / Neil Gaiman / 34K / Fantasy / 1st
    08 / Finders Keepers / Stephen King / 32K / Mystery / 2nd
    09 / Golden Son / Pierce Brown / 32K / Sci-Fi / 1st
    10 / Why Not Me? / Mindy Kaling / 32K / Humor / 1st
    11 / All the Bright Places / Jennifer Niven / 32K / Young Adult / 1st
    12 / A Darker Shade of Magic / V. E. Schwab / 31K / Fantasy / 2nd
    13 / Go Set a Watchman / Harper Lee / 31K / Fiction / 1st
    14 / Carry On / Rainbow Rowell / 30K / YA Fantasy / 2nd
    Snowcrash, I assume you are not a puppy but you nominated Red Queen at Hoyt’s site. How will the puppies do? Seems to me I should see Pierce Brown, V.E. Schwab, Neil Gaiman and Victoria Aveyard from someone other than Snowcrash. Let’s see if the market agrees with the pups and if the pups eat their own sauce.
