Some number crunching on publishers

This is just some background for the next Debarkle chapter looking mainly at Baen (that was always the plan, prior to recent events – I’d already started the chapter).

Looking at story categories, Tor Books has had 106 works as Hugo finalists. ISFDB lists “10038 publications not in a publication series” for Tor. Baen Books has had 12 works as finalists (including stories from Jim Baen’s Universe) and, according to ISFDB, “3758 publications not in a publication series”. The ISFDB entry numbers are a very rough way to get a sense of the relative volume of the two publishers, particularly as the ISFDB listing will include republished works that were originally published by other publishers. Even so, the hit rate for Tor is proportionally higher but not vastly so. If I cut off the Hugo numbers at 2013 to remove the Sad Puppy influence and backlash, as well as the Tor.com novella explosion, Tor has had 55 finalists and Baen has had 10. So about 5.5 times the number of finalists but about 2.7 times the amount published: again, a better hit rate than Baen but not so much bigger as to think Baen had a particular disadvantage.
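
A minimal sketch of the arithmetic behind that comparison, turning the quoted figures into per-publication "hit rates". It inherits all the fuzziness of the ISFDB counts (reprints included), and the 2013 cut-off still divides by the all-years ISFDB totals, just as the paragraph above does. The function names are illustrative.

```python
# Figures quoted in the post: Hugo fiction finalists, and ISFDB's count of
# "publications not in a publication series" for each publisher.
FINALISTS_ALL = {"Tor": 106, "Baen": 12}
FINALISTS_TO_2013 = {"Tor": 55, "Baen": 10}
ISFDB_PUBS = {"Tor": 10038, "Baen": 3758}


def hit_rate(finalists: int, publications: int) -> float:
    """Finalists per listed publication: a crude 'hit rate'."""
    return finalists / publications


def hit_rate_ratio(finalists: dict[str, int], pubs: dict[str, int],
                   a: str = "Tor", b: str = "Baen") -> float:
    """How many times higher publisher a's hit rate is than publisher b's."""
    return hit_rate(finalists[a], pubs[a]) / hit_rate(finalists[b], pubs[b])


if __name__ == "__main__":
    for label, finalists in (("All years", FINALISTS_ALL),
                             ("Up to 2013", FINALISTS_TO_2013)):
        for pub in ("Tor", "Baen"):
            rate = hit_rate(finalists[pub], ISFDB_PUBS[pub])
            print(f"{label}: {pub} {rate:.3%}")
        print(f"{label}: Tor/Baen ratio {hit_rate_ratio(finalists, ISFDB_PUBS):.2f}")
```

Run as a script, it reports a Tor/Baen hit-rate ratio of 3.31 across all years and 2.06 up to 2013.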

The numbers are a bit too fuzzy to be sure I’m not comparing apples and chromebooks. More of a sanity check. Any ideas for tightening that up as a comparison?

36 thoughts on “Some number crunching on publishers”

    1. On my phone now, so from memory it is mainly Bujold plus two Mike Resnick stories from Jim Baen’s Universe. Tor’s are more varied but then that’s also what you’d expect from the different sizes (Tor’s record will be more varied because it is bigger)

    2. The distinct author metric would be a good one to know.

      And I think Bujold’s are the only Baen books to win Best Novel and Best Series.

      Take out Universe (which was a really great magabook thingy that I bought regularly) and LMB and… probably nothing left in honest years.

      1. I’m still interested in the number of distinct authors. It’d take out statistical flukes and give a more even distribution of the houses’ bench (in sportsball terms).

      2. @Lurkertype re “I’m still interested in the number of distinct authors.”

        Here you go (assuming WP doesn’t object to pastebin links): https://pastebin.com/y9VL8MJQ

        These are year-by-year stats for 4 publishers mentioned in these comments: Baen, Tor (excluding Tor.com, Tor Teen, Tor UK, etc), Saga Press and DAW.

        The underlying code is part of a project I’ve been working on-and-off (but more the latter…) about how many debut authors get published, so some of the info is irrelevant to your question.

        NB: I make no guarantees about the correctness of this data – I know for a fact that JJ posted a comment on File 770 about how many books Baen published last year that doesn’t match the value I have; whether this is due to a bug in my code, or a difference in methodology, or something else, I dunno.

      3. John, your code probably doesn’t remove books published in previous years. If you post your list of 2020 Baen novels somewhere and give me the link, I can tell you which are reprints.

        I did this de-duping manually. To do it with code, you’d have to check each novel’s parent record for an edition in a prior year. This is possibly not a trivial algorithm.
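
JJ's suggested check (look at each novel's parent title for an edition in a prior year) can be sketched like this. It assumes, hypothetically, that editions have already been pulled out of ISFDB as simple (title ID, publisher, year) records; the `Edition` class and its field names are illustrative, not ISFDB's schema. Precomputing the earliest year per title keeps it to two passes over the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    """Hypothetical, simplified stand-in for one ISFDB publication record."""
    title_id: int   # the parent title record this edition belongs to
    publisher: str
    year: int


def first_edition_years(editions: list[Edition]) -> dict[int, int]:
    """Earliest year any edition of each title appeared, from any publisher."""
    earliest: dict[int, int] = {}
    for ed in editions:
        earliest[ed.title_id] = min(ed.year, earliest.get(ed.title_id, ed.year))
    return earliest


def original_titles(editions: list[Edition], publisher: str, year: int) -> set[int]:
    """Titles the publisher put out in `year` that had no edition in any earlier year."""
    earliest = first_edition_years(editions)
    return {
        ed.title_id
        for ed in editions
        if ed.publisher == publisher and ed.year == year and earliest[ed.title_id] == year
    }


editions = [
    Edition(1, "Baen", 2020),  # genuinely new
    Edition(2, "Baen", 2020),  # reprint...
    Edition(2, "Ace", 1998),   # ...of a book first published elsewhere
    Edition(3, "Baen", 2019),  # hardcover...
    Edition(3, "Baen", 2020),  # ...and the following year's paperback
]
print(original_titles(editions, "Baen", 2020))  # {1}
```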

      4. @JJ: Here’s a list of the 25 new/original novels that I think Baen published last year: https://pastebin.com/MJKyNVsb . Novellas, collections, anthologies etc are all excluded, as the project I was working on was novel-focussed.

        I’ve not been able to find the comment you posted on File 770 amongst the bazillion threads mentioning Baen in the past couple of weeks, so I might be misremembering that I’d got a different value from you.

        The algorithm I use for determining new novels is if the publication record (e.g. for hardback, paperback or ebook) has the same publication year as the overall title record. This isn’t flawless (e.g. a title published elsewhere in Dec 2019, but with a first US pub of Jan 2020 wouldn’t be considered a new title), but it’s hopefully enough to catch 9x% of them.
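
The year-matching rule described in this comment is cheap because it only needs each publication's own year and its title record's year. A minimal sketch, assuming a hypothetical flattened record with both years attached; the names are illustrative. The third record is the Dec 2019 / Jan 2020 case the comment mentions, which the rule misses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical flattened record: one hardback/paperback/ebook plus its title's year."""
    title: str
    publisher: str
    pub_year: int    # year of this particular publication
    title_year: int  # year on the overall title record


def is_new_by_year_match(pub: Publication) -> bool:
    """Treat a publication as new if its year equals the title record's year."""
    return pub.pub_year == pub.title_year


def new_titles(pubs: list[Publication], publisher: str, year: int) -> set[str]:
    """Distinct new titles for a publisher/year; several formats of one book count once."""
    return {p.title for p in pubs
            if p.publisher == publisher and p.pub_year == year and is_new_by_year_match(p)}


pubs = [
    Publication("Book A", "Baen", 2020, 2020),  # hardback
    Publication("Book A", "Baen", 2020, 2020),  # ebook of the same book
    Publication("Book B", "Baen", 2020, 2019),  # first published elsewhere in Dec 2019: missed
]
print(new_titles(pubs, "Baen", 2020))  # {'Book A'}
```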

      5. Thanks, John. Have you updated this? For some reason I thought you had ~54 novels for Baen in 2020.

        At any rate, I had gotten a total of 31 books, but 6 of those were so old that my manual method didn’t catch that they were reprints, and your total is correct.

        That 25 makes Baen’s totals look even less significant compared to the 850 novels which Locus reported published in 2020.

        It’s interesting to note that in the last decade, Baen’s and DAW’s totals are quite closely aligned.

        Would you be willing to run your algorithm for Orbit?
        (ISFDB Pub IDs 113, 17789, 25280, 25520, 29541, 42723, 43424, 45819, 50874, 58047) <– those last 2 are accurate 🙂

      6. JJ> Have you updated this?

        Yes-ish. The original summary stats I posted – which appear further down the comments thanks to the joys of replies/threading – were based on adding up novels and each author they had. (Which seemed to make sense for the original use-case I wrote this code for, but on reflection, maybe not…)

        That said, I *think* all the year-by-year breakdowns I’ve posted to Pastebin and linked from here are based on the current code, which explicitly separates the number of (new) novels and the number of distinct authors though.

        JJ> Would you be willing to run your algorithm for Orbit?

        The code I have splits up Orbit UK and Orbit US (with all the variant names I could identify), and it’s a bit late here to start faffing around with that, but the stats I could quickly & easily generate for them as separate entities are here: https://pastebin.com/te89HqSt

        (Looking at the publisher names that were pulled out in the first two parts of that text, I think all of the weird variants of Orbit had dropped out of use by 2020.)

      7. Thanks! The reason I wanted them run as one publisher is that there will be a lot of dupes across Orbit US and Orbit UK.

      8. JJ> The reason I wanted them run as one publisher is that there will be a lot of dupes across Orbit US and Orbit UK.

        I dunno, give some people an inch, and they take a mile… anyway, here you go: https://pastebin.com/DDkYsVdS

        One minor clarification: for the list of 2020 titles in the first part of that file, if the book was published by both US and UK arms, the one which is shown is chosen arbitrarily.
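
Treating two arms as one publisher mostly means de-duplicating by title, and, as noted, which arm's record survives is arbitrary. A minimal sketch, assuming hypothetical records keyed by title ID and made-up imprint names; here the first record seen for a title wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical record: one new publication and the title it belongs to."""
    title_id: int
    publisher: str


def merge_arms(pubs: list[Publication], arms: set[str]) -> list[Publication]:
    """Keep one publication per title across all the named imprints.

    For a title published by more than one arm, whichever record appears
    first in `pubs` is kept, so the choice is effectively arbitrary.
    """
    kept: dict[int, Publication] = {}
    for pub in pubs:
        if pub.publisher in arms:
            kept.setdefault(pub.title_id, pub)
    return list(kept.values())


ORBIT_ARMS = {"Orbit US", "Orbit UK"}  # illustrative names, not ISFDB's
pubs = [
    Publication(1, "Orbit US"),
    Publication(1, "Orbit UK"),  # same title from the other arm: dropped
    Publication(2, "Orbit UK"),
    Publication(3, "Tor"),       # not an Orbit arm: ignored
]
print(len(merge_arms(pubs, ORBIT_ARMS)))  # 2
```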

      9. 😀   Thanks for your forbearance!

        (If it’s any consolation, my parents spent the better part of two decades putting up with my “Why? But why???” search for answers.)

        It’s interesting that I had this perception of Orbit having always been a powerhouse of first publications, but the reality is that they’ve built themselves up significantly over the last 20 years.

        Of course, it has never occurred to me to wonder whether all of Gollancz’ works are reprints. Never. (13, 10221) 😉

        Like

      10. JJ> Of course, it has never occurred to me to wonder whether all of Gollancz’ works are reprints. Never. (13, 10221)

        Stats (with extra info highlighting “classic” titles originally published more than 5 years earlier, basically to avoid confusing them with trade paperbacks coming out a year after the hardcovers): https://pastebin.com/UTcVrMTd
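
The “classic” flag amounts to a threshold on the gap between a publication's year and its title's first year. A minimal sketch, with the 5-year threshold taken from this comment and the three labels invented here for illustration:

```python
CLASSIC_GAP_YEARS = 5  # "originally published more than 5 years earlier"


def classify(pub_year: int, title_year: int, gap: int = CLASSIC_GAP_YEARS) -> str:
    """Label a publication by how long after the title first appeared it came out."""
    age = pub_year - title_year
    if age <= 0:
        return "new"
    if age > gap:
        return "classic"
    return "recent reprint"  # e.g. a trade paperback a year after the hardcover


print(classify(2020, 2019))  # recent reprint
print(classify(2020, 1962))  # classic
```

Keeping a separate "recent reprint" bucket is what stops the paperback-after-hardcover case from inflating the classics count.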

        I believe the perception you have might be down to Gollancz having a fairly active reprint lineup, and (IIRC) the SF Masterworks line having a policy of never letting anything go out of print (except Ballard’s The Drowned World, which I think was a rights issue). A few – 2-3? – years ago it seemed that the SF Masterworks line got reorg’ed into the Gateway sibling/child imprint, which is why the last few years show fewer “classics” than prior years in the stats I uploaded.

        I also suspect that ISFDB contributors might be more inclined to record 2nd, 3rd, etc printings of these titles, compared to more recent titles, which would distort things.

      11. Thank you for humoring my outrageous requests! 😀

        Before asking, I went through the Orion listings to see if they should be combined with Gollancz — but the Orion listings of the last 20 years all appear to be reprints. In recent years, the SF Masterworks are all being released by “Gollancz/Orion”. (I hadn’t realized it, but apparently they stopped doing the Fantasy Masterworks line in 2016.)

        It’s interesting to note that Gollancz’ publication of original works has been gradually ramping up since 2000 at a rate only slightly less than Orbit’s.

        Taking all of the publisher numbers you’ve provided into account, I would say it indicates that the SFF publishing industry is healthy and has actually been growing (while simultaneously being consolidated under the Big 5).

      12. Okay, one last outrageous request.

        Would it be possible to do Tor.com (53666, 73318) for type “novel” OR “chap”? That would include the novellas but should leave out the original online shorter fiction (which is getting labeled type “mag”).

        (I recognize that this report would be inconsistent with the other publisher reports, in which some authors of prior Tor.com novellas would have been considered “debut”, since novellas weren’t being included.)

        If that would take too much finagling to code, no problem. You’ve already done so much! 🙂

        Like

      13. @JJ

        Rather than clog up these comments with stuff that might not be of interest to others, can you maybe send me an email, and I should be able to provide at least some of what you’re after? NB: by all means use some throwaway disposable Gmail account or similar, rather than share your real email with some random weirdo 😉 You can reach me via {anything}@e_____c______.com or the Twitter handle @ErsatzCulture.

        (I also have some theories about why certain publishers/imprints are the way they are, based on Kremlinology-style interpretation of publicly known events/statements, but I’m a bit reluctant to post those on a public forum…)

      14. Honestly, it was just curiosity, because Tor.com is separate from Tor Books, and while Tor.com has become a novella powerhouse, they actually publish quite a few novels as well (11 in 2020, 6 in 2019, 12 in 2018).

        But I am very curious about your theories about why certain publishers/imprints are the way they are, and have sent you an e-mail. 😀

    3. Thanks John S! I bow to your mad scripting skillz.

      It does look like our hunches of DAW being numerically equivalent to Baen were about right. So I’m interested in the Saga stats as well.

      But just by the law of averages, Tor ought to have more nominations than Baen, since they publish so many more works, right?

  1. Having looked at publisher related stuff based on ISFDB data, I could probably write a long essay about this… but rather than bore the pants off everyone, some brief (?) thoughts:

    * You might want to consider splitting up Tor and Tor.com – per this interview with their (IIRC) senior editor, organizationally they are different entities under the wider “Tom Doherty Associates” group: https://sffdirect.com/report-science-fiction-convention-ytterbium-eastercon-2019#fifth
    * In fact, I suspect that the Tor.com website that publishes novelettes and short fiction is a different entity from what now calls itself Tordotcom Publishing, which is the novella and novel publisher. (The former IIRC are generally tagged as “Tor” in ISFDB, which I believe is incorrect/misleading, but I haven’t dug in deep enough to argue the case that that needs changing.)
    * There are also multiple publisher entries in ISFDB for different parts of Tor and Baen. This probably won’t affect award finalists much, but is a factor if you’re looking at all their output. Here’s a manually curated config I put together to try to catch them all: https://github.com/JohnSmithDev/ISFDB-Tools/blob/master/publisher_variants.py
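
To illustrate the idea of folding multiple ISFDB publisher entries into one publisher, here is a minimal sketch. The variant lists are illustrative only and are NOT a copy of the linked publisher_variants.py (its contents and structure are not reproduced here). Names like Tor Teen or Tor.com are deliberately left unmapped, so they stay separate, per the first bullet.

```python
from typing import Optional

# Illustrative only: a hand-curated variant list in the spirit of the linked config.
PUBLISHER_VARIANTS: dict[str, list[str]] = {
    "Tor": ["Tor", "Tor Books"],
    "Baen": ["Baen", "Baen Books", "Baen Publishing Enterprises"],
}

# Invert to a lookup keyed on a normalised (trimmed, case-folded) variant name.
_CANONICAL = {
    variant.strip().casefold(): canonical
    for canonical, variants in PUBLISHER_VARIANTS.items()
    for variant in variants
}


def canonical_publisher(name: str) -> Optional[str]:
    """Map a publisher name onto one canonical publisher, or None if it isn't listed."""
    return _CANONICAL.get(name.strip().casefold())


print(canonical_publisher("Baen Books"))  # Baen
print(canonical_publisher("Tor Teen"))    # None: kept separate on purpose
```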

  2. “ISFDB lists “10038 publications not in a publication series” for Tor.” I wonder what the 10,000th one was?

    (I was going to do a piece on Jim Baen Presents but maybe I will hold off on that)

    My questions would be:

    How much do reprints of work originally by other publishers affect the result? Baen is known for such reprints (it’s one of the good things they do).

    Tor’s successful novella line would skew results in that area in their favour. How big is that impact? (I wouldn’t be surprised if it matched or exceeded Baen’s hits).

    Tor has a bigger budget and better distribution – if the two were in competition for an author, the author is more likely to go to Tor than to Baen. How does that impact the result? Can we account for that factor by adding another publisher of similar clout to the comparison?

    What sort of difference should we expect? The books published probably have more to do with the results than the publisher (whether by editorial policy or ability to compete for authors). But it seems hard to account for that.

    1. I can’t speak directly to the number of reprints, but the – imperfect – tools I wrote to pull data out of ISFDB tell me that:

      * Tor (excluding Tor Teen, Tor.com, Tor UK, etc) published between 59 and 100 new novels a year between 2000 and 2020.
      * Baen published between 25 and 45 new novels a year between 2000 and 2020.

      These figures will likely be higher than the actual ones, as they double/triple/etc count novels by more than one author, which I suspect will affect Baen more. (I could probably account for this, but these are the numbers I can easily lay my hands on right now.)
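
The double counting described here, and the distinct-author metric asked about earlier in the thread, are three different counts over the same list of books. A minimal sketch, assuming hypothetical records of a novel plus its credited authors:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Novel:
    """Hypothetical record: a new novel and its credited authors."""
    title: str
    authors: tuple[str, ...]


def summarise(novels: list[Novel]) -> dict[str, int]:
    """Three ways of counting one publisher's output for one year."""
    return {
        # each book counted once
        "novels": len(novels),
        # co-written books counted once per author (the inflated figure)
        "author_credits": sum(len(n.authors) for n in novels),
        # each author counted once, however many books they had
        "distinct_authors": len({a for n in novels for a in n.authors}),
    }


year = [
    Novel("A", ("X", "Y")),  # co-written
    Novel("B", ("X",)),
    Novel("C", ("Z",)),
]
print(summarise(year))  # {'novels': 3, 'author_credits': 4, 'distinct_authors': 3}
```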

      If you want a “control” publisher, might I suggest Saga Press? They are owned by Baen’s distributor, but their line-up is closer to what people generally might associate with Tor or Tor.com.

  4. So what’s the actual percentage of nominees vs. percentage of total stuffs?

    Maybe a combo of your numbers and John S.?

    I like the idea of using Saga as a control for Tor. Maybe DAW as a control for Baen? (similar beginnings, etc)

    Since Baen has basically no distribution outside the US and I guess Canada, it’s really rather impressive that they did as well as they did at an award given by WORLDcon.

    1. As far as I can see, lack of international distribution seems to have minimal to no impact on getting nominated for Best Novel.

      I’ve mentioned before – possibly even in the comments on this blog – that we recently had a 3 year run where one of the finalists hadn’t had any sort of release outside the US by the time nominations closed. (Too Like the Lightning, Six Wakes and The Calculating Stars, for the record. Even now, Six Wakes is only available in the UK as a US imported paperback, or a self-pubbed ebook that apparently has lots of typos.) This is despite 2 of the 3 corresponding WorldCons taking place in Europe – I guess the way Hugo nominations work means that the members of the prior WorldCon are going to be at least as influential a bloc as the “actual” WorldCon, possibly minimizing the effect of local fans?

      Further back in time, I don’t think any of Robert Charles Wilson’s Hugo nominated – and in one case winning – novels ever got a proper UK release. I’ve seen them reviewed in places like the BSFA’s Vector magazine or SF Concatenation, but they seemed to be of imported copies, and whilst they’re now available as UK ebooks from Gollancz’ SF Gateway, I don’t think they’ve ever had UK physical releases.

      On a slightly different topic, but getting back to the wider Debarkle discussion: a (possibly unoriginal) theory I have – which I haven’t done enough research to yet feel confident in – is that one side-effect of the Puppy campaigns has been to make the Hugos more US-centric than they really should be in an era when there are more international WorldCons & (relatively) more accessible international travel than ever before.

      I recall from a table of members in the Dublin PRs that the top three “cohorts” of members were (1) US attending members, (2) UK attending members, and then (3) US supporting members. I’ve not looked into historical figures, but IIRC numbers of supporting members rose in response to the Puppy campaigns, and have stayed that way, even after the Pups gave up.

      If supporting members are more likely to nominate/vote – which isn’t a confirmed/confirmable fact, but more knowledgeable people than me seemed to think it plausible: https://twitter.com/nwbrux/status/1318626505392574464 – and they are disproportionately composed of North American members, then I don’t think we’ll ever see anything again like the 2005 Best Novel shortlist – when all 5 slots were taken by British authors – unless there’s a Chinese WorldCon where local fans are encouraged to get involved in nominations.

      Of course, if this theory is indeed true, then having (increased) US dominance over the Hugos would be a very Pyrrhic victory for the Puppies, given that the people actually nominating and voting are the “wrong” type of Americans from their POV.

      1. Amazon and other online booksellers have made books more accessible internationally, while the internet enables us to hear about potentially interesting books quicker than before. All of your examples were books that got a lot of buzz shortly after release, so interested parties, including Hugo voters, may have sought them out deliberately. And I even found both “Too Like the Lightning” and “The Calculating Stars” as import books on physical bookstore shelves here in Germany, likely because those books got a lot of buzz.

        Of course, because the US is so much bigger and most of the big SFF review sites are in the US, a US-only release is far more likely to get a lot of buzz than a UK only release.

        As for Baen, their books get comparatively little buzz outside their specific bubble and I rarely see reviews of Baen books on the big review sites, probably because a lot of what Baen publishes is book X in a lengthy series. And later books in lengthy series by other publishers don’t often get a lot of buzz either, simply because the fans of the series will buy them anyway and very few others will start at book 10 or so. Baen’s problem, if it even is one (after all, Baen does seem to do well enough), seems to be that their books appeal to a specific niche, but have little appeal to readers outside that niche.

        It also seems as if quite a few of the people who specifically joined Worldcon as supporting members to defeat the puppies stayed on and began nominating and voting even after the puppies took their ball and went home.

        As for “The Hugos are too dominated by Americans”, that has been a complaint in the UK for a long time now. It’s also something I’ve heard from German fans. “Oh, I never pay attention to the Hugos, it’s just American stuff I don’t know anyway.” Even German fans who have Worldcon memberships often don’t bother to vote and nominate.

      2. Which, of course, is self-defeating — if the Germans aren’t nominating, of course their preferred books aren’t going to end up on the ballot.

        Maybe Americans just really like to vote? It seems that way in general.

        And Puppies prove “be careful what you wish for, you might get it” — so many more SJW wrongfans from the US nominating all that girly, LGBTQI, BIPOC stuff (with decent plots and characterizations, and sentence structure, and proofreading).

    I suggest sorting out novels… that’s also less work for you. I think that’s where the complaining was. Perhaps I say that because that’s what I write (and read), and what Baen publishes. You might also sort out the NYT’s or someone else’s bestseller lists (though that may be way too much work), because that measures market penetration: reaching more of the likely voters.

    Sometimes sharpening the numerical comparisons makes a point clearer. Correcting for publication counts, Baen’s 12 finalists scale to 12 × (Tor count)/(Baen count) = 12 × 10038/3758 ≈ 32.05 at Tor’s volume. Tor had 106 works, which is more than three times as many. Stopping the count of works when the puppy debate started might make sense.
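
Written out, the scaling in this comment, together with the pre-2013 version it suggests (using the 55 and 10 finalists from the post, and still the all-years ISFDB counts as the denominators), works out as:

```latex
\begin{aligned}
\text{All years:}\quad   & 12 \times \tfrac{10038}{3758} \approx 32.05, \qquad \tfrac{106}{32.05} \approx 3.31 \\
\text{Up to 2013:}\quad  & 10 \times \tfrac{10038}{3758} \approx 26.71, \qquad \tfrac{55}{26.71} \approx 2.06
\end{aligned}
```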

    1. The NYT bestseller list is curated and easier to finagle (cf. the Scinos).

      I think the USA Today bestseller list was closer to actual numbers, and took in sales at big box stores, supermarkets, etc.

      But both those become less useful as ebooks become more popular, I think.
