In the comments to the previous post on this topic, Johan P raised some really interesting points. I’d said rather glibly that the categories with more subscribers will obviously have more free-downloads and sales. As Johan points out, this is counter-intuitive, because the figures given are AVERAGES, i.e. (I assume) the number of downloads/sales per book rather than the total number of downloads or sales in those categories. However, it really is true that the bigger categories have bigger downloads/sales; I just haven’t explained it properly, and I did use misleading terms like ‘crowded’.
The graph plots the totals of free-downloads + discounted book sales (horizontal axis) against number of subscribers (vertical axis). The relationship is quite strong. I plotted a line of best fit courtesy of Excel. Now, a linear relationship is probably not the best way of describing the data. I assume that underneath all of this there is some sort of power-law type thing going on with sales (i.e. some books sell HUGE amounts and shape the averages accordingly). Working out how that all plays out when comparing subscribers to sales would require more detailed data than we have. Even so, the line gives us something to compare the data we do have against, and an r-squared of 74% is enough to justify my claim that more subscribers = more downloads/sales as a broad statement.
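For anyone who wants to reproduce the line of best fit and the r-squared without a spreadsheet, here is a minimal sketch in plain Python. It assumes an ordinary least-squares fit, which is what a spreadsheet linear trendline normally computes. The function names and the numbers in the example are placeholders of mine, not the figures behind the graph.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and the x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up placeholder values, NOT the data from the post.
    downloads_and_sales = [1000, 2000, 3000, 4000]   # horizontal axis
    subscribers = [12000, 19000, 33000, 38000]       # vertical axis

    slope, intercept = fit_line(downloads_and_sales, subscribers)
    fit_quality = r_squared(downloads_and_sales, subscribers, slope, intercept)
    print(f"subscribers = {slope:.2f} * downloads + {intercept:.0f}")
    print(f"r-squared = {fit_quality:.2f}")
```

An r-squared of 0.74 would correspond to the 74% quoted above: roughly three quarters of the variation in subscriber numbers is accounted for by the line.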
Flipping this round, we get a different way of looking at the data: which genres deviate most from that line, and in which direction? If I’m right and the sales figures are distorted by bestsellers, then a newbie author should steer clear of those genres ABOVE the line, because these genres have more subscribers than we would predict from the number of downloads/sales. Genres below the line have more sales/downloads than we would predict from the number of subscribers, and that sounds like a better bet, or at least those averages may be closer to a ‘typical’ value rather than a distorted average.
Here’s a similar graph but this time looking at sales only and unfortunately done using Apple’s Numbers spreadsheet rather than Excel:
There are many ways we can quantify how much a data point deviates from that line, but within the limits of the tools on this laptop, I’m just going to find the difference between the actual number of subscribers and the number predicted by the equation of the line. Negative is better here, I think, but I’ve sailed off into generating numbers whose meaning is unclear. I *think* that the genres near the top are less impacted by a few bestsellers and the genres near the bottom are more impacted but…I wouldn’t swear to that and I’m just guessing.
|Genre|Actual minus predicted subscribers|
|---|---|
|Advice and How-To|-358,251|
|Politics and Current Events|-213,950|
|Dark Romance & Erotica|-186,103|
|American Historical Romance|-153,023|
|Time Travel Romance|-62,258|
|Religion and Spirituality|-55,485|
|African American Interest|104,971|
|New Adult Romance|209,591|
|Biographies and Memoirs|631,754|
|Action and Adventure|697,134|
|Teen and Young Adult|968,973|
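The calculation behind a table like this one is simple enough to sketch in a few lines of Python. This is only an illustration of the idea (actual subscribers minus the subscribers the line predicts), not the spreadsheet I actually used. The function name, the genre labels, the slope, the intercept and every number in the example are invented placeholders.

```python
def subscriber_residuals(points, slope, intercept):
    """Return (genre, actual - predicted subscribers), most negative first.

    points maps a genre name to a (sales, subscribers) pair.
    slope and intercept describe the fitted line:
        predicted subscribers = slope * sales + intercept
    """
    rows = []
    for genre, (sales, subscribers) in points.items():
        predicted = slope * sales + intercept
        rows.append((genre, subscribers - predicted))
    # Negative values sit below the line: more sales than the
    # subscriber count alone would lead us to expect.
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical genres and values, NOT the data from the post.
    example_points = {
        "Genre A": (500, 900_000),
        "Genre B": (1500, 1_200_000),
        "Genre C": (2500, 3_100_000),
    }
    # Hypothetical line of best fit, assumed for illustration only.
    example_slope = 1000.0
    example_intercept = 200_000.0

    for genre, residual in subscriber_residuals(
        example_points, example_slope, example_intercept
    ):
        print(f"{genre}: {residual:+,.0f}")
```

Sorting from most negative to most positive gives the same ordering as the table: genres at the top sit furthest below the line, genres at the bottom sit furthest above it.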