Category: Statistics

Claims and false claims

[A content warning: this post discusses sexual assault reports.]

All reports of a crime have potential consequences. We live in an age where false reports of crimes lead to death and where “SWATting” is a murderous prank. However, only one class of crime leads to constant concern from conservatives that false allegations are sufficiently common to require a kind of blanket scepticism. Amid the allegations against Supreme Court nominee Brett Kavanaugh, conservatives are pushing back against treating allegations of sexual assault at face value. This is part of a long history of people demanding that sexual assault crimes, in particular, require additional scepticism and scrutiny. That history pushed an idea that rape claims are made by women to ruin a man’s reputation even though historically the consequences of speaking out have always fallen more heavily on women than men*.

A piece by David French at the conservative magazine National Review attempts to push back against modern feminist advocacy for supporting victims of sexual violence:

“It happens every single time there’s a public debate about sex crimes. Advocates for women introduce, in addition to the actual evidence in the case, an additional bit of  “data” that bolsters each and every claim of sexual assault. You see, “studies” show that women rarely file false rape claims. According to many activists, when a woman makes a claim of sexual assault, there is an empirically high probability that she’s telling the truth. In other words, the very existence of the claim is evidence of the truth of the claim.”

The tactic here is one we’ve seen in multiple circumstances where research runs counter to conservative beliefs: FUD, or fear, uncertainty and doubt. Everything from cigarettes to DDT to climate change has had the FUD treatment as an intentional strategy to undermine research. Note the ‘how ridiculous’ tone of ‘In other words, the very existence of the claim is evidence of the truth of the claim.’ when, yes, the existence of somebody claiming a crime happened to them IS evidence that a crime happened to them. It is typically the first piece of evidence of a crime! It isn’t always conclusive evidence of a crime for multiple reasons but yes, manifestly it is evidence. The rhetorical trick here is to take something that is actually commonplace (i.e. a default assumption that when a person makes a serious claim of a crime there is probably a crime) and make it sound spurious or unusual.

The thrust of the article rests on an attempt to debunk research that has been done on the issue of false rape allegations. To maintain the fear of men suffering from false rape allegations, the article aims to emphasise the uncertainty in the statistics to provoke doubt (and uncertainty) amid its target audience.

After a broad preamble, the article focuses on one study in particular and, to the article’s credit, it does actually link to the paper. The 2010 study in question is this one: False Allegations of Sexual Assault: An Analysis of Ten Years of Reported Cases by David Lisak, Lori Gardinier, Sarah C. Nicksa and Ashley M. Cote. The specific study looks at reports of sexual assault to campus police at a major Northeastern US university. However, the study also contains (as you might expect) a literature review of other studies conducted. What is notable about the studies listed is that they found that the frequency of false allegations was over-reported. For example, a 2005 UK Home Office study found:

“There is an over-estimation of the scale of false allegations by both police officers and prosecutors which feeds into a culture of skepticism, leading to poor communication and loss of confidence between complainants and the police.”

The space where David French seeks to generate uncertainty around these studies is two-fold:

  1. That sexual assault and rape are inherently difficult topics to research because of the trauma of the crime and social stigma [both factors that actually point to false allegations being *less* likely than other crimes, of course…]
  2. That there is a large number of initial reports of sexual assault where an investigation does not proceed.

That large numbers of rape and sexual assault reports to police go uninvestigated may sound more like a scandal than a counter-argument to believing victims, but this is a fertile space for the right to generate doubt.

French’s article correctly reports that:

“researchers classified as false only 5.9 percent of cases — but noted that 44.9 percent of cases where classified as “Case did not proceed.””

And goes on to say:

“There is absolutely no way to know how many of the claims in that broad category were actually true or likely false. We simply know that the relevant decision-makers did not deem them to be provably true. Yet there are legions of people who glide right past the realities of our legal system and instead consider every claim outside those rare total exonerations to be true. According to this view, the justice system fails everyone else.”

The rhetorical trick is to confuse absolute certainty (i.e. we don’t know exactly what proportion of the uninvestigated claims might be false) with reasonable inferences that can be drawn from everything else we know (i.e. it is very, very unlikely to be most of them). We can be confident that cases that did not proceed BECAUSE the allegation was false (i.e. it was investigated and found to be false) were NOT included in the 44.9% of cases, precisely because those cases were counted as false allegations. More pertinently, linking back to the “fear” aspect of the FUD strategy, the 44.9% of cases also led to zero legal or formal consequences for the alleged perpetrators.

I don’t know if this fallacy has a formal name but it is one I see over and over. I could call it “methodological false isolation of evidence”, by which I mean the tendency to treat each piece of evidence for a hypothesis as separate, with no capacity for multiple sources of evidence to cross-corroborate. If I may depart into anthropogenic global warming for a moment, you can see the fallacy work like this:

  • The physics of carbon dioxide and the greenhouse effect imply that increased CO2 will lead to warming: countered by – ah yes, but we can’t know by how much and maybe it will be less than natural influences on climate and maybe the extra CO2 gets absorbed…
  • The temperature record shows warming consistent with the rises in anthropogenic greenhouse gases: countered by – ah yes, but maybe the warming is caused by something natural…

Rationally the two pieces of evidence function together: correlation might not be causation but if you have causation AND correlation then, well, that’s stronger evidence than the sum of its parts.

With these statistics we are not operating in a vacuum. They need to be read and understood along with the other data that we know. Heck, that idea is built into the genre of research papers and is exactly why literature reviews are included. Police report statistics are limited and do contain uncertainty and aren’t a window into some Platonic world of ideal truth BUT that does not mean we know nothing and can infer nothing. Not even remotely. What it means is we have context to examine the limitations of that data and consider where the bias is likely to lie, i.e. is the police report data more likely to OVERestimate or UNDERestimate the rate of false allegations compared to the actual number of sexual assaults/rapes?

It’s not even a contest. Firstly as the 2010 report notes:

“It is notable that in general the greater the scrutiny applied to police classifications, the lower the rate of false reporting detected. Cumulatively, these findings contradict the still widely promulgated stereotype that false rape allegations are a common occurrence.”

But the deeper issue is the basic bias in the data that depends on reports to the police.

“It is estimated that between 64% and 96% of victims do not report the crimes committed against them (Fisher et al., 2000; Perkins & Klaus, 1996), and a major reason for this is victims’ belief that his or her report will be met with suspicion or outright disbelief (Jordan, 2004).”

Most victims of sexual assault do not report the crime at all, i.e. most victims aren’t even in the data sets we are looking at. Assume for a moment that the lower bound of that figure (64%) is itself exaggerated (although why that would be the case I don’t know) and assume, to give David French an advantage, that 50% of actual sexual assaults go unreported and that half of the 44.9% figure were somehow actual FALSE allegations (again, very unlikely). That would make the proportion of false allegations compared with (actual assaults+false allegations) about 14% based on the 2010 study’s campus figure. It STILL, even with those overt biases included, points to false allegations being very unlikely.
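
The back-of-envelope sum in the paragraph above can be sketched in a few lines. This is a minimal sketch rather than anything from the study: the function name and its parameters are made up for illustration, the 5.9% and 44.9% inputs are the study figures as quoted, and the 50% and “half” inputs are the deliberately generous assumptions already described. How unreported assaults are counted is also an assumption, so two readings are shown.

```python
def false_allegation_share(classified_false, did_not_proceed, np_false_fraction,
                           unreported_fraction, strict=True):
    """False allegations as a share of (actual assaults + false allegations).

    The first two arguments are fractions of all reports. The name and
    signature are hypothetical, chosen for this illustration only.
    """
    # Reports classified false plus the assumed-false part of "did not proceed"
    false_reports = classified_false + did_not_proceed * np_false_fraction
    true_reports = 1.0 - false_reports
    if strict:
        # Strict reading: true reports are the reported part of actual assaults
        actual_assaults = true_reports / (1.0 - unreported_fraction)
    else:
        # Rough reading: unreported assaults are scaled from all reports,
        # true or false (one unreported assault per report at 50%)
        unreported = unreported_fraction / (1.0 - unreported_fraction)
        actual_assaults = true_reports + unreported
    return false_reports / (actual_assaults + false_reports)


rough_share = false_allegation_share(0.059, 0.449, 0.5, 0.5, strict=False)
strict_share = false_allegation_share(0.059, 0.449, 0.5, 0.5, strict=True)
print(f"rough reading:  {rough_share:.1%}")
print(f"strict reading: {strict_share:.1%}")
```

The rough reading comes out at about 14%, which may be how the figure above was reached; the stricter reading gives about 16.5%. Either way, and even with every assumption tilted towards false allegations, they remain a small minority.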

It makes sense to believe. The assumption that rape in particular is likely to draw malicious allegations is a misogynistic assumption. That does not mean nobody has ever made a false claim of rape, it just means that we do not place the same burden of doubt on people when they claim to be robbed or mugged etc. People make mistakes and some people do sometimes maliciously accuse others of crimes but such behaviour is unusual and, if anything, it is particularly unusual with sexual crimes where, in fact, the OPPOSITE is more likely to occur: the victim makes no allegation out of fear of the consequences and because of the trauma involved.

Somehow it is 2018 and we still have to say this.

*[I don’t want to ignore that men are also victims of sexual violence, perhaps at far greater rates than are currently quantified, but the specific issue here relates to a very gendered view of sex and sexual assault.]


This is evidence of something but of what, I’m not sure

The BBC’s short story contest has, after a selection process in which author gender wasn’t known, selected an all-female shortlist. It isn’t a contest that I’m familiar with but apparently, it has been running for 13 years and on four previous occasions the shortlist has been all women*.

It’s nice to see authors being celebrated this way and the ‘blind’ selection process undermines the likely claim from the intransigently anti-women section of society that the nominees were chosen based on ‘affirmative action’ or some anti-men sentiment. I say ‘undermines’ because, of course, it hasn’t stopped the usual misogynistic comments on social media.

As a positive story it is still interesting to look at in terms of how results from awards may depart from simple demographic splits. As I’ve discussed here before, quantifying the discrepancy between actual results and what those results might be if demographic profile was effectively random is one thing; understanding where that discrepancy comes from and whether it is a problem is a very different thing. First things first though, is the shortlist of five women numerically remarkable?

The prize has been running since 2006 and by my count from Wikipedia five of the twelve winners have been women. According to the Guardian article, about 57% of submissions are from women.

  • In 2017 the shortlist of 5 included 2 women,
  • 2016 had 5 women,
  • 2015 had 2 women,
  • 2014 had 5 women,
  • 2013 had 5 women,
  • 2012 had a longer shortlist** of 10 authors of which 6 were women,
  • 2011 had 3 women
  • 2010 had 3 women
  • 2009 had 5 women
  • 2008 had 3 women
  • 2007 had 2 women***
  • 2006 had 2 women on a shortlist of four

Looking at the list there are a few obvious things: many of the same authors get nominated (which isn’t surprising) and just eyeballing the numbers suggests women are more likely to get nominated but it isn’t a trend as such. Of the 64 nominees, 43 were women, about 67%, which is more than would be predicted if the distribution was 50-50 and is higher than expected using a 57%-43% split based on submission rates.
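
As a rough check on those numbers, here is a minimal sketch that totals the shortlists and asks how likely an all-women shortlist of five would be under each baseline. It treats every shortlist place as an independent draw, which is a crude assumption made only for illustration, and it assumes a shortlist size of five for the years where the list above doesn’t state one (that is consistent with the total of 64). The names used are made up for this example.

```python
# (women, shortlist size) per year, taken from the list above.
# Sizes for 2008-2011 are assumed to be five; 2007 counts five authors.
SHORTLISTS = {
    2017: (2, 5), 2016: (5, 5), 2015: (2, 5), 2014: (5, 5),
    2013: (5, 5), 2012: (6, 10), 2011: (3, 5), 2010: (3, 5),
    2009: (5, 5), 2008: (3, 5), 2007: (2, 5), 2006: (2, 4),
}

women_total = sum(women for women, _ in SHORTLISTS.values())
nominee_total = sum(size for _, size in SHORTLISTS.values())
women_share = women_total / nominee_total


def prob_all_women(p_woman, size=5):
    """Chance that every place goes to a woman, assuming independent draws."""
    return p_woman ** size


print(f"{women_total} of {nominee_total} nominees were women ({women_share:.0%})")
for baseline in (0.50, 0.57):
    chance = prob_all_women(baseline)
    print(f"baseline {baseline:.0%}: all-women shortlist of five has chance {chance:.1%}")
```

Under the 57% submission baseline a given five-person shortlist would be all women about 6% of the time, and about 3% of the time under a 50-50 split, so a single such shortlist is unusual rather than extraordinary.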

Interestingly these rates make the gender split on the winners also look oddly biased (in the statistical sense). Only five of the twelve winners are women: about 42% of the winners against 67% of the nominees. I can’t find exact details of the judging process but I assume it is the same panel of judges in a given year who both shortlist and decide the winner. There’s no simple model of personal gender bias that easily accounts for a jury that is gender biased in two directions 🙂

One difference is that while the shortlisting is done ‘blind’ the selection of the winner is not. However, looking at the recurrence of names among the nominees, the role of ‘blind’ shortlisting can be overstated: notable authors have notable styles and even without names some stories are easily recognised (e.g. one 2015 shortlisted story was Hilary Mantel’s comical tale “The Assassination of Margaret Thatcher”). It is also reasonable to speculate that judges are looking for qualities in the short stories that may be more common among women authors, not because gender determines how people write but because of the on-going mechanics of expectations on creative people in a gendered society. Perhaps that explains both sets of results?

Simple proximate causes are inadequate here.

*[The reporting was in terms of male versus female, so I don’t know what the figures would be if the authors were classified with a broader view of gender. Without full author bios, the counts here are based on gendered first names or pronouns used in associated stories.]

**[To coincide with the Olympics the competition was opened up to more nationalities.]

***[Four nominated works and five nominated authors as one story was by two authors “Slog’s Dad” by Margaret Drabble & Dave Almond]


Looking at Subscription Data

The discussion in the comments about Amazon ranks sent me off on a tangent. I gathered some Amazon rankings for SFF magazines that offer subscriptions via Amazon and having got that data I thought I should do something with it.

As I also had the 2017 Fireside Report data I thought I’d compare the two. Now, this data is not great. Firstly, while the Fireside Report is methodical it is necessarily less strong on a per-magazine basis than it is in aggregate — one author incorrectly identified (or not identified) would have a big impact on the proportion listed. Secondly, the Amazon rankings I’ve got don’t necessarily represent the size of the readership consistently between the magazines — there is some major variation in business model between the magazines listed.

Still, I was curious. Story outlets that maintain an ongoing Kindle subscription model would be (I speculated) the more established and hence ‘traditional’ and hence reflect the least amount of social/cultural change.

Given all that, it is not surprising that the data is really just a big bunch of all-over-the-place when comparing rankings. I did tabulate sub-rankings in particular categories but those rankings on their own terms appeared to make no sense and/or to rest on classifications within Amazon that are not quite commensurate.

No strong conclusions to draw other than:

  • there’s no obvious commercial downside for outlets that have better representation
  • overall (as noted in the Fireside report) the level of representation isn’t good
  • Uncanny’s model doesn’t suit the ranking very well.

The last two columns are from the Fireside Report 2017 Google spreadsheet

[Table values not recovered. Columns: Magazine; Amazon Kindle Subs Rank; total stories, black authors; % stories by black authors. Surviving row labels: Fantasy & Science Fiction; Nightmare Magazine.]


The Consonantal USA*

*[Pun courtesy of Mr Mike Glyer, purveyor of fine blogs, fanzines and assorted goods]

There are many things we can say about the states that comprise the United States of America. “Why?” might be one of them but another might be “which state has the highest proportion of consonants in its name?” If that is your question then behold! The answer is below.

    • 25.0% consonants=1 length=4 | Iowa
    • 25.0% consonants=1 length=4 | Ohio
    • 33.3% consonants=2 length=6 | Hawaii
    • 33.3% consonants=3 length=9 | Louisiana
    • 40.0% consonants=2 length=5 | Maine
    • 40.0% consonants=2 length=5 | Idaho
    • 42.9% consonants=3 length=7 | Alabama
    • 42.9% consonants=3 length=7 | Arizona
    • 42.9% consonants=3 length=7 | Georgia
    • 42.9% consonants=3 length=7 | Indiana
    • 50.0% consonants=4 length=8 | Missouri
    • 50.0% consonants=4 length=8 | Oklahoma
    • 50.0% consonants=3 length=6 | Alaska
    • 50.0% consonants=5 length=10 | California
    • 50.0% consonants=4 length=8 | Colorado
    • 50.0% consonants=4 length=8 | Delaware
    • 50.0% consonants=4 length=8 | Illinois
    • 50.0% consonants=3 length=6 | Nevada
    • 50.0% consonants=3 length=6 | Oregon
    • 50.0% consonants=2 length=4 | Utah
    • 50.0% consonants=4 length=8 | Virginia
    • 53.8% consonants=7 length=13 | South Carolina
    • 54.5% consonants=6 length=11 | South Dakota
    • 55.6% consonants=5 length=9 | Minnesota
    • 55.6% consonants=5 length=9 | New Mexico
    • 55.6% consonants=5 length=9 | New Jersey
    • 55.6% consonants=5 length=9 | Tennessee
    • 57.1% consonants=4 length=7 | Montana
    • 57.1% consonants=4 length=7 | Wyoming
    • 57.1% consonants=4 length=7 | Florida
    • 57.1% consonants=4 length=7 | New York
    • 58.3% consonants=7 length=12 | Pennsylvania
    • 58.3% consonants=7 length=12 | West Virginia
    • 60.0% consonants=3 length=5 | Texas
    • 61.5% consonants=8 length=13 | North Carolina
    • 62.5% consonants=5 length=8 | Maryland
    • 62.5% consonants=5 length=8 | Michigan
    • 62.5% consonants=5 length=8 | Arkansas
    • 62.5% consonants=5 length=8 | Kentucky
    • 62.5% consonants=5 length=8 | Nebraska
    • 63.6% consonants=7 length=11 | Mississippi
    • 63.6% consonants=7 length=11 | Connecticut
    • 63.6% consonants=7 length=11 | North Dakota
    • 63.6% consonants=7 length=11 | Rhode Island
    • 66.7% consonants=8 length=12 | New Hampshire
    • 66.7% consonants=4 length=6 | Kansas
    • 66.7% consonants=6 length=9 | Wisconsin
    • 69.2% consonants=9 length=13 | Massachusetts
    • 70.0% consonants=7 length=10 | Washington
    • 71.4% consonants=5 length=7 | Vermont
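
For anyone wanting to reproduce the list, here is a minimal sketch of how the figures can be computed. It is not the original script: the function names are made up for illustration, and it assumes that spaces are ignored and that “y” is counted as a vowel, which is what the counts above imply (Wyoming has 4 consonants out of 7 letters, South Carolina has length 13).

```python
# Assumption: "y" is treated as a vowel, matching the counts in the list above
VOWELS = set("aeiouy")


def consonant_stats(name):
    """Return (consonant count, letter count, consonant percentage) for a name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]  # drops spaces
    consonants = sum(1 for ch in letters if ch not in VOWELS)
    return consonants, len(letters), 100.0 * consonants / len(letters)


def format_entry(name):
    """Format one entry in the same style as the list above."""
    consonants, length, percent = consonant_stats(name)
    return f"{percent:.1f}% consonants={consonants} length={length} | {name}"


# Sort a few sample states from least to most consonantal
sample = ["Vermont", "Iowa", "Wyoming", "South Carolina"]
for state in sorted(sample, key=lambda s: consonant_stats(s)[2]):
    print(format_entry(state))
```

Counting “y” as a consonant instead would move states such as Wyoming, Kentucky and New Jersey up the list, so that one choice does matter for the ordering.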


Today’s Important Charts

Star Wars movies title lengths by year:


The long period of consensus on proper Star Wars movie title length has ended with a sharp decline.

Neither Caravan of Courage: An Ewok Adventure nor Ewoks: The Battle for Endor was included in the first graph as they were TV movies. However, including them implies the recent decline is a natural correction to mid-1980s excesses.


Most importantly (save the strongest results till last) including all the movies and comparing title length to running time produces this important result:


That’s an R-squared that’s not to be sneezed at!* 40% of the variance is explained by title length. According to the Felapton Towers research scientists the mathematical model is:

running time = 151.42 – 1.2979×title length

This is excellent news for when they produce the film entitled “R2”, chronicling his life and career as a Sith Lord. The model predicts a running time of 148.8242 minutes, which is shorter than The Last Jedi (a bit of an outlier). Whereas The Life, Loves and Wacky Adventures of Galactic Senator Jar-Jar Binks and All His Fun Friends will mercifully be only half an hour long.
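
The predictions above can be reproduced with a minimal sketch of the fitted line. The intercept and slope are the ones quoted; the function name is made up for illustration, and counting title length as every character including spaces and punctuation is an assumption, though it is the one that gives roughly half an hour for the long title.

```python
INTERCEPT = 151.42  # minutes, from the model quoted above
SLOPE = -1.2979     # minutes per character of title


def predicted_running_time(title):
    """Predicted running time in minutes from the title's character count.

    Assumes length counts every character, including spaces and punctuation.
    """
    return INTERCEPT + SLOPE * len(title)


long_title = ("The Life, Loves and Wacky Adventures of Galactic Senator "
              "Jar-Jar Binks and All His Fun Friends")

for title in ("R2", long_title):
    print(f"{predicted_running_time(title):8.4f} minutes | {title}")
```

Pushed a little further the same line predicts a zero or negative running time once a title passes about 117 characters, which is another reason to take the footnote’s moral seriously.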

*[Moral: be careful looking at correlations]

Reviews and discourse

This is thinking out loud and a bit long and waffly in a field beyond my expertise. Approach with skepticism 🙂

Reviews and literary criticism are not one and the same thing. Criticism (in its literary sense) is a tool of the reviewer and at the same time, reviews can be seen as a subset of criticism. In its wider sense criticism is intended to examine text and shed light on what those texts do. Reviews are more overtly functional and are often characterised as being there to inform potential consumers.

Of course we should first of all be suspicious of this distinction and also of the description of both reviews and criticism. I often write things entitled ‘reviews’ not to inform potential readers/viewers but to share my feelings and experiences as a consumer of that media. In particular, reviews with spoilers or which really require the reader to have experienced the book/movie/TV show are not written with potential consumers in mind but rather people who have already consumed the media. Much of the reviewing that is available online is better described as ‘commentary’ – more akin to post-game discussion of sports matches or analysis of news stories. It may be less high-brow than what would be recognised as literary criticism (and less informed by models of literary criticism) but it is closer in kind to it than ‘reviews’ in the sense of information for potential consumers.

Yet another role for reviews and criticism is improvement, change or the establishment of norms. Editorial reviews (at a high-level – I don’t mean proofreading) are one example but it is something that can be seen in literary criticism and in more general reviewing. Identifying problems in texts or discussing whether a text fits within a genre form parts of wider discussion about what counts as being a problem in a text or what defines a genre.

To clarify for the purpose of discussion I’ll split things into various roles:

  • Reviews for the purpose of informing potential consumers (not unlike product reviews of goods).
  • Post-consumption sharing of experiences.
  • Criticism for the purpose of understanding a text.
  • Reviews/criticism for identifying problems or potential improvements in a text.

Each of these forms part of the wider discourse within a community that has a shared engagement with texts. As this is a broad, multi-faceted and diverse discourse that ranges across multiple venues, there are no hard borders between those roles. A review ostensibly for helping people decide what to read next may incorporate particular norms about the genre because having norms provides a way of judging and reporting on texts. Likewise, analysis of what is going on in a story for its own sake can encourage somebody to read a story (or ensure they stay well away from it forever!).

Identifying problems with how race, ethnicity, nationality, religious belief, gender, sexuality or disability are represented in a text would fall into the fourth point on my list but also fits with the first point and will often be derived from the second. Lastly, a discussion of such issues may revolve around the third point, i.e. what is actually going on in a text and what kind of representation is being used.

I’m well out of my philosophical comfort zone by this point having strayed out of the analytical and insular and into the phenomenological and continental but let’s persist.

Common to all is the sharing of personal experiences with others. Put another way, reviews and criticism bridge subjective experiences to intersubjective community understanding.

How we experience a text (story, film etc) is something we can examine and discuss and it is something that we can analyse, something we can find patterns with and it is also something where we can aggregate data. Aggregation and quantification of subjective data does provide a way of looking at subjective experiences using tools designed for “objective” data (and see the previous essay for what I mean about “objective”). It is another way of looking at shared experience.

Related to that is the role of anthologies and magazines. Both are traditionally an important part of this wider discourse about the nature of science fiction and fantasy. By collecting stories together and presenting a set of stories as examples of the genre, and as examples that are of at least some minimum quality, anthologies are also a form of both review and criticism that wittingly or not describes possible boundaries of the genre. To call an anthology a work of ‘criticism’ may sound odd but it covers many aspects of the points above (e.g. a ‘best of the year’ style anthology is normative by presenting examples of some standard of ‘best’, is also a way of guiding consumers to stories they may like, and also represents some of the subjective reaction of the editors/compilers).

Similar points can be made about awards and competitions and here the overt nature of a discourse becomes clearer. With fan awards discussion and shared experience is an important part of the process. Juried awards can also engender debate and discussion and I’d argue the most interesting ones are the ones that evolved this aspect (for example the Clarkes by virtue of the Shadow Clarkes have become a more interesting award).

Where are you going with this Camestros! I hear you shout (assuming you’ve read this far). OK, OK, I’ll stop waffling.

My point is, if we are to discuss what reviewing should be like and what kinds of reviews and reviewing activity people should be doing, we have to consider it against this wider discourse. It is the big broad discussion that is the important thing – alongside the health and welfare of individuals.