Empirical Legal Silliness
Running sophisticated statistical software over a pile of data drawn from an appellate court's opinions does not guarantee a sophisticated analysis of the court's decision-making. In fact, judging from a couple of fairly recent law review articles I've read, about the only things it guarantees are unwarranted conclusions and no analysis whatsoever.
I had high hopes for the testing of hypotheses about legal reasoning. Take, for example, the work of Jeffrey Rachlinski, who was able to demonstrate that judges share some of the same cognitive blind spots as the rest of us, the most notable of which is hindsight bias (see, e.g., "Inside The Judicial Mind"). But this new fad of indiscriminately dredging for correlations and uncritically publishing the inferences drawn from them is very troublesome.
Take, for example, one law review article's suggestion that a certain court is biased because the ratio of plaintiff to defense wins deviates significantly from what an unbiased court would be expected to produce. Specifically, the defense prevailed in more than 68% of the cases tallied. To the author, this implied that something significant was afoot, since the observed distribution lay more than one standard deviation beyond the expected. Upon what premise must such a conclusion rest? There are two possibilities. The first is that justice is, or ought to be, decided by casting lots. In other words, the distribution of wins and losses should look like the results of flipping a coin a large number of times, and so approximate a Gaussian distribution. The second is that cases come before the court at random. The first denies the existence of justice as we know it. The second denies the facts: plaintiffs often press novel claims, seek to overturn or plead around statutes (like tort reform), or try to hold on to huge verdicts. All of these might reasonably be expected to influence the likelihood of success on appeal. Nevertheless, none of these assumptions, possible biases, or confounders is discussed anywhere in the paper.
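To make that hidden premise concrete, here is a minimal Python sketch of the coin-flip null model the article implicitly relies on. The function name `z_score` is a hypothetical helper, not anything from the article. The case count (100) and the alternative 65% base rate are also made-up numbers chosen for illustration, not figures the author reported. The only point is that the same win rate looks alarming or unremarkable depending entirely on which baseline you assume.

```python
import math


def z_score(defense_wins: int, n_cases: int, p_null: float = 0.5) -> float:
    """How many standard deviations an observed win count sits from its
    expected value under a binomial null model.

    The null treats every appeal as an independent trial that the defense
    wins with probability p_null. The number of defense wins then has
    mean n * p and standard deviation sqrt(n * p * (1 - p)).
    """
    mean = n_cases * p_null
    sd = math.sqrt(n_cases * p_null * (1.0 - p_null))
    return (defense_wins - mean) / sd


# Hypothetical tally: 68 defense wins out of 100 appeals (illustrative only).
n, wins = 100, 68

# Null 1: the "casting lots" court, where each side wins half the time.
print(f"coin-flip baseline: z = {z_score(wins, n):.2f}")  # z = 3.60

# Null 2 (hypothetical): the mix of appeals that actually reaches the court
# (novel claims, attacks on tort reform, defenses of huge verdicts) gives
# defendants a 65% base rate of success.
print(f"65% baseline:       z = {z_score(wins, n, p_null=0.65):.2f}")  # z = 0.63
```

The data are the same but the verdicts differ. Under the coin-flip null the tally looks like damning evidence of bias. Under a hypothetical base rate that reflects which cases get appealed, it is unremarkable. The arithmetic cannot tell you which null is right; that has to come from knowing what kinds of cases reach the court.

A bare percentage also says little without the case count. The same 68% is 3.6 standard deviations from 50% at n = 100, but only 1.8 at n = 25.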
Another recent article demonstrates the misuse and abuse of statistics. In it, the author claimed to have uncovered a power law behind certain judicial decision-making. Finding a hidden power law is considered very sexy these days, for three reasons:

- They're different from plain old normal distributions.
- They explain an increasing number of natural phenomena. Most prominently, perhaps, they explain some of the economic conditions behind the current troubles (see the Pareto distribution).
- Best of all, they let you torture your data with even more impressive-sounding statistical tools.
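For readers who haven't met one, below is the standard textbook form of a continuous power law, in its Pareto form, shown next to the normal distribution. This is a generic definition, not anything taken from the article. Here x_min is the point above which the power law is assumed to hold, and α > 1 is its exponent.

Taking logarithms shows why a power law traces a straight line of slope -α on a log-log plot. The normal's log-density, by contrast, is quadratic in x, so its tails die off far faster.

```latex
% Continuous power law (Pareto form), defined for x >= x_min, with alpha > 1
p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}
\quad\Longrightarrow\quad
\ln p(x) = \ln\frac{\alpha - 1}{x_{\min}} + \alpha \ln x_{\min} - \alpha \ln x

% Normal density, for comparison: the log-density is quadratic in x
\ln \varphi(x) = -\frac{(x - \mu)^2}{2\sigma^2} - \ln\!\left(\sigma\sqrt{2\pi}\right)
```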
The problem, however, is that putting fancy tools in the hands of someone who doesn't know how to use them cannot lead to anything worthwhile. And so it was with the second paper that had me despairing. On a log-log plot, a power law reveals itself as a straight line. But while a straight line on a log-log plot may be necessary for the presence of a power law, it's not sufficient, yet that's stated nowhere in the paper. That's a minor quibble, though, compared to the real problem with the analysis. The author could have plotted the data on a scatter graph and inferred that a power law might lie beneath. Instead, the author apparently put the data on a log-log plot, forgot it was a log-log plot, and ran a linear regression on it. That forced a nice straight line where there was none. The result was a sort of tautology-by-statistics.
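Here is a minimal Python sketch of why regression on log-log data proves nothing by itself: least squares always returns some line. The helper `loglog_fit` is hypothetical. The inputs are synthetic curves, not the paper's data.

The first curve is a genuine power law. The second is an exponential decay, which is not a power law at all. The regression cheerfully reports a slope for both. What separates them is the pattern of the residuals: they are flat for the power law and systematically curved for the exponential.

```python
import math


def loglog_fit(xs, ys):
    """Ordinary least-squares line through the points (ln x, ln y).

    Returns (slope, intercept, r_squared, residuals). Least squares always
    produces *some* line, whatever shape the underlying data really have.
    """
    us = [math.log(x) for x in xs]
    vs = [math.log(y) for y in ys]
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    s_uu = sum((u - u_bar) ** 2 for u in us)
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    slope = s_uv / s_uu
    intercept = v_bar - slope * u_bar
    # Residuals: how far each log-log point sits above (+) or below (-) the line.
    residuals = [v - (intercept + slope * u) for u, v in zip(us, vs)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((v - v_bar) ** 2 for v in vs)
    return slope, intercept, 1.0 - ss_res / ss_tot, residuals


xs = list(range(1, 31))

# A genuine power law, y = 3 * x**-2: the fit recovers slope -2 with R^2 = 1.
slope, _, r2, _ = loglog_fit(xs, [3.0 * x ** -2 for x in xs])
print(f"power law:   slope = {slope:.3f}, R^2 = {r2:.3f}")

# An exponential decay, y = exp(-x / 10), which is NOT a power law.
# The regression still hands back a slope and an R^2.
slope, _, r2, res = loglog_fit(xs, [math.exp(-x / 10) for x in xs])
print(f"exponential: slope = {slope:.3f}, R^2 = {r2:.3f}")

# The tell-tale sign is structure in the residuals. This curve is concave on
# log-log axes, so the residuals are negative at both ends and positive in
# the middle.
print("residual signs:", "".join("+" if r > 0 else "-" for r in res))
```

Eyeballing residuals is only a first check. More careful treatments fit the exponent by maximum likelihood and test the fit against alternatives such as the lognormal, rather than trusting a regression line. Clauset, Shalizi and Newman's survey of power laws in empirical data is the usual starting point.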
Don't get me wrong. The discovery that certain behaviors could be explained and predicted mathematically would be a big deal. Consider the vigorous debate in the scientific literature about how to describe bacterial foraging behavior (swim straight vs. tumble and then swim somewhere else). Should it follow a power-law curve or a Gaussian bell curve? Seeing that debate makes you wonder whether something as complicated as the administration of justice can ever be reduced to a simple formula. Human-scale biological systems are fantastically complex. I'd expect a judge's approach to her task of doing justice to be even more so. That approach is the application of principles, our way of dealing with complexity and uncertainty.