Painting By Numbers

It's hard to argue with the decision of the U.S. Court of Appeals for the Seventh Circuit in Joann Schultz v. Akzo Nobel Paints, LLC, a benzene/AML (acute myelogenous leukemia) wrongful death claim filed by the wife of a painter. The opinion frames the question before the court as follows: Is the fact that plaintiff's oncology expert holds to the linear no-threshold model for carcinogens a sufficient reason under Daubert to exclude his opinion that plaintiff's estimated high benzene exposure was causative, given published studies demonstrating an eleven-fold risk of AML among those similarly exposed? The obviously correct answer is "no," and so the court held. (Behold the power of framing!)

We haven't read any of the briefing and so don't know if the question as framed was really what the fight was all about (though we certainly hope there was more to it than just that), but we did read the entire opinion and noted a couple of things you might find of interest. One involves the fact that plaintiffs are finally getting why dose matters (and helps); another is the opportunity that arises when dose is calculated. The last is a lament about the common problems of conflating observational epi studies with the scientific method and of casual causal inference via "differential diagnosis," along with a pointer to a new tool you might find helpful.

Unlike most toxic tort plaintiffs, Schultz had her expert perform a sophisticated dose reconstruction. From it the expert generated a cumulative dose estimate, which was then compared to risk data drawn from epi studies plaintiff thought she could defend as scientifically sound. The comparison yielded a large increase in her deceased husband's risk of AML attributable to his benzene exposure. Now that's the way to do it. Better yet, it was essential to getting the summary judgment against her reversed. How did she do it?

According to the appellate court, plaintiff reconstructed the decedent's benzene exposure "using Monte Carlo Analysis, a risk assessment model that accounts for variability and uncertainty in risk factors such as the likely variation in [decedent's] exposure to benzene during different periods and at different plants." The court then proceeded to write that "[t]he U.S. Environmental Protection Agency (EPA) has endorsed this methodology as a reliable way to evaluate risk arising from environmental exposure." Sound good?

For the unwary it sounds as though Monte Carlo analysis (or simulation, etc.) is (a) some sort of mathematical equation that (b) generates reliable (and therefore presumably admissible) risk estimates while (c) accounting (somehow) for missing data, and that it comes with (d) Uncle Sam's seal of approval. Unfortunately, it isn't so.

Rather than going into detail about Monte Carlo simulations and their promise and limitations, let's briefly discuss what makes them invaluable when it comes to cross-examining the other side's expert. It turns out that it's not some mathematically deduced equation that "accounts for variability and uncertainty" in these Monte Carlo exercises; it's the expert who picks which variables matter, along with the formula that expresses the pattern in which the variables are expected to be found. In other words, the expert using a Monte Carlo method has translated her opinions into a mathematical language. So whereas an expert's deposition fought only in English typically yields dissembling about methods and sharp advocacy about results, a translation of her model's mathematical language reveals her tactics and much of the other side's strategy. Words like "high" and "low" become, e.g., 2 ppm and 0.01 ppm; "increasing" and "decreasing" become calculable slopes; and "it varied throughout his shift" becomes, e.g., a power law - each suddenly vulnerable to an informed cross-examination.
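To make the point concrete, here's a minimal sketch of what such a cumulative-dose reconstruction might look like once the words become numbers. Every figure in it - the distributions, the ppm levels, the years worked - is our own illustrative assumption, not anything from plaintiff's expert's actual model:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # simulated work histories

# Illustrative assumptions only - none of these figures come from the case.
# "High" exposure tasks: lognormal centered near 2 ppm.
high_ppm = rng.lognormal(mean=np.log(2.0), sigma=0.5, size=N)
# "Low" background exposure: lognormal centered near 0.01 ppm.
low_ppm = rng.lognormal(mean=np.log(0.01), sigma=0.5, size=N)

# Assumed years in each exposure category, with uniform uncertainty.
high_years = rng.uniform(5, 10, size=N)
low_years = rng.uniform(10, 20, size=N)

# Cumulative dose (ppm-years) for each simulated history.
cum_dose = high_ppm * high_years + low_ppm * low_years

print(f"median: {np.median(cum_dose):.1f} ppm-years")
print(f"5th-95th percentile: {np.percentile(cum_dose, 5):.1f} "
      f"to {np.percentile(cum_dose, 95):.1f} ppm-years")
```

Each of those choices (why lognormal? why that sigma? why 5 to 10 years?) is exactly the sort of thing an informed cross-examination can now probe.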

Yet while we cheered the use of (potentially) explicit and transparent models for estimating dose (we don't know if plaintiff's estimator divulged his spreadsheets and formulae), we groaned at an embarrassingly shabby standard for causal inference.

First there's the method: the differential diagnosis. Since plaintiff's expert ruled in most of, or at least the important, "risk factors" for AML and then ruled out everything besides benzene, his opinion that benzene caused decedent's AML was deemed admissible. Ugh. Saying something is a risk factor is not the same thing as saying it's a cause. That so-called risk factor epidemiology isn't science, and isn't even likely to turn up real risks, has been known for some time. Then there's the category error of ruling in and out of the set of causes things that aren't even causes. Next there's the "best of a bad lot" problem. If you don't have all the real causes ruled in, your diagnosis is iffy at best. If you don't have the most likely cause ruled in, then all you're likely doing is picking the least wrong cause. Since the cause of the vast majority of AML cases is unknown, and as there was nothing to distinguish Schultz's AML from the thousands that arise spontaneously every year, plaintiff's expert's failure to rule out "whatever it is that causes 90% of all AMLs" should have been fatal to his differential diagnosis. (Note: it wouldn't, however, be fatal to plaintiff's case, assuming the admissibility of her claim of an eleven-fold relative risk given decedent's estimated dose.)
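The arithmetic behind that last parenthetical is straightforward, and it runs through the relative risk rather than the differential diagnosis. A minimal sketch, assuming arguendo that the eleven-fold figure survives scrutiny:

```python
# Attributable fraction among the exposed: AF = (RR - 1) / RR.
# The eleven-fold figure comes from the opinion; we treat it as
# sound here purely for illustration.
RR = 11.0
af = (RR - 1) / RR
print(f"attributable fraction = {af:.0%}")  # ~91%
```

On that assumption, roughly 10 of every 11 AML cases among the similarly exposed would be attributable to benzene - which is why the fight properly belongs over the epi evidence, not the differential diagnosis.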

Second, there's the claim that because plaintiff's and defendant's epi studies share "the identical methodology - observing AML rates in populations exposed to benzene over time," "Rule 702 did not require, or even permit, the district court to choose between those two studies at the gatekeeping stage." The court would do well to read "The Scandal of Poor Epidemiological Research: Reporting guidelines are needed for observational epidemiology" before pronouncing such a rule. As one of the FDA EMDAC committee members said during the recent Avandia meeting, when it comes to evidence "observational studies are at the bottom of the totem pole." Courts should keep in mind that small effects detected in such studies, even when, and perhaps especially when, statistically significant (i.e., reported with a low p-value), are likely to be false - and that goes for the ones cited by defendants as well as plaintiffs.
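That "likely to be false" point isn't rhetoric; it falls out of Bayes' rule. A back-of-the-envelope sketch in the spirit of Ioannidis' "Why Most Published Research Findings Are False" - the prior, power, and alpha figures below are our illustrative assumptions:

```python
# Positive predictive value of a "significant" finding:
#   PPV = power * prior / (power * prior + alpha * (1 - prior))
# All three inputs are illustrative assumptions.
prior = 0.10  # assumed share of tested associations that are real
power = 0.30  # assumed power to detect a true small effect
alpha = 0.05  # conventional significance threshold

ppv = (power * prior) / (power * prior + alpha * (1 - prior))
print(f"PPV = {ppv:.0%}")  # 40%: more likely false than true
```

On those (arguably generous) assumptions, a statistically significant small-effect finding is wrong more often than it's right.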

If you're still harboring doubts about whether there really is an unfolding scandal involving observational epidemiology, read the editor's choice, "Improving Observational Epidemiology," in the current edition of the International Journal of Epidemiology. If you don't have free access to the entire paper, this excerpt should encourage you to pay the price for it:

"The ability of observational datasets to generate spurious associations of non-genetic exposures and outcomes is extremely high, reflecting the correlated nature of many variables, and the temptation to publish such findings must rise as the P-values for the associations get smaller. The forces involved - the imperative to publish for a successful research career, the journal publishers' and editors' desire to publish material that gets cited to increase their profiles and the isolation of many epidemiologists working on small,  often non-contributory, studies - are strong. Perhaps epidemiology needs to re-define its training and knowledge base and build in subsequent accreditation routes to promote better standards of epidemiological professional practice. Very few epidemiology departments impose the discipline of laboratory daily log books in which every experiment and analysis is recorded to provide some verification of what was a priori and what was post hoc. Academics involved in 'handle-turning circular research', highlighted nearly a century ago by Paul de Kruif, and commented upon recently in this journal, really do need to find alternative pursuits."

Finally, if you're interested in uncovering abusive practices in observational epidemiology add p-hacking to your vocabulary (you know your meme has gone big time when it's on Urban Dictionary) and "P-Curve: A Key to the File Drawer" to your arsenal. A number of statisticians, alarmed at the realization that the tools of their trade have been used to gin up spurious science on an industrial scale, have developed new tools to detect it. Plaintiffs are turning such tools on drug studies and company-financed employee mortality studies. Meanwhile, in one case of which we're aware, defendants are using them with effect on a whole series of studies resting on nothing more than the curious coincidence that the reported p-values all fell between 0.044 and 0.05. Go figure. Literally.
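For a flavor of how these tools work: p-curve looks at the distribution of the significant p-values a body of studies reports. Real effects pile their p-values up near zero; p-hacked results hug the 0.05 line. Here's a minimal sketch using a simple binomial test, with made-up inputs echoing (but not reproducing) the suspicious bunching described above:

```python
from scipy.stats import binomtest

# Hypothetical reported p-values, all bunched just under 0.05.
p_values = [0.044, 0.046, 0.047, 0.048, 0.049, 0.050]

# A real effect piles significant p-values up near zero, so well over
# half should fall below 0.025; p-hacked results hug the 0.05 line.
below = sum(p < 0.025 for p in p_values)
result = binomtest(below, n=len(p_values), p=0.5, alternative="less")
print(f"{below}/{len(p_values)} below 0.025; "
      f"binomial p = {result.pvalue:.3f}")
```

Zero of six significant p-values below 0.025 is itself significantly odd (p ≈ 0.016 even under the uniform null) - the signature of results tortured until they confessed.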
