Fun With Statistics

The new jobless data show that for every educational category of worker (college graduates, those with some college, high school grads and dropouts) the unemployment rate today is higher than it was at the peak of the 1982 recession. The same data show that the overall unemployment rate today is lower than it was in 1982. How can this be?

Try this example of an unnervingly common flaw that can arise when you reason from percentages alone. There are two treatments for kidney stones, Treatment A and Treatment B. Each treatment is tried out on two different types of kidney stones - small stones and large stones. Here are the results:

Treatment A   small stones - 93% effective    large stones - 73% effective

Treatment B   small stones - 87% effective   large stones - 69% effective

Which treatment do you think would be more effective overall, across small and large stones combined? As it turns out:

Treatment B was effective 83% of the time across small and large stones combined

Treatment A was effective 78% of the time across small and large stones combined

Huh? Here are the actual numbers:

Treatment A   small stones - 81 out of 87 effective     large stones - 192 out of 263 effective

Treatment B   small stones - 234 out of 270 effective   large stones - 55 out of 80 effective

Thus the overall success rate for Treatment A is (81 + 192) / 350 = 78%, whereas the overall success rate for Treatment B is (234 + 55) / 350 = 83%.
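If you'd rather check the arithmetic than take my word for it, here is a minimal Python sketch that recomputes every percentage from the raw counts quoted above. The names `results`, `rate` and `overall` are just labels I made up for illustration.

```python
# (successes, cases) for each treatment and stone size, from the counts above.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, cases):
    """Success rate for a single group."""
    return successes / cases


def overall(groups):
    """Pooled success rate: total successes divided by total cases."""
    total_successes = sum(s for s, _ in groups.values())
    total_cases = sum(n for _, n in groups.values())
    return total_successes / total_cases


for name, groups in results.items():
    for size, (s, n) in groups.items():
        print(f"Treatment {name}, {size} stones: {s}/{n} = {rate(s, n):.0%}")
    print(f"Treatment {name}, overall: {overall(groups):.0%}")
```

Treatment A comes out ahead within each stone size, yet Treatment B comes out ahead once the two sizes are pooled.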

This effect, in which the ranking within every subcategory reverses in the overall rates, is known as Simpson's Paradox. I don't think it's so much a paradox as a consequence of an all too common mistake that people, including lots of expert witnesses, make with percentages: treating them as something independent of the data from which they were generated. Here, Treatment A was used mostly on the harder large-stone cases (263 of 350) while Treatment B was used mostly on the easier small-stone cases (270 of 350), so the two overall rates describe very different mixes of cases. The result of this flawed thinking is often a classic, but sometimes hard to perceive, apples-to-oranges comparison.
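One way to see why the percentages can't be separated from their data: each overall rate is a weighted average of the subgroup rates, with the weights set by how many cases of each kind a treatment happened to get. The sketch below is only an illustration using the kidney stone counts from this post; `treatments`, `weighted_overall` and `share_of_small` are names I invented for it.

```python
# (success rate, number of cases) per stone size, from the counts in this post.
treatments = {
    "A": {"small": (81 / 87, 87), "large": (192 / 263, 263)},
    "B": {"small": (234 / 270, 270), "large": (55 / 80, 80)},
}


def weighted_overall(groups):
    """Overall rate as a case-weighted average of the subgroup rates."""
    total = sum(n for _, n in groups.values())
    return sum(r * n / total for r, n in groups.values())


def share_of_small(groups):
    """Fraction of a treatment's cases that were small stones."""
    total = sum(n for _, n in groups.values())
    return groups["small"][1] / total


for name, groups in treatments.items():
    print(
        f"Treatment {name}: {share_of_small(groups):.0%} of cases were small stones, "
        f"overall success rate {weighted_overall(groups):.0%}"
    )
```

Roughly a quarter of Treatment A's cases were small stones, versus roughly three quarters of Treatment B's. The overall figures reflect that difference in weights at least as much as any difference between the treatments.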

Here's a good discussion of the issue as it relates to the unemployment conundrum at The Wall Street Journal. For further discussion, including a take on why comparing unemployment among education categories over time is even dicier than comparing different treatments for different types of kidney stones, there's another good write-up at Andrew Gelman's Statistical Modeling, Causal Inference, and Social Science blog.

 
