Tuesday, July 25, 2017

- Statistics Explained Badly

I'm not a professional writer, and will often 'cop' to my amateurish style and skill. But this is so bad I feel like I have to make mention of it.

I'm reading Slate this morning, and there is a story critical of Mayor de Blasio (OK... right with you there, liberals) on his policy for reducing New York City's rat population. (You've still got me, Slate.) But with the very first sentence, I knew there was going to be a problem:

New York City is notorious for its large rat population, and Mayor Bill de Blasio is eager to do something about it.

Maybe I'm being a little bit too literal. But are we having a problem with large rats, or with their large population? I'm in favor of reducing both, and since the largest rat I've ever seen in my life (I kid you not... it was the size of a small dog - like a nutria on steroids) was in Washington Square Park, just two blocks from where I type this, I can get behind either plan. But you'd think the pros over at Slate could be a bit clearer. Maybe both the author and editor were stoned at the time and didn't much care which interpretation I, as a reader, embraced.

The piece goes on to describe some mathematical sleight of hand the administration uses to make themselves look better than they truly are. Then a few paragraphs in, I came to this:

Regression here refers to the statistical phenomenon that exceptional events are usually just that: exceptions, not the norm. For example, the genes inherited from tall parents generally produce shorter children. The student with the highest score on one test is unlikely to do as well on the next. And locations with the highest crime tend to exhibit lower crime the following year.

If you want to mis-speak regarding rats, I think I can let it pass. But math requires precision. It is a world where being correct is categorical. 2+2 isn't 'about 5', it's 4. Every time. And the rhetorical liberties taken by the stoner writer have no place in it. So let me offer some corrections.

First, he isn't talking about regression; he's talking about mean reversion, which is the tendency of a numeric series to drift back toward its average after an unusual deviation from it. The greater the initial deviation, the more likely the reversion. But the examples offered distort the facts a little.

Two exceptionally tall parents are likely to have a child who is shorter than they are, but still taller than average. And their grandchildren are likely to be taller than average as well. The Dutch are taller than average. The Vietnamese are shorter than average. Those trends persist over time in spite of what we call 'local variation'.
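For anyone who wants to see the arithmetic, here is a rough sketch of the height example in Python. The population mean of 175 cm and the parent-child correlation of 0.5 are numbers I made up for illustration, and the function name is mine as well; the only point is the shape of the result, not the specific figures.

```python
def expected_child_height(parent_height, population_mean=175.0, correlation=0.5):
    """Expected child height under a simple linear mean-reversion model.

    population_mean and correlation are illustrative assumptions,
    not measured values.
    """
    # The child keeps only a fraction of the parents' deviation from the mean.
    return population_mean + correlation * (parent_height - population_mean)


for parent in (160.0, 175.0, 195.0):
    child = expected_child_height(parent)
    print(f"parent {parent:.0f} cm -> expected child {child:.1f} cm")
```

Under those assumptions, parents at 195 cm can expect a child around 185 cm: shorter than they are, but still well above the 175 cm average.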

What's more, it only applies to 'random' events. A child who gets the best score on a test is very much NOT random. In my daughter's class the same child (mine) gets the best score on very nearly every test, in nearly every class, and has done so in every grade. Her spectacular work ethic, study habits, and innate intelligence see to that. And in those rare occurrences when she doesn't do so, it's always the same two or three other kids who do. No one is 'smart today' but stupid tomorrow.
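Here is a quick sketch of the test score point, assuming each score is a fixed ability plus a dose of luck that gets re-rolled on every test. The class size, the ability spread (mean 70, standard deviation 10), and the luck spread (standard deviation 5) are all invented for illustration, and simulate_tests is just a name I picked.

```python
import random
from statistics import mean


def simulate_tests(n_students=10000, top_n=100, seed=0):
    """Two tests per student: score = fixed ability + fresh luck each time.

    All distribution parameters are illustrative assumptions.
    Returns (top group's mean on test 1, top group's mean on test 2,
    everyone's mean on test 2).
    """
    rng = random.Random(seed)
    abilities = [rng.gauss(70, 10) for _ in range(n_students)]  # persistent part
    test1 = [a + rng.gauss(0, 5) for a in abilities]            # luck on test 1
    test2 = [a + rng.gauss(0, 5) for a in abilities]            # new luck on test 2
    # Pick the students with the best scores on the first test.
    top = sorted(range(n_students), key=lambda i: test1[i], reverse=True)[:top_n]
    return (
        mean(test1[i] for i in top),
        mean(test2[i] for i in top),
        mean(test2),
    )


top_first, top_second, overall = simulate_tests()
print(f"top group, test 1: {top_first:.1f}")
print(f"top group, test 2: {top_second:.1f}")
print(f"everyone,  test 2: {overall:.1f}")
```

When ability is the bigger ingredient, the top group slips a little on the second test because the luck doesn't repeat, but it stays far above the class average. The kids at the top stay at the top.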

As for crime rates, it's only those places which see a radical increase in YoY (Year over Year) crime rates that are likely to see a modest decrease the following year, but they will almost certainly be higher in crime than the surrounding areas. This is very much NOT a random occurrence either. Young black men commit most of the violent crime in New York and young minority men commit virtually all of the 'shootings' in the city. The year on year variation may be noticeable, but the overall trend doesn't change in any meaningful way unless you do something about it.

I don't mean to be too hard on the stoners at Slate. It's very much true that the Diblas-inistas are awful at using math tricks to make themselves look good, and I congratulate Slate for being able to detect and report on that. That's certainly a vast improvement over their past efforts, and unless I'm mistaken they didn't mention racism or misogyny once in the whole article. No really! It's amazing, right?

But I see no reason why Slate should end their slow march toward realism before learning to speak in factually correct terms. Yes, reality is boring compared to the land liberals live in, and they may have to 'spice things up a bit' to keep their usually hysterical readers engaged. But talking about statistics is no place for that. If you're gonna say it, you should say it correctly.
