The K-Zone: What is statistics actually for?
For most of my working life I have been an experimental scientist or an engineer
of some sort. In this capacity I got used to dealing with statistical
methods and, over time, developed a feel for what you could do with statistics.
I always assumed that other people involved in similar work had the same
view of statistics that I did: statistical analysis was a tool that could
be used to find out interesting and important things. Over the years, I've
come to see that I was wrong. A huge number of people either (a) have no
idea what statistics is about at all, or (b) assume that it's a set of
mechanical procedures that you have to apply to see if your data makes sense.
This, I believe, is not at all what it's all about.
Now, there are a few articles on my Web site about statistics and
experimental design, and sometimes
people read these articles, and sometimes they
believe -- often wrongly -- that I can help them with their statistics
problems. I still get a reasonable volume of e-mail on the subject.
The reason I can't help readers with their statistics is mostly
because they haven't grasped the idea that statistics is for
something, that it has a purpose. I blame the editors of scientific
journals for this; they seem to think that nothing is publishable
without a ream of incomprehensible Kolmogorov-Smirnov test results,
or whatever. Whatever the reason, the idea has grown up that
statistics is something you do if you have to, but otherwise
is to be avoided. This idea is closely allied with that other
trite platitude: `you can make statistics say anything you like'.
Maybe you can make statistics say anything you like to some people,
but you can't make it say something it shouldn't to me.
Anyway, in this short article I want to explain what statistics is actually
for; I want to try to explain the single most important thing about
statistics. There's no maths, Greek symbols, or jargon, just -- I think --
common sense. It doesn't matter whether you know and care about
statistics, or not. In fact, you can spend years studying t-tests
and confidence intervals, and still not grasp the core philosophy
of the subject.
The purpose of statistics is not just important to statisticians,
or scientists, it's important to everyone. Even if you
will never use statistics in your life, and have no interest
at all in the subject, and can barely count to ten on your
fingers, then -- in my humble opinion -- you still need to
understand what follows. If I can get this message across, then
even if I achieve nothing else in my life -- and, let's face it, that's
looking increasingly likely -- I will still have left the world a
slightly better place than I found it.
In short, statistics is important because we can use it to find out whether
something we observe can be applied to new and different situations.
Knowing this allows us to plan for the future, and to make decisions
about how to allocate our scarce resources of money, energy, and
ultimately life. In statistics we use the term `generalisable':
an observation is generalisable if it can be used to predict what
will happen in new and different situations. If it is not
generalisable, it can't. So what is statistics for? It's for
determining whether an observation is generalisable or not.
It's as simple as that.
OK, so it doesn't sound all that earth-shattering, does it?
Let's try to illustrate it with an example. Some time ago I overheard
a conversation in a pub. I can't remember the exact words, or the names
of the people involved. The conversation was about smoking and its effect
on your health. As near as I can remember, one chap (let's call him Bob)
said something like this:
``I think it's all nonsense, that smoking kills you. It's not as
bad as they say. I mean, look at my family: my dad had four brothers, and
they all used to smoke four packs of gaspers a day. And the youngest
of them is now eighty. Now, my friend John, he's just had to have a lung
removed, because it was full of cancer. He's never smoked in his life.
It just goes to show, doesn't it? All those doctors haven't got a clue.''
Now, what's wrong with Bob's statement? If you have never been exposed to
statistics or experimental science, maybe you're thinking: `there's
something in what he says -- I have an uncle/grandmother/sister that
smokes like a chimney and is as fit as a fiddle at age ninety.' If you are,
or have been, a statistics user, perhaps at university or in your work,
perhaps you're thinking: `he hasn't defined a null hypothesis, and in
any event how am I going to work out a p-value from that?'
Both of these standpoints, while perhaps correct, miss the point entirely.
Now, before you jump to any conclusions, I'm not anti-smoking.
I don't care whether you smoke or not; why should I? That's not the point
I am making. The point is this: if you choose to smoke, you need to make
your own mind up about the consequences. And the worst way you can
do that is to listen to someone like our friend Bob. Why? Because his
observation does not generalise.
The sorry fact is that we don't care all that much about Bob and
his family and friends, unless we are personally acquainted with them.
Sure, we don't wish them any harm; we wouldn't drive past their house
without stopping if it were on fire. We wouldn't steal their last
penny to prop up a wobbly table. But, in the final analysis, what
we really care about is our family, our friends, and
ourselves. So the most important question you need to ask in
relation to Bob's statement is this: ``how does this affect me and mine?''
It's as simple as that.
The problem is that it is
impossible to answer that question. Nothing that Bob has observed
has any bearing on you and yours. Here is why it doesn't.
The group of people he has observed -- his father and uncles, and his friend
John -- are not representative of you or me. Suppose you are black,
suffer from asthma, have a family history of heart disease, and are
generally fit. Bob's family are white, have no history of heart disease, and
lead sedentary lives. John is Indian, and is a mountaineering instructor.
What reason do you have for thinking that Bob's
observations apply to you? Maybe the differences between Bob's family and
yours are significant, when it comes to the effects of smoking. Perhaps
they aren't. We just don't know.
A properly-designed experiment, with sensible statistical treatment, would
allow us to tease out these different factors, and determine which
are relevant and which are not. Alternatively, we could include people of
different ethnicity, gender, age, etc., so we get a more realistic view of
the population as a whole. Then we would be getting towards a result
that generalises. Then we would be able to judge, with some
confidence, whether we ought to take the risk of smoking or not.
Without doing this, you may as well plan for the future using a
ouija board or a weather vane.
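For readers who like to see this sort of thing run, here is a minimal
sketch in Python of why a hand-picked group can mislead. Everything in it
is invented for illustration: the two groups `A` and `B`, the illness
probabilities, and the function names are all assumptions, not real data
about smoking or anything else. It simply compares what you would conclude
from looking at one group alone with what you would conclude from a random
sample of the whole population.

```python
import random

# Invented illness probabilities for each (group, smokes) pair.
# These numbers are made up for illustration; they are not real data.
RISK = {
    ("A", True): 0.10, ("A", False): 0.05,
    ("B", True): 0.40, ("B", False): 0.10,
}

def simulate_population(n, seed=0):
    """Build a hypothetical population of (group, smokes, ill) records."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = rng.choice("AB")
        smokes = rng.random() < 0.5
        ill = rng.random() < RISK[(group, smokes)]
        people.append((group, smokes, ill))
    return people

def illness_rate(people, smokes):
    """Fraction of smokers (or non-smokers) in `people` who are ill."""
    chosen = [p for p in people if p[1] == smokes]
    return sum(p[2] for p in chosen) / len(chosen)

def smoking_gap(people):
    """Illness rate among smokers minus the rate among non-smokers."""
    return illness_rate(people, True) - illness_rate(people, False)

population = simulate_population(100_000)

# Bob's view of the world: he only ever looks at people from group A.
group_a_only = [p for p in population if p[0] == "A"]

# A random sample drawn from everybody, whatever their group.
random_sample = random.Random(1).sample(population, 5_000)

gap_population = smoking_gap(population)
gap_group_a = smoking_gap(group_a_only)
gap_sample = smoking_gap(random_sample)

print(f"whole population: {gap_population:.3f}")
print(f"group A only:     {gap_group_a:.3f}")
print(f"random sample:    {gap_sample:.3f}")
```

In this made-up world the effect of smoking is much larger in group `B`
than in group `A`, so anyone who looks only at group `A` will badly
underestimate the effect for the population as a whole, while the random
sample lands close to it. Which group you belong to is, of course, exactly
the thing Bob's anecdote cannot tell you.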
So that's what statistics is for: it's about looking at the world, and
things that happen in it, and figuring out whether these things are
going to affect us. Ultimately statistics isn't about numbers, or
t-tests, or sampling theory, it's about life.
Right, rant over. Thanks for listening.
©1994-2006 Kevin Boone, all rights reserved