Education and science ©1994-2003 Kevin Boone
Workshop on experimental design

Experimental design: intergroup variability example

Intergroup variability example

Suppose I want to determine whether my new Java compiler produces programs that execute more quickly than those produced by earlier compilers. This seems very straightforward: compile the same program using both compilers, then time both versions as they execute. That looks like all there is to it, but it isn't. There are two problems.
  • It tells us nothing about performance on programs other than the test program. This is our old friend generalization again. A better approach might be to run a number of test programs with different functions, and add together their execution times.
  • In practice, the same program run on the same computer will not always take the same time to execute. This is because modern operating systems are multitasking, and it is difficult to predict what else the computer might be doing while running my test programs.
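As an illustration, suppose I run my set of test programs with the two compilers, and get times of 10 seconds and 20 seconds respectively. Then I rerun the test set built with the first compiler three times and get timings of 10.1, 10.2 and 9.7 seconds. The repeat runs differ from one another by at most 0.5 seconds (the within-group variability), while the two compilers differ by about 10 seconds (the intergroup variability). It seems very clear that the two compilers have very different performances.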
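
The comparison in this illustration can be sketched in a few lines of Python. The timings are the ones quoted above; the variable names are my own, and I am assuming that the three repeat timings all belong to the first compiler and that the 20 second figure is a single run of the second.

```python
from statistics import mean

# Timings quoted in the text, in seconds.
# ASSUMPTION: the three repeat runs all belong to the first compiler.
first_compiler_runs = [10.1, 10.2, 9.7]
second_compiler_time = 20.0  # single run with the second compiler

# Within-group variability: how far apart repeat runs of one compiler are.
within_spread = max(first_compiler_runs) - min(first_compiler_runs)

# Intergroup variability: how far apart the two compilers are.
between_difference = second_compiler_time - mean(first_compiler_runs)

print(f"within-group range: {within_spread:.1f} s")
print(f"intergroup gap:     {between_difference:.1f} s")
print(f"gap / range:        {between_difference / within_spread:.0f}")
```

When the gap is many times larger than the range, as it is here, a single pair of timings is enough to tell the compilers apart.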

But suppose that we are looking at small refinements to a compiler, perhaps giving a speed improvement of about five percent. Five percent of 10 seconds is 0.5 seconds, which is no larger than the spread we saw between repeat runs. Now the within-group variability is quite similar to the intergroup variability, and it is very difficult to determine whether one compiler is really faster than the other.
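
To get a feel for how often a single pair of timings could mislead us in this situation, here is a small simulation. It is only a sketch: I am assuming that run-to-run noise is Gaussian, with a standard deviation estimated from the three repeat timings quoted earlier. Neither assumption comes from real measurements, and real timing noise is often skewed.

```python
import random
from statistics import stdev

random.seed(1)  # fixed seed so the sketch is repeatable

OLD_MEAN = 10.0             # seconds, the figure used in the text
NEW_MEAN = OLD_MEAN * 0.95  # a five percent improvement

# ASSUMPTION: noise level estimated from the three repeat runs in the text.
NOISE_SD = stdev([10.1, 10.2, 9.7])

def one_run(true_mean):
    """Simulate one timed run with Gaussian noise (an assumption)."""
    return random.gauss(true_mean, NOISE_SD)

trials = 10_000
# Count comparisons in which the genuinely faster compiler looks no faster.
wrong = sum(one_run(NEW_MEAN) >= one_run(OLD_MEAN) for _ in range(trials))
print(f"single-run comparisons pointing the wrong way: {wrong / trials:.1%}")
```

Under these assumptions a noticeable fraction of single comparisons favour the wrong compiler, and even the correct ones give a poor estimate of the size of the improvement.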

In order to get back to the situation where the intergroup variability is greater than the within-group variability, we could proceed in one or both of these ways.

  • Select a different set of test programs that is likely to accentuate the differences between the two compilers.
  • Reduce the effect of within-group variability by increasing the number of repetitions of the tests and averaging the results. This is an example of increasing the sample size.
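
The benefit of repetition can be put in numbers. If the runs are independent (an assumption that may not hold on a busy machine), the uncertainty in the average of n runs is the standard deviation of a single run divided by the square root of n. The helper functions below are hypothetical names of my own, shown as a sketch using the repeat timings quoted earlier.

```python
import math
from statistics import stdev

# Repeat timings quoted in the text, in seconds.
runs = [10.1, 10.2, 9.7]
sample_sd = stdev(runs)

def standard_error(sd, n):
    """Uncertainty in the mean of n independent runs."""
    return sd / math.sqrt(n)

def repetitions_needed(sd, target_se):
    """Smallest n for which the standard error is at most target_se."""
    return math.ceil((sd / target_se) ** 2)

for n in (3, 10, 30, 100):
    print(f"{n:>3} runs: standard error {standard_error(sample_sd, n):.3f} s")
```

Note the square root: to halve the uncertainty we need four times as many runs, so repetition becomes expensive when the difference we are looking for is very small.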
In experiments involving human beings, a large source of within-group variability is the natural variability between different human subjects in absolutely everything. If your test groups comprise people of different ages, genders, ethnic backgrounds, etc., then you can expect a great deal of variability in everything you measure. On the other hand, if you don't have a mixture of these properties, you can expect bias instead.

The textbook solution to this problem is to make comparisons by testing two different things on the same group of human subjects. This largely eliminates inter-subject variability, but since people can't do two things at the same time, we can introduce a time bias instead. The textbook remedy for that, in turn, is to use two groups of people, who take the tests in opposite orders. This is called a crossover experiment.

Crossover experiments have been the standard way of conducting comparison experiments using human subjects for about 50 years. However, a source of error called the 'carryover effect', in which the first test continues to influence the results of the second, has led to a re-evaluation of the usefulness of this technique in the last few years. These are complex and subtle issues, and anyone considering carrying out a substantial crossover experiment needs to consult a competent statistician (or become one!).
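
The allocation step of a simple two-treatment crossover experiment might be sketched as follows. The function name, the treatment labels "A" and "B" and the subject identifiers are all hypothetical. This covers only the random assignment of test orders, not the analysis, which is where the statistician comes in.

```python
import random

def assign_crossover(subjects, seed=None):
    """Randomly split subjects into two groups with opposite test orders.

    Returns a dict mapping each subject to the order in which that
    subject takes the two tests: ("A", "B") or ("B", "A").
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random allocation avoids selection bias
    half = len(shuffled) // 2
    schedule = {}
    for subject in shuffled[:half]:
        schedule[subject] = ("A", "B")
    for subject in shuffled[half:]:
        schedule[subject] = ("B", "A")
    return schedule

# Hypothetical subject identifiers, for illustration only.
schedule = assign_crossover([f"subject-{i}" for i in range(1, 9)], seed=0)
for subject, order in sorted(schedule.items()):
    print(subject, "takes", order[0], "then", order[1])
```

Because each order is used by half of the subjects, any effect of simply going first or second should fall equally on both tests.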