Education and science ©1994-2003 Kevin Boone
Workshop on experimental design

Experimental design: comments on comparison

Comparability issues in the Bodgett and Scarper experiment

In the experiment conducted by Bodgett and Scarper, the end-users of the software were asked to assign ratings to the system. The experimenters used a numerical scale in an attempt to give some credibility to the results (we can average '9' and '10' but we can't average 'good' and 'excellent'). The main problem, however, is that the numerical results are very difficult to interpret. What does an 'ease of use' rating of '8 out of 10' mean? We have no way of judging what rating a comparable system (if there is one) would have obtained. The experimenters can blunt this criticism to a degree by setting questions that explicitly ask for a comparison against other systems of which the experimental subjects may be aware. Even then, they run the risk that the subjects may not remember the other systems well enough to comment (this is a source of bias), or may give misleading answers in an (unconscious) attempt to please the experimenter.

What the experimenters could have done is to explicitly require their subjects to carry out equivalent tasks using a number of different systems. If the subjects still rated the new system more highly than the others, this result would be more credible than the original one. Moreover, we may be able to assign a numerical value to the degree by which the systems differ.
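As a rough illustration of what 'a numerical value' for the difference might look like, here is a minimal Python sketch. It assumes that each subject rates both the new system and one comparison system on the same 10-point scale. The function name `paired_difference` and the ratings are invented for this example; they are not data from the Bodgett and Scarper experiment.

```python
from math import sqrt
from statistics import mean, stdev


def paired_difference(ratings_new, ratings_other):
    """Return (mean difference, standard error) for per-subject paired ratings.

    Position i in both lists must refer to the same subject.
    """
    if len(ratings_new) != len(ratings_other):
        raise ValueError("each subject must rate both systems")
    if len(ratings_new) < 2:
        raise ValueError("need at least two subjects to estimate variability")
    # Subtracting within each subject removes that subject's personal
    # tendency to rate generously or harshly.
    diffs = [n - o for n, o in zip(ratings_new, ratings_other)]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))


# Invented ratings (hypothetical, for illustration only).
new_system = [8, 7, 9, 6, 8]
other_system = [6, 7, 7, 5, 6]

mean_diff, std_err = paired_difference(new_system, other_system)
print(f"mean difference: {mean_diff:.2f}, standard error: {std_err:.2f}")
```

The standard error gives a first indication of how far the mean difference might be due to chance; a full analysis would go on to a paired significance test or a confidence interval.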

Because the experimental subjects are human, and therefore sources of unwanted variability, the experimenters should attempt to minimize this variability by comparing the impressions of the same subjects with the different systems at different times. In this case we have to be careful of 'learning' effects (the subjects perform better over time because they improve at the task). There may be a good case for instituting a crossover experiment here, in which different groups of subjects use the systems in different orders.
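To show how a crossover experiment can separate a learning effect from a genuine difference between systems, here is a minimal Python sketch of a two-system, two-period design. It assumes there is no carry-over effect beyond the simple learning effect, and that the learning effect is the same whichever system is used first. The function names `assign_orders` and `crossover_effect`, the labels 'A' and 'B', and all of the scores are hypothetical.

```python
import random
from statistics import mean


def assign_orders(subjects, seed=None):
    """Randomly split subjects into an 'A then B' group and a 'B then A' group."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"AB": shuffled[:half], "BA": shuffled[half:]}


def crossover_effect(ab_scores, ba_scores):
    """Estimate the A-minus-B difference from (period 1, period 2) score pairs.

    ab_scores: pairs for subjects who used A first, then B.
    ba_scores: pairs for subjects who used B first, then A.
    A learning effect shifts the period differences of both groups by the
    same amount, so it cancels when one group mean is subtracted from the other.
    """
    d_ab = mean([p1 - p2 for p1, p2 in ab_scores])
    d_ba = mean([p1 - p2 for p1, p2 in ba_scores])
    return (d_ab - d_ba) / 2


# Invented scores: system A is worth 2 points more than B, and every
# subject gains 1 point in the second period through learning.
ab_group = [(8, 7), (7, 6)]   # A in period 1, B in period 2
ba_group = [(6, 9), (4, 7)]   # B in period 1, A in period 2

groups = assign_orders(range(6), seed=1)
effect = crossover_effect(ab_group, ba_group)
print(groups)
print(f"estimated A - B difference: {effect}")
```

With these invented scores the estimate recovers the 2-point difference built into the example, even though the 'A then B' subjects on their own show a difference of only 1 point, because the learning effect works against system A in that group.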