©1994-2007 Kevin Boone
How to conduct a survey
Last modified: Thu Jul 8 11:47:11 2004
Kevin Boone, School of Computing Science, Middlesex University
Summary
Contents
1 Introduction
2 Questions and questionnaires
3 Sampling and the population
4 Certainty and confidence
5 Bias
6 Presentation of results
7 General guidelines
1 Introduction
This article describes some of the issues that the experimenter must confront when designing a small-scale survey. By `small-scale' I mean one that can be carried out by a single person in a few weeks. Students very often underestimate how difficult it is to carry out a survey well. A good survey is more than a handful of questionnaires and a couple of bar charts: it requires careful planning, methodical application, and detailed analysis of the results. In most surveys some statistical analysis will be required.
Perhaps this article should have been called `how to start thinking about conducting a survey'. No general procedure will automatically produce a good survey. All this article sets out to do is to bring the potential problems to the attention of the student. When planning a survey, students should find solutions to these problems that are compatible with the particular subject of the study, perhaps in consultation with their supervisors.
To illustrate the main points, I will describe a hypothetical survey to find which of two word processors is easier to use. I will call these imaginary products WordPro and WordPerfect. Of course, they have no relation to the real products with the same names. Surveys like this may well be conducted by companies that make word processors, in an attempt to improve the competitiveness of their products.
The most important issue to keep in mind when planning a survey is that you are trying to find something out. If you don't know in advance what the survey's objectives are, then you should question whether you really need the survey. The objectives of a survey can usually be phrased in the form of questions:
`Which word processor do most people use?'
`How long does it take to learn to use WordPro?'
`Why do people prefer one product over the other?'
On the whole, questions that start with `Why...?' tend to be harder to answer than those that start with `Which...?' or `What...?'. They usually have to be translated into a series of `What...?' and `How...?' questions before they can be interpreted rigorously.
2 Questions and questionnaires
The important point here is that you will only get answers to the questions you ask, and even these will be contaminated with sampling error and bias, as we shall see. If the questions you ask do not satisfy the objectives of the survey, then the survey has failed. To ask questions of a large number of people, many experimenters use questionnaires. In most cases only a small proportion of the people surveyed will respond, and the more complex the questionnaire, the fewer responses there will be. The design of questionnaires is a subject on which complete books have been written. Here are a few general guidelines.
3 Sampling and the population
Beginners often forget that the results of a small survey do not automatically extend to the population. There are two main reasons for this.
First, there is sampling variation. Suppose that in our hypothetical survey 50 users of word processors were questioned: 60% of these people used WordPro and 30% used WordPerfect, while the remaining 10% used something else entirely. If a different group of 50 people were surveyed, the results would very likely differ, for example 55%/30%/15%. Both survey groups give results that are only estimates of the `real', population-wide usage of word processors, and because they are only estimates, the two surveys are likely to disagree. This would be true even if the second group were alike in every measurable respect to the first. This is an inescapable principle of sampling: the survey group is only a sample of the general population, and measurements made on it are only estimates of the `true' population values.
Second, in practice the second group surveyed will not be identical in all respects to the first. It may consist of people of different ages, with different proportions of men and women, with different occupations, and so on. Any of these factors could affect the group's usage of word processors. How will this affect the result when the conclusions of the survey are extended to the general population? The result will be more accurate if the composition of the sample group matches the composition of the population in all important respects. If the sample group is very different from the population, we say it is non-representative. Non-representative sampling is one of the most frequent causes of error in surveys. For example, if I survey votes cast at a polling station at eight o'clock in the morning, I will almost certainly get a different result from the one I would have obtained at, say, ten o'clock. The reason is that at eight o'clock fewer well-paid professionals than manual workers are out and about, and these groups of people will probably have different political viewpoints.
If you carry out a survey by selecting people whom it is convenient to question, then you have to accept that the results will only apply to the population of which this group is representative. For example, in the case of the word processor survey, if I only question university students, then my results will only be generalizable to other university students. Students who use word processors are not representative of `people who use word processors'. The most obvious difference is age, although educational background will be an important difference as well. Often you can't do anything about this, but at the very least you should recognize that the problem exists.
4 Certainty and confidence
Because we can't have certainty, we have to settle for confidence. The more people we survey, the more confident we become that the results apply to the population. A typical target is 95% confidence. Expressed simply, this means that we survey enough people that we can be 95% sure that the outcome applies to the population as well as to the survey group. Note that `95% sure' in this case has a precise mathematical meaning: if we repeated the survey many times, 19 times out of 20 (19/20 = 0.95) we would obtain a result that was compatible with that of the population. If, for example, the population as a whole found WordPro easier to use than WordPerfect, then a survey designed to give 95% confidence would obtain this correct result 19 times out of 20.
You can estimate the confidence level after carrying out the survey, but you must decide in advance what level of confidence will be acceptable. If your survey does not give this level of confidence, you can use the results to plan a new, larger survey. If you cannot state what confidence you have in your results (as a number, not a vague hunch), then your survey is worthless.
Estimating confidence levels from a given set of data is a standard statistical procedure, and one that is described in any basic textbook on statistics. Estimating the size of the survey needed to give the required confidence level is much more difficult, and requires consideration of the statistical power of your survey. Statistical power is, roughly, the probability that the survey will detect a real difference in the population if one exists; it is explained in slightly more advanced textbooks. Better still, consult a statistician. In any case, you need some data to estimate statistical power, and in most student projects this data will not be available at the outset.
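Two calculations mentioned above can be sketched in a few lines of Python: estimating a confidence interval from data, and estimating how large a survey needs to be. This is only an illustration under stated assumptions. It assumes simple random sampling and uses the normal-approximation (Wald) interval for a proportion, which is one of several methods found in statistics textbooks. It also takes the hypothetical figure from Section 3 of 60% WordPro users, that is 30 of 50 respondents. The function names proportion_ci, coverage and sample_size are invented for this example. The coverage function mimics the `19 times out of 20' idea by repeating a simulated survey many times against an assumed true population share. The sample_size function uses the simple margin-of-error formula n = z^2 p(1 - p) / E^2. This is a much cruder planning tool than the power analysis described above. Its p_guess parameter is where an estimate from a pilot study would go.

```python
import math
import random

Z_95 = 1.96  # two-sided standard normal critical value for 95% confidence


def proportion_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumes the respondents are a simple random sample of the population.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1], since a proportion cannot lie outside that range.
    return max(0.0, p - margin), min(1.0, p + margin)


def coverage(true_p: float, n: int, trials: int = 10_000, seed: int = 1) -> float:
    """Repeat a simulated survey of n people many times and return the
    fraction of 95% intervals that contain the (assumed) true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each respondent independently uses WordPro with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_ci(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


def sample_size(margin: float, p_guess: float = 0.5, z: float = Z_95) -> int:
    """Smallest n whose Wald margin of error is at most `margin`.

    p_guess = 0.5 is the most conservative choice; a pilot study could
    supply a better estimate.
    """
    return math.ceil(z * z * p_guess * (1 - p_guess) / margin ** 2)


if __name__ == "__main__":
    low, high = proportion_ci(30, 50)  # 30 of 50 respondents use WordPro
    print(f"WordPro share 60%, 95% CI {low:.1%} to {high:.1%}")
    print(f"Coverage over repeated surveys: {coverage(0.6, 50):.3f}")
    print(f"People needed for a +/-5 point margin: {sample_size(0.05)}")
```

Run as a script, it reports an interval of roughly 46% to 74% for the WordPro share. That shows how much uncertainty remains with only 50 respondents. It also suggests that several hundred people, not 50, are needed for a margin of a few percentage points. The simulated coverage usually comes out slightly below 0.95. This is a known weakness of the Wald interval with small samples, and textbooks describe better-behaved alternatives such as the Wilson score interval.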
Because it is difficult to estimate in advance how many people you need to survey, many large projects begin with a pilot study, whose purpose is to find out enough about the population to plan the survey properly. In a student project you will probably have to settle for estimating the confidence levels at the end of the survey; if they are low, you will need to find out why and suggest how they could be improved. Confidence levels are nearly always improved by increasing the size of the survey, but often a change in the survey design can give improved confidence at much less expense.
5 Bias
The textbook solution to the problem of bias is randomization: picking survey subjects from the population at random. Bear in mind that if you send out questionnaires and use all the replies, this is not a random sample of anything, because the people who take the trouble to respond are probably not representative of the group you sent the questionnaires to. In some cases it is necessary to use `stratified' random sampling to ensure that the sample is typical of the population. For example, if I were surveying users of word processors, and I knew in advance that 60% of word processor users are women, I might want to ensure that 60% of the people in my sample group were women. However, I would still select the individuals themselves at random.
6 Presentation of results
A problem that afflicts many students is that of distinguishing between `objective' and `subjective' reporting. It is important that a person reading the outcome of your survey can easily distinguish between factual or numerical results and the experimenter's interpretation of those results. It is perfectly acceptable to conjecture about the reasons for a particular finding, but it is almost never helpful to mix facts and conjecture in a survey report. Bear in mind that the reader is also capable of interpreting your results, perhaps differently from you, and to do this it needs to be easy to separate the objective results from your subjective interpretation.
The `traditional' model for an experimental report has a section titled `results' and one titled `discussion'. The first of these is for plain, factual results and the second for interpretation and conjecture. This is still a sound way to report on the results of a survey. If you use statistical analysis, you don't need to include the calculations, but you do need to explain why you adopted a particular statistical approach.
7 General guidelines