Data Analysis Australia
STRATEGIC INFORMATION CONSULTANTS
Copyright © 2012
What Size Sample Do I Need?
At Data Analysis Australia, one of the most common queries we receive from current and prospective clients is "what size sample do I need?". A common misconception is that 400 is the magic number. However, it is not always this easy - one size does not fit all applications. In fact, 400 is rarely the right answer. Not surprisingly, the answer depends upon the details of the question, and understanding the question is the best starting point.
Survey results are often used to find an answer to a question or to help make an informed decision. Sometimes this is expressed in terms of estimating a number such as the proportion of shoppers who might buy a product, the proportion of customers who are satisfied with a service, the average turnover of companies or the gold grade in a deposit. Major decisions can be based on such survey estimates and clearly reliable decisions need reliable estimates.
However, any results that come from a survey will be subject to some degree of error. This error can be separated into two types: sampling error and non-sampling error.
Sampling Error
This is the error that arises because only a subset of the population is surveyed. There is a well-developed statistical theory that helps us understand this type of error. The theory is used when setting the sample size and choosing the sampling design. The error can often be readily quantified (often before the survey), which helps when choosing the most appropriate sample size and sampling design.
Non-Sampling Error
This covers error from all other sources, such as poorly worded questions or respondents misunderstanding what is being asked. Usually this type of error can't be quantified, but steps can be taken to minimise its effects. Having a good and clear questionnaire is the first step. It is good practice to have questionnaires tested before the survey begins, so that these sources of error can be identified and fixed.
It is important to consider both types of error when designing a survey. Any benefit achieved from reducing the size of one type of error can very easily be wasted if the other type of error is larger.
Before proceeding too far, we need to define a few terms.
Population
This is the entire group about which answers are to be obtained. It is important to realise that populations are not restricted to people. For example, populations can also refer to businesses, clubs, households, mine deposits or whatever else is of interest to the survey. The population of interest needs to be clearly defined before the survey begins, and sometimes it is quite difficult to define accurately.
Sample
The subset of units in the population that are actually surveyed. This is often, but not always, a random selection. To demonstrate, the figure to the right shows a population of 9 units (people). The sample consists of 3 units (people), as shown by the red circles.
Sampling Error
A measure of the error that results from surveying a subset of the population rather than the entire population.
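As the article notes, sampling error can often be quantified before the survey is run. A minimal sketch of the standard calculation, assuming a simple random sample from a large population and the usual normal approximation for a proportion:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion p
    from a simple random sample of size n (normal approximation,
    large population; p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

# The margin halves each time the sample size is quadrupled:
for n in (100, 400, 1600):
    print(n, margin_of_error(n))
```

The square-root relationship is why "just sample more" is an expensive way to buy accuracy: quadrupling the cost only halves the sampling error.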
Questions to be Asked
A number of questions need to be asked (and answered) before a suitable sample size can be determined. These include:
1. What level of accuracy is required for the estimates?
2. Are estimates required for subgroups of the population as well as for the population overall?
3. How variable are the quantities being measured across the population?
4. What time and budget constraints apply?
So, determining the "correct sample size" is not a simple task. In fact, a large part of determining the sample size is not simply "how many should we sample", but how cleverly the sample is chosen. A "smarter" sample design can give more accurate estimates with a smaller sample size.
In general, the more complex the survey being conducted, the more effective a smart design can be. It is often more cost-effective to spend additional resources on designing the sampling methodology than on simply sampling more units. Techniques such as systematic sampling, stratified sampling, cluster sampling, and multi-stage or multi-phase sampling can all be used to improve the sample. Some of these are described below; in practice, a sampling design is likely to combine elements of several sampling techniques.
Stratified sampling is one of the most common types of survey design. This involves separating the population into distinct groups and then choosing a sample size for each group (for example, males/females, states of Australia or divisions of a company).
There are two main benefits of a stratified sample:
1. Stratified sampling ensures that an adequate number of respondents are gained for each subgroup of interest. This also helps to ensure that a representative sample is achieved.
2. For the same size sample, a superior estimate at the overall level and also at the subgroup level can be obtained by allocating a higher proportion of the sample to the groups with higher variability. To maximise the benefit achieved from using a stratified sample, the distinct groups should be chosen so that units within the same group are as similar to each other as possible.
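The second benefit - giving more sample to the more variable groups - has a standard recipe known as Neyman allocation, which splits the sample in proportion to each stratum's size times its standard deviation. A sketch, with invented stratum sizes and standard deviations purely for illustration:

```python
def neyman_allocation(n, strata):
    """Split a total sample of n across strata in proportion to
    N_h * S_h (stratum size times stratum standard deviation).
    This is Neyman allocation: it minimises the variance of the
    overall estimate for a fixed total sample size."""
    weights = [s["N"] * s["sd"] for s in strata]
    total = sum(weights)
    return [round(n * w / total) for w in weights]

# Illustrative strata (N and sd values are invented): the small but
# highly variable stratum gets far more than its population share.
strata = [
    {"name": "low variability",  "N": 8000, "sd": 1.0},
    {"name": "high variability", "N": 2000, "sd": 5.0},
]
allocation = neyman_allocation(1000, strata)  # -> [444, 556]
```

Here the second stratum holds 20% of the population but receives over half the sample, because each unit sampled from it tells us much more about the overall total.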
A good example is surveys relating to business activity. In many cases, a few very large companies have a big effect on the overall value. However, it is also important to get a good estimate of the combined value of the smaller companies. In such cases, groups (strata) can be formed according to company size, with only a percentage of the smaller companies being surveyed and all of the larger companies being surveyed. For example, 10% of companies with 0 to 9 employees might be surveyed, 15% of companies with 10 to 20 employees, and all companies with more than 20 employees. Although this means that the responses need to be weighted using statistical techniques to provide meaningful estimates at the overall level, this method provides superior estimates. This sampling design also helps to reduce the burden on small companies, which often have the greatest difficulty in responding.
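The weighting step mentioned above is straightforward in principle: each stratum's sample result is scaled back up by the stratum's population size. A minimal sketch, with hypothetical population counts and sample means (the numbers are invented, not from the article):

```python
# Hypothetical strata mirroring the business example; the population
# counts (N), sample sizes (n) and sample means are all invented.
business_strata = [
    {"size": "0-9 employees",   "N": 5000, "n": 500, "sample_mean": 0.4},
    {"size": "10-20 employees", "N": 1000, "n": 150, "sample_mean": 2.1},
    {"size": "20+ employees",   "N": 200,  "n": 200, "sample_mean": 15.0},
]

def weighted_total(strata):
    """Scale each stratum's sample mean up by its population count N.
    Equivalent to weighting each response by N / n for its stratum."""
    return sum(s["N"] * s["sample_mean"] for s in strata)

def weighted_mean(strata):
    return weighted_total(strata) / sum(s["N"] for s in strata)
```

Without the weights, the fully-enumerated large companies would be heavily over-represented and the overall estimate badly distorted.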
Another variant is a multi-stage or cluster sample, which can be used when there is a natural physical grouping in the population that can be exploited to reduce the effort, and hence the cost, of surveying the respondents. In its simplest form, once the groups or clusters have been identified, a number of those groups are chosen at random and then all individuals within the chosen groups are selected. Sometimes only a subset of individuals within the chosen groups is sampled, rather than all of them. This is often called cluster sampling, and may well go beyond just the two stages described here.
An example is a face-to-face household survey where suburbs are randomly selected first, and then houses within the chosen suburbs. A saving comes from an interviewer being able to visit all the selected households in a suburb in a few trips, minimising travel time. In a strictly statistical sense, cluster sampling is usually less efficient than a simple random sample, in that for a given total sample size less information is obtained about the population. However, the reduced travel costs involved in sampling from fewer distinct locations may permit an increase in sample size that more than compensates for this. For this reason, cluster sampling is typically preferred when there is a need for personal contact with survey participants or to ease the administrative burden associated with sampling distinct groups.
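The efficiency loss from clustering is commonly summarised by the "design effect". One widely used approximation (Kish's formula, which assumes equal-sized clusters and an intra-cluster correlation rho) is sketched below; the numbers in the comment are illustrative, not from the article:

```python
def design_effect(m, rho):
    """Kish's approximate design effect for cluster sampling with m
    units sampled per cluster and intra-cluster correlation rho
    (assumes equal-sized clusters)."""
    return 1 + (m - 1) * rho

def effective_sample_size(n, m, rho):
    """A cluster sample of n units carries roughly as much
    information as a simple random sample of this size."""
    return n / design_effect(m, rho)

# e.g. 1000 respondents in clusters of 10 with rho = 0.05 behave
# like a simple random sample of roughly 690 respondents.
```

If the travel savings from clustering let the survey afford noticeably more than the lost effective sample size, the cluster design wins overall.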
Another example is a survey of teachers, where schools may first be randomly chosen and then teachers within those schools. A saving comes from being able to deliver a batch of questionnaires to a single school, and to identify a point of contact within the school to assist in the administration and promotion of the survey to increase response rates, rather than needing to individually contact randomly selected teachers at different schools. Perhaps more importantly in this context, it also has the advantage of ensuring sufficient sample in each selected school to analyse similarities and differences between teachers' responses within the same school and in different schools. This can enable, for example, the effects of the department, the school and the individual teachers to be separated out and analysed. As such, cluster sampling should not be considered merely a "necessary evil" adopted as a cost-saving measure where personal contact is required, but also as a statistically efficient way of analysing responses both within and between groups.
The above discussion has focussed on sampling to obtain valid estimates for the entire population based on results obtained from surveying only a subset of the population. But consider the special case where the primary question is not to determine the overall population value, but is simply a question of quality: to determine whether or not a batch or group of items is of sufficient quality to accept the whole batch or reject it. The question then becomes "how many items do I need to sample, and how many can fail, before I deem a batch to be of insufficient quality and reject it?"
With a history in military applications, whereby shells had to be highly consistent in trench warfare, and further development through manufacturing, a suite of International Acceptance Testing Standards have been developed to cover just such applications. These Standards cover a range of quality levels, termed "Acceptable Quality Levels" (or AQLs), and cover a range of batch sizes. At the simplest level, what acceptance sampling does is give the user the sample size of how many items they need to test for a given batch size and AQL and also, how many are allowed to fail the test before rejecting the whole batch. For example, with an AQL of 2.5% and a batch size of 1000, 125 items need to be tested and if 7 or fewer fail, the batch passes, otherwise the batch fails. While an AQL of 1%, say, requires higher quality (fewer failures) than an AQL of 2.5%, it should not be confused with allowing a 1% failure rate, compared to a 2.5% failure rate.
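The pass/fail rule in the example (test 125 items, accept on 7 or fewer failures) can be examined through its operating characteristic: the probability of accepting a batch as a function of the true failure rate. A minimal sketch under a simple binomial model, which ignores the finite batch size and the Standard's switching rules:

```python
from math import comb

def prob_accept(n, c, p):
    """Probability that a batch passes when n items are tested, up to
    c failures are allowed, and each item independently fails with
    probability p (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# The plan from the example: n = 125, accept on 7 or fewer failures.
good = prob_accept(125, 7, 0.025)  # batch exactly at the 2.5% AQL
bad = prob_accept(125, 7, 0.10)    # batch with a 10% failure rate
```

This also illustrates the article's final caution: a batch at the AQL is accepted with high probability, but batches somewhat worse than the AQL still pass occasionally, which is why an AQL of 2.5% is not the same thing as "a 2.5% failure rate is allowed".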
Acceptance sampling is very different to the other types of sampling discussed in this article. Its purpose is not to estimate a population value with desired accuracy, but to provide an unambiguous set of decision rules as to when a batch should be considered of sufficient quality to accept it. It is frequently used, for example, when determining contractual obligations between consumer and supplier, whereby the appropriate balance of protection can be given to both parties - if the batch passes based on the results of the acceptance testing, the consumer must accept the batch and pay for it, whereas if the batch fails, the supplier must fix the problem or replace it. Therefore the Standard is designed to operate within the context of continual improvement, with the Standard being as much about providing feedback and incentives to the supplier as it is about acting as a gateway to hold back defective items, with rules to implement higher rates of sampling and checking for consistently poor performance and reduced rates of sampling for consistently good performance.
As such, the Standard needs to be defined in a way that is useable "on the floor" without the need for complex calculations for each case. The way that this is implemented is by deriving a series of tables, covering different AQLs, different batch sizes and various other assumptions, with each table providing the relevant sample size for that batch and how many items can fail before the batch is deemed rejected. While these tables are based on statistical principles, their need for simplicity and suitability to be included in tables means that by necessity, they are not as statistically rigorous and specific as the other means of sampling described elsewhere in this article. While as a general rule larger sample sizes are required for larger batches, batch sizes are "grouped" into tables in the Standard, with the same sample size being used for a large number of different batch sizes. For example, whether there are 10,001 or 35,000 items in a batch (or any number in-between), the same size sample is used for a given AQL, but this sample size is greater than if there were, say, between 91 and 150 items in the batch. This is very much based on practicality and decision rules, rather than pure statistical properties.
Even though the Standard in some sense "gives" the required sample sizes, there are still decisions to be made to select the appropriate tables of the Standard to follow. The results and applicability of using Acceptance Sampling techniques and sample sizes must be carefully considered on a case by case basis rather than applying the Standard without proper thought.
What if the "right sample size" is not affordable?
Sometimes the ideal sample size and design just don't fit into the budgeted time and/or money constraints. In these cases, a trade-off decision typically needs to be made between the competing priorities of the survey. Some options include:
1. Accepting a lower level of accuracy for the estimates.
2. Producing estimates only for the most important subgroups, rather than for all of them.
3. Reducing the scope of the survey, such as the number of questions asked.
4. Investing in a smarter sampling design, so that more can be achieved with the same sample size.
So, when is a sample size of 400 appropriate?
So, going back to the magical number of 400 referenced in the opening paragraph, is it ever the right sample size to use? Yes, it can be appropriate, but only with the right assumptions and accuracy requirements. In fact, the usual argument for it is based upon:
1. Estimating a simple proportion, with the worst case being a proportion of around 50%.
2. A margin of error of plus or minus 5% at the 95% confidence level being acceptable.
3. A simple random sample being drawn from a large population.
4. Estimates being required at the overall level only, not for subgroups.
When are all these assumptions met? Probably rarely. This means that many surveys are carried out with sample sizes that are either unnecessarily large, leading to unnecessary cost, or giving insufficient accuracy to make proper decisions.
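For completeness, here is the arithmetic behind the "magic 400", under the standard textbook assumptions (95% confidence, a margin of error of plus or minus 5%, a worst-case proportion of 50%, and a simple random sample from a large population):

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Sample size for estimating a proportion to within the given
    margin of error at ~95% confidence, assuming a simple random
    sample from a large population; p = 0.5 is the worst case."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# 1.96**2 * 0.5 * 0.5 / 0.05**2 = 384.16, so 385 respondents are
# needed -- the figure that gets rounded up to the "magic 400".
n = required_sample_size(0.05)  # -> 385
```

Change any one assumption - a tighter margin, subgroup estimates, a clustered design - and the answer moves away from 400, sometimes dramatically.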
Data Analysis Australia has both the statistical expertise on these issues and the practical experience in conducting high quality surveys. Our consultants understand the process from the first steps of formulating the questions that the survey must answer through to the analysis of results, providing professional judgement on what is best for each situation.