Sampling and Weighting – A Better Practice Guide for Practitioners

As part of its commitment to encouraging the use of the scientific method, the Australian Market and Social Research Society (AMSRS) recently commissioned Data Analysis Australia to develop the Society’s newest professional development resource, Sampling Design and Weighting for Australian Household/Consumer and Business Surveys - A Better Practice Guide. The Guide aims to give all AMSRS members a comprehensive, practical and application-focused resource that market and social research practitioners of all levels can use to make methodologically and statistically rigorous choices at the design and analysis phases of projects that require sampling Australian households, consumers or businesses.

Covering topics including formal sampling design concepts and terminology, sample sizes, methods of selecting a sample, and weighting of sample data to reflect the full population, the Guide focuses on practical implementation. It encourages researchers to understand the statistical concepts behind sampling and weighting, and their implications, so that they can make informed choices in their own research projects. With thanks to the AMSRS, this Analytical Ideas article includes an edited excerpt from the Guide discussing Representativity of a Sample, and Sampling and Non-Sampling Errors, considerations that are important at the design stage of any survey.

Sampling to reflect a population

The standard model for sampling has a target population about which inferences are to be made.  The population consists of sampling units, the natural units that can be chosen to be part of a sample.  The units might be people, households, businesses or geographical areas.  In practice, perfect sampling is rarely achievable.  

Frequently it is not possible to target the desired population perfectly: some members may be inaccessible, or too expensive to sample in practice. In such cases, the population that can actually be sampled is commonly called the sample population. Strictly speaking, inferences can then be made only about the sample population.

A sample is simply a chosen subset of the population, from which inferences will be drawn about the population.  How that subset is chosen relates to the principles of sample design and selection. 

These relationships are illustrated schematically in the accompanying figure.

The researcher must typically use other information to consider whether inferences about the sample population are relevant to the target population.  This is closely linked to the more general issue of representativity.

Representativity of a sample

Since the purpose of a sample is to make inferences about the population, the sample must in some sense be representative of the population. That simply means the sample must not be unusual: its properties must at least approximate those of the population. It does not have to match the population exactly, since weighting the sample can correct for some differences, but it is important to consider just what your sample is representative of, and also what it is not.
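One common way weighting compensates for a sample that does not match the population exactly is post-stratification: each respondent in a group receives a weight equal to the group's known population share divided by its share of the sample. The Python sketch that follows is illustrative only; the function name post_stratification_weights and the 60/40 sample split are hypothetical, and the Guide's own weighting methods may differ.

```python
from collections import Counter

def post_stratification_weights(sample_groups, population_shares):
    """Hypothetical helper: one weight per respondent so that weighted group
    shares match known population shares (simple post-stratification)."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # Weight for group g = (population share of g) / (sample share of g).
    group_weight = {g: population_shares[g] / (counts[g] / n) for g in counts}
    return [group_weight[g] for g in sample_groups]

# Hypothetical sample of 100 in which women are over-represented (60% vs 50%).
sample = ["female"] * 60 + ["male"] * 40
population_shares = {"female": 0.5, "male": 0.5}
weights = post_stratification_weights(sample, population_shares)

# The weighted share of women now matches the population share of 0.5.
female_weight = sum(w for g, w in zip(sample, weights) if g == "female")
weighted_female_share = female_weight / sum(weights)
```

Here women receive a weight of about 0.83 and men a weight of 1.25. Weighting of this kind only adjusts for the variables used to form the groups, and it cannot create information about a group that is absent from the sample altogether.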

Advanced point: The type of representativity needed depends on the purpose of the survey and the analysis. Where the analysis focuses purely on relationships between variables of interest measured on each unit, the sample need only ensure that these relationships are the same as in the population, even if the sampling units themselves are not representative of the population. For regression analysis, it will often be optimal to have a sample that over-represents the extremes of the explanatory variables, provided that you are confident in the regression model.

 

Example: If one aim is to understand the relationship between fitness activities and obesity, the statistical analysis method might involve regressing a measure of obesity (such as the Body Mass Index) on measures of activity, both collected in the same survey. Assuming that the regression model is appropriate for the whole population, the data need not be representative. On the contrary, it would be ideal if the data included substantial numbers of people who engage in each activity and substantial numbers who do not.

The design of experiments is a whole area of statistics that deals with optimising such samples, and the ideal sample may be highly non-representative of the population. However, surveys are rarely aimed at answering just one question, and representativity is very likely to be required for other purposes.
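The fitness and obesity example can be illustrated with a small simulation. It assumes a hypothetical population in which an obesity measure falls linearly with hours of activity (slope -0.5, plus normal noise), and compares two hypothetical designs of 50 people: a simple random sample, and a sample split evenly between the least active and most active tenths of the population. If the linear model is correct, both designs should estimate the slope without systematic error, but the extremes design should vary less from sample to sample. All numbers are invented for illustration.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Ordinary least squares slope from regressing ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(design, n=50, reps=2000, seed=1):
    """Mean and standard deviation of the estimated slope over repeated samples.
    The same seed gives every design the same hypothetical population."""
    rng = random.Random(seed)
    # Hypothetical population of 10,000: activity hours x, obesity measure y.
    pop_x = [rng.uniform(0, 10) for _ in range(10_000)]
    pop_y = [30 - 0.5 * x + rng.gauss(0, 2) for x in pop_x]
    # Unit indices sorted by activity; keep the lowest and highest tenths.
    order = sorted(range(len(pop_x)), key=pop_x.__getitem__)
    low, high = order[:1_000], order[-1_000:]
    slopes = []
    for _ in range(reps):
        if design == "random":
            idx = rng.sample(range(len(pop_x)), n)
        else:  # "extremes": half from each end of the activity range
            idx = rng.sample(low, n // 2) + rng.sample(high, n // 2)
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    return statistics.fmean(slopes), statistics.stdev(slopes)

for design in ("random", "extremes"):
    mean_slope, sd_slope = simulate(design)
    print(f"{design:>8}: mean slope {mean_slope:.3f}, sd {sd_slope:.3f}")
```

This is why confidence in the regression model matters: a sample drawn only from the extremes contains no units in the middle of the range, so it cannot reveal curvature if the true relationship is not linear.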

Sampling and non-sampling errors

A sample is a subset of the population (usually selected with some degree of randomness, using methods discussed in the Guide), and if a different subset were chosen it would give slightly different inferences about the population. Neither subset would be perfect, and the imperfections are generally termed errors, a term that does not imply mistakes but rather the difference between a sample and the population.

Note: If the sample can at least be thought of in terms of each unit's probability of being included, it is often called a probability sample. In most cases a sample is drawn using a random process, which makes these probabilities particularly obvious. A sample whose selection process cannot be described in terms of probabilities is called a non-probability sample.
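Simple random sampling without replacement is the most basic probability sample: each of the N units has the same chance, n/N, of being selected. The sketch that follows checks this empirically by repeating the draw many times with Python's standard random.sample; the function name, the population size of 20 and the sample size of 5 are arbitrary choices for illustration.

```python
import random

def empirical_inclusion(N=20, n=5, reps=20_000, seed=0):
    """Estimate each unit's inclusion probability under simple random
    sampling without replacement by repeating the draw many times."""
    rng = random.Random(seed)
    hits = [0] * N
    for _ in range(reps):
        for unit in rng.sample(range(N), n):
            hits[unit] += 1
    return [h / reps for h in hits]

probs = empirical_inclusion()
# In theory every unit has inclusion probability n/N = 5/20 = 0.25.
print([round(p, 3) for p in probs])
```

The reciprocal of the inclusion probability, here 20/5 = 4, is the design weight: each sampled unit stands for four units of the population. More complex designs give units different inclusion probabilities, and hence different weights.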

The errors can be divided into two types:

  • The sampling error represents the errors that will change from one randomly chosen sample to the next. It is the discrepancy that results from making inferences from a subset of the population, rather than from the whole population.
  • The non-sampling errors are all other sources of error, which will be consistently present in each randomly chosen sample. These errors are also called biases. The difference between the target and sample population is often considered to give a non-sampling error. Other biases include response rates that differ between groups, leading questions and poor coding frames.

In general, sampling errors can be reduced by taking a larger sample, while non-sampling errors will remain the same regardless of sample size. However, non-sampling errors can be reduced by good questionnaire design, planning, execution and quality checks. Hence it is important to put sufficient effort into planning these aspects of the survey.
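The contrast between the two types of error can be seen in a small simulation. It assumes a hypothetical target population of 10,000 people in which 2,000 fall outside the sample population (for example, they cannot be reached by the chosen contact method) and are more likely to have the attribute being measured: a 60% rate against 30% for reachable people. Repeated simple random samples of increasing size are drawn from the reachable people only. All numbers are invented for illustration.

```python
import random
import statistics

def error_by_sample_size(sizes=(100, 400, 1600), reps=500, seed=42):
    """Return {n: (bias, spread)} for repeated samples from the sample population."""
    rng = random.Random(seed)
    # Hypothetical target population: 8,000 reachable people (30% have the
    # attribute) and 2,000 unreachable people (60% have the attribute).
    reachable = [1 if rng.random() < 0.30 else 0 for _ in range(8_000)]
    unreachable = [1 if rng.random() < 0.60 else 0 for _ in range(2_000)]
    true_rate = statistics.fmean(reachable + unreachable)
    results = {}
    for n in sizes:
        # Samples can only be drawn from the sample population (reachable people).
        estimates = [statistics.fmean(rng.sample(reachable, n)) for _ in range(reps)]
        bias = statistics.fmean(estimates) - true_rate  # non-sampling error
        spread = statistics.stdev(estimates)            # sampling error
        results[n] = (bias, spread)
    return results

for n, (bias, spread) in error_by_sample_size().items():
    print(f"n={n:>5}: bias {bias:+.3f}, spread {spread:.3f}")
```

The spread of the estimates should roughly halve each time the sample size is quadrupled, while the bias stays near -0.06 whatever the sample size, because no sample drawn from reachable people alone can reflect the unreachable ones.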

The Guide emphasises sampling errors, but the researcher needs to be vigilant for all types of error.

The terminology in this area is often confusing, particularly with regard to the terms “accuracy” and “precision”. Many authors use the word accuracy to mean lack of bias and the word precision to mean high repeatability. By high repeatability, we mean that if the survey were run again using the same methodology, but with a different random sample of participants, the results would be similar.

The following classic picture illustrates the possible combinations of accuracy and precision. In this context, sampling errors contribute to a lack of precision (Targets 2 and 4) and non-sampling errors to inaccuracy (Targets 3 and 4). The aim is to be in Target 1, although in practice trade-offs often need to be made.
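One common way to put accuracy and precision on a single scale is the mean squared error of repeated estimates around the true value, which equals the squared bias plus the variance. The sketch that follows uses this as an illustrative assumption, not as the Guide's own criterion: it compares a hypothetical design that is unbiased but imprecise (like Target 2) with a hypothetical design that is slightly biased but precise (like Target 3), using simulated estimates.

```python
import math
import random
import statistics

def accuracy_and_precision(estimates, true_value):
    """Summarise repeated survey estimates against a known true value."""
    bias = statistics.fmean(estimates) - true_value  # inaccuracy
    variance = statistics.pvariance(estimates)       # imprecision
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    # mse equals bias**2 + variance, up to floating-point rounding.
    return bias, math.sqrt(variance), mse

rng = random.Random(7)
truth = 50.0
# Hypothetical design A: unbiased but imprecise (like Target 2).
design_a = [rng.gauss(truth, 4.0) for _ in range(10_000)]
# Hypothetical design B: slightly biased but precise (like Target 3).
design_b = [rng.gauss(truth + 1.0, 1.0) for _ in range(10_000)]

for name, estimates in (("A", design_a), ("B", design_b)):
    bias, sd, mse = accuracy_and_precision(estimates, truth)
    print(f"design {name}: bias {bias:+.2f}, sd {sd:.2f}, mse {mse:.2f}")
```

Design B has the smaller mean squared error (about 2, against about 16 for design A) despite its bias, which is the kind of trade-off described above. In a real survey the bias is usually unknown, so it cannot simply be measured and removed.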

Example: A poorly worded question or poor sequencing of questions may bias responses in a survey. This is a non-sampling error and must be remedied, if possible, by improving the questionnaire. Improving the sample design, or sampling more units, will not help.

Checklist – non-sampling error

  • Is the sample population close enough to the target population?
  • Are the questions actually measuring what is required in the research?
  • Do the wording and the sequencing of the questions avoid leading the respondent?
  • Is the question wording clear and unambiguous?
  • Are you able to sample in a representative manner across the sample population?
  • Are interviewers given no discretion as to whom to sample?
  • Can the questions be answered honestly and accurately?

Acknowledgements

The above article is a slightly edited excerpt from Sampling Design and Weighting for Australian Household/Consumer and Business Surveys - A Better Practice Guide, an Australian Market and Social Research Society initiative and resource prepared by Data Analysis Australia. The full Guide is available to members of the Australian Market and Social Research Society via the AMSRS website.

December 2012