Copyright © 2014
Data Analysis Australia

What Are You Really Measuring? 

Reliability and Validity in Questionnaire Design

In today's world, organisations need strategic goals and targets, and clear measurements are needed to assess progress towards them.  Some of these targets are easy to define and the measurements are clear cut, particularly certain financial goals and production and quality control targets.  However, some of the most vital aspects of a well-functioning organisation are more complex to measure.  For example, the climate and culture of an organisation are known to be central to optimising employee wellbeing, productivity and innovation.  Similarly, it is important to select executives or employees with certain character traits and dynamics if they are to function effectively in their roles.  Unlike annual income or production, which can be directly measured, many of the psychological aspects of an organisation are "intangible constructs" and can only be measured indirectly.

The classic example of an intangible construct is intelligence, as expressed by the Intelligence Quotient (IQ).  Most of us agree that there is such a thing as intelligence - and that some people have more of it than others!  But unlike height or weight, it can't be measured with a tape measure or a set of bathroom scales.

Figuring Out What You Want To Measure

Often the first step in measuring an intangible construct is coming up with an Operational Definition.  This means defining what the construct is, what it comprises and how it can be measured.  This stage tends to include a review of previous research on the topic to identify what is known about the subject and how people have tried to measure it in the past.

In this type of work, our clients usually have a model of what makes up their construct, or we can help them develop one.  As a fictitious example, they might want to measure Organisational Effectiveness and they hypothesise that it is made up of four organisational traits: Morale, Innovation, Management and Teamwork.  In this case, each of the four traits needs to be measured.  Questionnaires are generally used to collect this type of information.  For example, a good design might be a questionnaire with six questions about each of the traits.  The responses to the six questions about each trait will later be aggregated to give a measurement of Morale, Innovation, Management and Teamwork.
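The aggregation step in the fictitious example can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the question order (questions 1-6 for Morale, 7-12 for Innovation and so on), the 1-5 agreement scale, the use of a simple average, and the function name trait_scores are all hypothetical and are not taken from any real questionnaire.

```python
from statistics import mean

# Hypothetical layout: questions 1-6 measure Morale, 7-12 Innovation, and so on.
TRAITS = ["Morale", "Innovation", "Management", "Teamwork"]
QUESTIONS_PER_TRAIT = 6


def trait_scores(responses):
    """Average one respondent's 24 answers into one score per trait."""
    expected = len(TRAITS) * QUESTIONS_PER_TRAIT
    if len(responses) != expected:
        raise ValueError(f"expected {expected} responses, got {len(responses)}")
    scores = {}
    for i, trait in enumerate(TRAITS):
        start = i * QUESTIONS_PER_TRAIT
        scores[trait] = mean(responses[start:start + QUESTIONS_PER_TRAIT])
    return scores


# Made-up answers on a 1-5 agreement scale, for illustration only.
example = [4, 5, 4, 4, 3, 4,   2, 3, 2, 2, 3, 3,
           5, 5, 4, 5, 5, 5,   3, 3, 3, 3, 3, 3]
print(trait_scores(example))
```

Averaging is just one possible choice; a sum or a weighted score would follow the same pattern.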

After defining the construct and its components (traits), and producing questions to measure each of these, a testing stage is strongly recommended.  The aim of testing is to ensure that the questions are measuring what they are intended to: that is, that they produce reliable and valid measurements.

Reliability

Reliability means the consistency or repeatability of the measure.  This is especially important if the measure is to be used on an ongoing basis to detect change.  There are several forms of reliability, including:

  • Test-retest reliability - whether repeating the test/questionnaire under the same conditions produces the same results; and
  • Reliability within a scale (internal consistency) - whether all the questions designed to measure a particular trait are indeed measuring the same trait.

Validity

Validity means that we are measuring what we want to measure.  There are a number of types of validity including:

  • Face Validity - whether at face value, the questions appear to be measuring the construct.  This is largely a "common-sense" assessment, but also relies on knowledge of the way people respond to survey questions and common pitfalls in questionnaire design;
  • Content Validity - whether all important aspects of the construct are covered.  Clear definitions of the construct and its components come in useful here;
  • Criterion Validity/Predictive Validity - whether scores on the questionnaire successfully predict a specific criterion.  For example, whether the questionnaire used in selecting executives predicts the success of those executives once they have been appointed; and
  • Concurrent Validity - whether results of a new questionnaire are consistent with results of established measures.

Validating a Model 

Going back to our hypothetical example, the client has a model of Organisational Effectiveness that is made up of four organisational traits: Morale, Innovation, Management and Teamwork. They also have a questionnaire with questions that are intended to measure each of these traits.  However, as they are using the questionnaire to infer levels of Morale, Innovation, Management and Teamwork, it is important to assess whether the results are consistent with this model being accurate.  There are a number of statistical methods available to test whether the data collected using the questionnaire supports the model, or whether either the questionnaire or the model needs revision or development.  Principal components analysis and exploratory or confirmatory factor analysis are among the statistical techniques often used to assess a model.

These techniques can often provide a deeper understanding of the issues being surveyed, and can reveal that questions are measuring more - or less - than they were intended to.  For example, many years ago Data Analysis Australia staff were assisting a client with survey data relating to occupational health and safety (OHS) issues.  One of the questions might be paraphrased as "My supervisor puts my health and safety above productivity", and was created to measure OHS issues.  However, analysis revealed that responses to this question instead related mainly to the first words - "my supervisor" - and showed more about industrial relations than OHS.

Another benefit of using techniques such as factor analysis to assess a questionnaire is improved efficiency.  We are often able to advise clients on ways in which they can reduce the length of their questionnaires while maintaining or increasing the information that can be obtained.  Reducing the number of questions in an overly lengthy questionnaire makes it easier for respondents to complete, and increases response rates. 

Generalisability and Confounding Issues 

In testing the questionnaire, the test sample is also important.  For example, many years ago in the US, IQ tests were used incorrectly on migrants with limited English.  The migrants received poor scores, but the test was inadvertently measuring their ability to read and respond to a test written in English rather than their actual IQ.  There are two important lessons that can be taken from this example.  The first is that other issues that alter our results can pop up in research if we don't give sufficient thought to what we are really measuring.  As in the OHS example earlier, even a question that appears fine on the surface can be confounded by other issues in some cases.

The second lesson is to be cautious in generalising results to other groups.  If a questionnaire is designed for a specific group it is important to test it on a representative group.  A questionnaire that will be used for assessing Board members should be tested on current/prospective Board members if these are the people that the information is required for.  If the questionnaire is to be used on many different groups of people, it's important to test it on the different groups it will be used for to ensure it is valid in all its intended usages.

Which of These Issues Do I Need to Consider For My Questionnaire?

The types of reliability and validity issues that need to be considered vary from one situation to the next, depending on what the questionnaire is measuring and its intended use.  There is a range of statistical procedures designed to test reliability and validity.  In addition, specific survey designs may be necessary to ensure that the required information is available to establish some of the more complex types of validity or reliability.

A number of Data Analysis Australia's clients work in specialist areas in which a small number of rigorously tested survey products form their core business.  For these questionnaires in particular, attending to issues of reliability and validity is important to ensure their products are of a high quality.  Ongoing research and development of the survey products allows clients to maintain an edge in the marketplace.

For simpler surveys, where a questionnaire is gathering information that only needs to be used in a practical way rather than an inferential way, the reliability and validity requirements are more basic.  However, even in these situations, it is important to consider whether the survey is measuring what it should be.