The Importance of Good Data

While data is often considered the starting point for any statistical analysis, Data Analysis Australia believes that the starting point should be planning the data collection itself.  Effort at this stage is rewarded with lower costs, more powerful analysis and greater assurance that the project will succeed.  Conversely, mistakes at this stage can be difficult, expensive or even impossible to recover from.  Proper planning ensures that the right data is collected, that it is complete and that it is sufficient for a meaningful analysis.

Sometimes data collection is a large, complex task.  One example is running tests on a mineral processing plant, where each data point requires days of effort.  Another might be a large-scale survey, which requires a major investment in fieldwork.  At the other end of the spectrum, the process of data collection can be seemingly simple.  But even these simple cases often have subtleties.  For example, extracting data from an existing database raises questions of data definition, filtering and exactly which tables need to be merged.  In all cases, carefully considered planning will improve the final results.

Sometimes the outcome of this planning is that no data collection is required - either because sufficient data already exists (in, say, an administrative database) or because no data collection that is possible within the time and budget constraints would meet the project aims.

Statisticians have special tools to assist in planning data collection and can provide advice on the amount and type of data to be collected, the method of collecting it and how it should be stored.  Perhaps the best known tool is sampling theory, which incorporates methods of collecting data efficiently so that it is representative, together with methods of determining the sample sizes required for accurate answers.  As well as these technical aspects, statisticians are experienced in many of the day-to-day issues of data collection, including survey design and implementation.  Designing a questionnaire is often seen as a simple task, but in reality it is full of pitfalls.  Here a statistician's focus on the final analysis is invaluable.
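
As a rough illustration of the sample size calculations mentioned above, the sketch below uses the standard textbook formula for estimating a proportion from a simple random sample, with an optional finite population correction.  The function name, the default values and the population of 2000 are assumptions made for this example only; a real survey would also need to allow for the sample design, non-response and the analysis planned.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Approximate sample size for estimating a proportion.

    margin     -- desired half-width of the confidence interval (0.05 = +/- 5 points)
    confidence -- two-sided confidence level
    p          -- anticipated proportion; 0.5 is the most conservative choice
    population -- population size, if small enough to matter (optional)

    Assumes simple random sampling and the normal approximation.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction: a small population needs fewer responses.
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


# Illustrative values only.
print(sample_size_for_proportion(0.05))
print(sample_size_for_proportion(0.05, population=2000))
```

Halving the margin of error roughly quadruples the required sample, which is one reason the precision actually needed is worth settling before fieldwork begins.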

A less well known but just as important set of methods goes under the name of the "Design of Experiments".  Despite the name, this methodology has wide application beyond the scientific laboratory.  By ensuring that the data design is appropriately balanced, experimental design allows many factors to be considered at once, maximising the amount of useful information that can be obtained from a set of data.  It can bring about an order of magnitude increase in the efficiency of research.

To demonstrate, an experiment may involve an entire mineral processing plant, with several operational parameters being changed at each stage.  Using a single set of data to understand the effects of many parameters can give substantial savings.  Careful design ensures that each effect can be measured while the overall experiment size is kept as small as practicable.  The possible rewards of such an experiment can be substantial, underpinning decisions which can reduce production costs or improve production output.
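
The idea of measuring several effects from one balanced set of runs can be sketched with a two-level full factorial design, one of the simplest designs in this family.  The factor names, the response function and its coefficients below are entirely hypothetical and stand in for real plant measurements; the point is only that, because every factor is at its high level in half the runs and its low level in the other half, all eight runs contribute to the estimate of every effect.

```python
from itertools import product


def full_factorial(factors):
    """Every combination of low (-1) and high (+1) levels for the given factors."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]


def main_effects(runs, responses):
    """Main effect of each factor: mean response at high minus mean at low."""
    effects = {}
    for name in runs[0]:
        high = [y for run, y in zip(runs, responses) if run[name] == 1]
        low = [y for run, y in zip(runs, responses) if run[name] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects


# Hypothetical operating parameters, for illustration only.
factors = ["feed_rate", "temperature", "reagent_dose"]
runs = full_factorial(factors)  # 2**3 = 8 runs


def hypothetical_recovery(run):
    """Made-up, noise-free response used in place of real measurements."""
    return (80.0 + 2.0 * run["feed_rate"]
            - 1.5 * run["temperature"]
            + 0.5 * run["reagent_dose"])


responses = [hypothetical_recovery(run) for run in runs]
for name, effect in main_effects(runs, responses).items():
    print(f"{name}: {effect:+.1f}")
```

Changing one factor at a time would need separate pairs of runs for each factor to reach the same precision.  When the number of factors grows, fractional designs keep the same balance while using only a subset of the combinations.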

Given the potential pitfalls of collecting data, it is not surprising that statisticians are often said to have a stock comment to clients, "If only you had come to me before you began".  There is a good deal of truth in this.  Involving statisticians at the early stages can reduce data collection costs through clever design and, perhaps more importantly still, provide an assurance that the investment in data collection will be worthwhile.

For further information, please contact Data Analysis Australia by phone on 08 9468 2533.

December 2011