Data often comes in droughts and floods: sometimes there is too little to make good decisions, and at other times there is too much to comprehend. Today, the problem is often the complexity created by the large number of variables observed.
This is seen in many situations. In mining, modern analytical equipment can analyse many elements at once. In social research, a survey can collect dozens of separate items of information for each person interviewed. In genomics, it is possible to measure thousands of genes and hundreds of thousands of nucleotides for each individual. In imaging, a camera or remote sensor records light in several colours at each of millions of pixels.
Statisticians describe such challenging data as high-dimensional. There are simply too many variables, each one possibly related to the rest, to think about at once. Even the first step of exploring the data visually is difficult. If there were just two variables, they could be plotted on paper or a computer screen, so that the eye can do what it does best: find patterns. Three dimensions can be imagined, since we live in a 3-D world, and many computer packages can now draw three-dimensional graphs. But most people, including statisticians, have difficulty thinking and visualising in four dimensions and beyond. So what do you do when you have 10 or 20 variables of interest, or even hundreds or thousands?