Technology has given us the means to collect and store vastly more data than in the past, creating the era of “big data”. Much of this data is also more complex than before: it is typically less well structured, records more variables, and raises greater issues of data quality. Understanding what the data tells us remains a challenge, as it is harder to visualise and computing power has not kept pace with storage capacity.
One aspect of big data is the growth in the number of variables. In genetics, every point on the genome can be recorded for each person in a study. In retail analysis, shoppers can choose any assortment of the thousands of lines a supermarket might carry. In electricity supply, it is now possible to record a household’s demand every few minutes.
These are examples of data growing horizontally (more variables), not just vertically (more individuals). Such data is often not well handled by traditional statistics, and the last twenty years have seen the growth of new methods. It has been an exciting time for statistics, with advances in mathematical theory accompanied by new algorithms and new practical approaches.
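To make the horizontal-growth problem concrete, here is a minimal sketch (in Python with NumPy; the language, dataset sizes, and synthetic data are illustrative choices, not from the original text) of why classical least squares breaks down when there are more variables than individuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Horizontal" growth: far more variables (p) than individuals (n),
# as in genomics or retail basket data.
n, p = 50, 1000
X = rng.normal(size=(n, p))   # design matrix: 50 individuals, 1000 variables
y = rng.normal(size=n)        # response

# Classical least squares needs X'X to be invertible, which is
# impossible when p > n: the normal equations are singular.
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # at most n (= 50), far below p (= 1000)

# lstsq still returns *a* solution, but it is one of infinitely many
# perfectly interpolating fits, so it is useless for inference.
beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(rank)                          # rank n, not p: the fit is not unique
```

This non-uniqueness is one reason the newer methods mentioned above (for instance, penalised regression techniques) were developed for wide data.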