In many situations today the problem is not a lack of data but an apparent excess of it. Even then, the question remains: "what does it all mean?" The challenge is interpreting large volumes of data and finding the information gems within them. At Data Analysis Australia we have the IT and statistical capability to store and process such data sets and to provide useful, valuable information to the client.
Data Analysis Australia approaches this from a statistical viewpoint, but one informed by modern computer science, so that the two combine to fully dissect the data and give the client real insight into "what the data means". This gives several advantages over approaches that are essentially driven by computer technology:
- Statistics provide the most effective methods of understanding relationships in data. While these are often standard to statisticians, their application to large datasets requires special algorithms and an understanding of how the data is stored.
- Statistics provide measures of significance for what is found. This is critical since data mining methods may throw up many chance patterns and it is important to discard those that are not real.
- Statistics provide a methodology for exploration using subsets of the data, often saving enormous amounts of computer time that might be prohibitive on many operational systems.
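The last two points can be illustrated with a minimal sketch in Python. It is not Data Analysis Australia's own method: the function names, the simulated columns and the choice of a Bonferroni correction are all assumptions made for illustration. The sketch draws a random subset of rows, tests every candidate column for correlation with a target, and compares the patterns that pass an uncorrected 5% test with those that survive the correction for the number of tests.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r, n):
    """Approximate two-sided p-value using the Fisher z-transformation."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def screen_patterns(target, candidates, alpha=0.05):
    """Return (naive, corrected) lists of candidate names.

    naive:     p < alpha, ignoring how many tests were run.
    corrected: p < alpha / number_of_tests (Bonferroni, an assumed choice).
    """
    n = len(target)
    threshold = alpha / len(candidates)
    naive, corrected = [], []
    for name, values in candidates.items():
        p = correlation_p_value(pearson(target, values), n)
        if p < alpha:
            naive.append(name)
        if p < threshold:
            corrected.append(name)
    return naive, corrected


def demo(seed=1, n_rows=5000, sample_size=500, n_noise=50):
    """Simulated data: many pure-noise columns plus one genuine relationship."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    columns = {
        f"noise_{i}": [rng.gauss(0, 1) for _ in range(n_rows)]
        for i in range(n_noise)
    }
    columns["real"] = [t + rng.gauss(0, 1) for t in target]

    # Explore on a random subset of rows rather than the full table.
    rows = rng.sample(range(n_rows), sample_size)
    sub_target = [target[i] for i in rows]
    sub_columns = {name: [col[i] for i in rows] for name, col in columns.items()}
    return screen_patterns(sub_target, sub_columns)


naive_hits, corrected_hits = demo()
print("uncorrected:", naive_hits)
print("corrected:  ", corrected_hits)
```

With pure-noise columns, roughly 5% would be expected to pass the uncorrected test by chance alone, which is why the corrected list is the safer guide to what is real.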
Systems at Data Analysis Australia are optimised for handling large datasets - we have the capacity to handle databases measured in hundreds of gigabytes, and our high-bandwidth network means that computation is rarely a constraint. A range of software tools is available so that the best can be chosen for each task or step in an analysis.
Data Analysis Australia is frequently consulted by organisations that need to make the most effective use of their data. Since our focus is on information content and what information is required to drive decisions, our expertise complements that of more traditional IT companies.
Examples of project experience where Data Analysis Australia has applied information management and data mining techniques are listed below.
Data Management and Modelling of Demand Profiles for the Energy Industry
Utilities hold very large data sets, not only because of the number of customers but also because data is collected at intervals ranging from fractions of an hour to days and weeks. Company objectives and decision making rely heavily on usage patterns and customer types. Data Analysis Australia has extensive experience in modelling demand profiles from these large data sets in the energy industry. A major part of an energy supplier's costs relates to the time of day at which customers demand power: a smooth load can be met much more cost effectively than an irregular one. Data Analysis Australia has used large databases of billing data to build an understanding of customer types and the demand associated with each type. Ultimately this allows tariffs to be set more objectively and lowers risk.
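As a rough illustration of what a demand profile by customer type looks like, the Python sketch below averages interval readings into a 24-hour profile per type and summarises smoothness with the load factor (average load divided by peak load). The record layout `(customer_type, hour_of_day, kwh)` and the function names are assumptions for the example, not a description of the models actually used.

```python
from collections import defaultdict


def average_profiles(readings):
    """Average kWh for each hour of the day, per customer type.

    readings: iterable of (customer_type, hour_of_day, kwh) tuples,
    with hour_of_day in 0..23 (an assumed layout).
    """
    totals = defaultdict(lambda: [0.0] * 24)
    counts = defaultdict(lambda: [0] * 24)
    for customer_type, hour, kwh in readings:
        totals[customer_type][hour] += kwh
        counts[customer_type][hour] += 1
    return {
        ctype: [t / n if n else 0.0 for t, n in zip(totals[ctype], counts[ctype])]
        for ctype in totals
    }


def load_factor(profile):
    """Average load divided by peak load: 1.0 is perfectly smooth."""
    peak = max(profile)
    if peak <= 0:
        return 0.0
    return (sum(profile) / len(profile)) / peak


# Two made-up customer types: one flat, one with an evening peak.
readings = [("flat", hour, 2.0) for hour in range(24)]
readings += [("peaky", hour, 4.0 if hour == 18 else 1.0) for hour in range(24)]

for ctype, profile in average_profiles(readings).items():
    print(ctype, round(load_factor(profile), 3))
```

A load factor close to 1 indicates the smooth demand that is cheaper to supply; a low value indicates a sharp peak.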
Data Mining of Inventory Data for the Department of Defence
For the Department of Defence, Data Analysis Australia has provided an understanding of many issues related to inventory management. The size of the datasets is challenging - Defence holds over 500,000 active lines in its inventory, spread across many warehouses around the nation. Whilst the primary aim of this project was to better quantify the accuracy of inventory records, and hence the value of the inventory, Data Analysis Australia's data mining also uncovered a number of procedural problems that could be corrected, giving Defence better and more accurate information management.
Audit of Data and Information in the Integrated Public Number Database
To determine whether the address data and other information contained in the Integrated Public Number Database (IPND) is accurate enough for its many uses, Data Analysis Australia audits the data on an ongoing basis. These audits are conducted in conjunction with Gibson Quai, a telecommunications consultancy, on behalf of the Australian Communications and Media Authority. For the audits, Data Analysis Australia is responsible for interrogating a snapshot of the entire IPND, as well as the Geocoded National Address File (G-NAF), the reference database used to test the accuracy of address information. This has required enhancing the geocoding algorithm developed in-house so that it can efficiently search for and score matches of addresses between the IPND and G-NAF.
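The idea of scoring address matches can be sketched in Python as follows. This is not the in-house geocoding algorithm, whose details are not given here. It is a deliberately simple stand-in that normalises each address into tokens and scores candidate pairs by token overlap (Jaccard similarity). The abbreviation table and function names are hypothetical.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger and
# would need to handle ambiguity (for example "st" as "saint").
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "hwy": "highway"}


def normalise(address):
    """Lower-case, split into alphanumeric tokens, expand abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def match_score(address_a, address_b):
    """Jaccard similarity of the two token sets, from 0.0 to 1.0."""
    tokens_a, tokens_b = set(normalise(address_a)), set(normalise(address_b))
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(address, reference_addresses):
    """Return (score, reference_address) for the highest-scoring candidate."""
    scored = ((match_score(address, ref), ref) for ref in reference_addresses)
    return max(scored, default=(0.0, None))


reference = ["14 Smith Street Perth WA", "12 Smith Street Perth WA"]
print(best_match("12 Smith St Perth", reference))
```

Scanning every reference address, as `best_match` does, would be far too slow for a national address file. An efficient search would first narrow the candidates, for instance by postcode or street name, before scoring.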