Sport and Statistics: A ‘match’ made in heaven

Introduction

MathSport International comprises a group of professionals who share a passion for both mathematics and sport.  This unlikely pairing has significantly expanded the role of mathematics and statistics in sport over the past few decades, to the point where collecting and analysing player data to maximise performance is becoming the norm.

Data collection over the course of a match has expanded vastly.  All Australian Football League (AFL) players now wear tracking devices that record where they are on the field at all times, and ball tracking in tennis allows every point in a match to be investigated in greater detail than ever before.  With increases in the speed of computers in recent years, real-time analysis can now be performed during a sporting match, allowing coaches to make decisions based on statistical analyses while the match is still in progress.  This trend seems set to continue.

Every two years, MathSport International holds a conference where speakers from all over the world gather to present their work on topics involving mathematics and statistics in sporting applications.  In June 2017, the 6th International Conference took place in the historic Botanical Gardens of Padua, Italy.  Over 100 people, myself included, were in attendance, with upwards of 60 talks held over the three-day conference in a combination of plenary and parallel sessions.

Work on a wide variety of sports was presented.  This included the sports one would expect to see covered at a sporting conference, listed here with an example topic for each:

  • Cricket – batting order optimisation;
  • Soccer – home team advantage analysis;
  • Tennis – predicting match lengths;
  • AFL – determining the effect of on-field leadership on team performance;
  • Basketball – comparing ranking systems;
  • Baseball – creating season fixtures;
  • Golf – investigating how parenthood affects performance;
  • Rugby Union – proposing an alternative scoring system for more competitive matches;
  • Ice Hockey – avoiding fixture clashes.

In addition, some sports I had not anticipated coming across at the conference were presented, with topics including:

  • Formula One – lap time modelling;
  • Volleyball – analysing serves based on flight trajectory;
  • Swimming – aerodynamic analysis;
  • Beach Volleyball – team strategy optimisation;
  • Pole Vault – predicting teenage athlete development;
  • Skiing – ranking quality of ski courses;
  • Air Pistol Shooting – determining the importance of training vs experience;
  • Darts – strategies of where to aim to maximise points.

Some pieces of work relating to generic fitness and competition were also presented.

Topics of Interest

Although the sports covered were distinctly different, many of the project aims fell into one of two categories:

  • Predicting match outcomes;
  • Scheduling fixtures.

Modelling and Simulation

Predicting match outcomes consists primarily of two stages:

1.      Creating a statistical model using existing match data to estimate the outcome; and

2.      Predicting outcomes for different scenarios (i.e. different matches) many times to determine the level of confidence in different outcomes, a process known as simulation. 

The ability to accurately predict match results using simulations is beneficial in numerous ways: 

  • For devising strategies, simulations help determine the effectiveness of a proposed strategy without the need to implement it.
  • For comparing team performance with and without a particular player, simulations allow the results of past matches to be estimated with the player in question competing or not competing.
  • For betting agencies, and for those who wish to make money from gambling, predicting match outcomes is the end goal.

Model complexity can vary, with models ranging from simple linear regression to bivariate Poisson models and hurdle models making appearances in the conference presentations.  The model chosen is generally based upon its appropriateness to the sport, the data available, and the knowledge of the researcher.  Sufficient data is of great importance in the modelling process, yet it can be difficult or costly to obtain and can be a stumbling block for researchers.

Once a model has been created, it is used to predict match outcomes under various scenarios.  Different scenarios are derived through combinations of the factors used to model the match outcomes, referred to as the explanatory variables.  Examples of possible explanatory variables for modelling match outcomes are team strength, location of match (home or away), presence of certain players, and weather.  Generally, the match outcomes are simulated many thousands of times for randomised combinations of some or all of the explanatory variables, with the results recorded at each iteration.  The probability of a particular outcome can then be estimated as the proportion of simulated matches in which that outcome is observed.
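
As a rough illustration of this simulation step, the Python sketch below assumes a deliberately simple model that is not taken from any conference presentation: each team's score is drawn from an independent Poisson distribution whose rate depends on two explanatory variables, team strength and home advantage.  The function names, the strength values and the home advantage multiplier of 1.2 are all hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(home_strength, away_strength, home_advantage, rng):
    """Simulate one match and return (home_score, away_score)."""
    home_rate = home_strength * home_advantage  # assumed multiplicative effect
    away_rate = away_strength
    return sample_poisson(home_rate, rng), sample_poisson(away_rate, rng)


def estimate_outcome_probabilities(home_strength, away_strength,
                                   home_advantage=1.2, n_sims=20000, seed=1):
    """Estimate outcome probabilities as proportions of simulated matches."""
    rng = random.Random(seed)
    counts = {"home_win": 0, "draw": 0, "away_win": 0}
    for _ in range(n_sims):
        home, away = simulate_match(home_strength, away_strength,
                                    home_advantage, rng)
        if home > away:
            counts["home_win"] += 1
        elif home < away:
            counts["away_win"] += 1
        else:
            counts["draw"] += 1
    return {outcome: n / n_sims for outcome, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical strengths: average scores of 1.8 and 1.0 on neutral ground.
    print(estimate_outcome_probabilities(1.8, 1.0))
```

Randomised explanatory variables, such as the weather or the availability of a key player, could be drawn inside the loop in the same way before each match is simulated.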

A further step can be taken by simulating entire tournaments or seasons, where every match in the fixture is simulated, allowing end-of-season results to be predicted.  The tournament can then be simulated again and again, allowing probabilities to be estimated for teams finishing in given positions on a ladder, or for winning or losing the overall competition.
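
The same idea can be sketched for a whole season.  The Python example below is a minimal sketch under stated assumptions, not a method from the conference: it assumes a double round robin, independent Poisson scores, and a ladder awarding 3 points for a win and 1 for a draw, with ties broken by score difference and then at random.  The team names and strengths are hypothetical.

```python
import itertools
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_season(strengths, home_advantage, rng):
    """Play a double round robin once; return teams ordered first to last."""
    points = {team: 0 for team in strengths}
    score_diff = {team: 0 for team in strengths}
    # Every ordered (home, away) pair, so each team hosts every other team once.
    for home, away in itertools.permutations(strengths, 2):
        home_score = sample_poisson(strengths[home] * home_advantage, rng)
        away_score = sample_poisson(strengths[away], rng)
        score_diff[home] += home_score - away_score
        score_diff[away] += away_score - home_score
        if home_score > away_score:
            points[home] += 3
        elif away_score > home_score:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    # Rank by points, then score difference; remaining ties are broken at random.
    return sorted(strengths,
                  key=lambda t: (points[t], score_diff[t], rng.random()),
                  reverse=True)


def finishing_position_probabilities(strengths, home_advantage=1.2,
                                     n_seasons=2000, seed=1):
    """Estimate, per team, the probability of each finishing position."""
    rng = random.Random(seed)
    teams = list(strengths)
    counts = {team: [0] * len(teams) for team in teams}
    for _ in range(n_seasons):
        ladder = simulate_season(strengths, home_advantage, rng)
        for position, team in enumerate(ladder):
            counts[team][position] += 1
    return {team: [c / n_seasons for c in counts[team]] for team in teams}


if __name__ == "__main__":
    # Hypothetical four-team league; index 0 of each list is first place.
    league = {"A": 2.0, "B": 1.4, "C": 1.0, "D": 0.7}
    for team, probs in finishing_position_probabilities(league).items():
        print(team, [round(p, 3) for p in probs])
```

Each row of the output approximates the chance of that team finishing first, second, and so on, which is the kind of ladder probability described above.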

Scheduling

The other major topic covered over the course of the conference was the challenge of scheduling tournaments and competitions.  The sheer size of the problem can make it very difficult to organise a sporting fixture that gives a fair and balanced competition.  For instance, there are 2,430 matches played between 30 teams in the seven-month regular season of Major League Baseball in the US, which makes for a very congested schedule indeed.

Solving the scheduling problem involves devising a feasible schedule subject to constraints relating to the competition rules and teams’ preferences.  Should multiple feasible schedules exist, optimisation techniques can be applied in order to obtain the schedule that best achieves a desired goal, which is expressed as the objective function.  The objective function will depend on the goal or goals of the competition, be that minimising the variation in distance travelled by teams to create a fair competition, or maximising revenue.

All of this work falls into the category of operations research, and the most common method used to find schedules is integer programming.  This approach formulates the problem mathematically as a set of expressions: one is the objective function to be maximised or minimised, and the others are equations or inequalities acting as the constraints that must be adhered to.
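
To make this concrete, one possible formulation for a single round robin among an even number n of teams is sketched below.  The notation is an assumption made for illustration rather than a formulation taken from a conference presentation: x_{ijr} equals 1 if team i hosts team j in round r and 0 otherwise, and c_{ijr} is a hypothetical cost (for example travel distance) of playing that match in that round.

```latex
\begin{align*}
\min_{x} \quad & \sum_{r=1}^{n-1} \sum_{i=1}^{n} \sum_{j \neq i} c_{ijr}\, x_{ijr} \\
\text{subject to} \quad
& \sum_{r=1}^{n-1} \left( x_{ijr} + x_{jir} \right) = 1
  && \text{for all } i < j, \\
& \sum_{j \neq i} \left( x_{ijr} + x_{jir} \right) = 1
  && \text{for all } i \text{ and } r = 1, \dots, n-1, \\
& x_{ijr} \in \{0, 1\}
  && \text{for all } i \neq j \text{ and all } r.
\end{align*}
```

The first constraint makes every pair of teams meet exactly once, and the second makes every team play exactly one match in each round.  A preference such as a fixed match on a given evening would be added as a further constraint, for example x_{ijr} + x_{jir} = 1 for the chosen i, j and r.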

Some constraints are intuitive, such as the fact that a team cannot compete in two matches simultaneously, that two matches cannot take place in the same stadium at the same time, or that there are limits on the number of times one team can play another in the league.  Other constraints are based on league or team preferences, such as a team wanting to play at home around certain public holidays, or television networks specifying that two specific teams must play each other on a given evening.  Some innocent-sounding constraints can complicate things tenfold for the schedulers, or even result in a problem with no feasible solutions.

Once all of the constraints and the objective function are constructed, software designed for solving constraint and optimisation problems, such as GECODE or LINDO, can be used to find feasible or optimal solutions.  The solution generally takes the form of a matrix containing a 1 where two teams are playing at a given location during a given round of competition, and a 0 where they are not.  This process, with variations based on the complexity of the tournament, is used for a wide variety of major sporting competitions.
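
The idea of a feasible schedule stored as a matrix of 0s and 1s can be shown on a toy problem.  The Python sketch below does not use GECODE, LINDO or integer programming; it substitutes a plain backtracking search, which is only practical for very small competitions.  The function names, the four-team league and the "fixed" request are hypothetical, and the first-named team in each pair is arbitrarily treated as the host.

```python
import itertools


def find_schedule(teams, n_rounds, fixed=None):
    """Assign every pairing to a round so that no team plays twice in a round.

    fixed maps a (team, team) pair to the round in which it must be played
    (a hypothetical broadcaster-style request).  Returns {pair: round}, or
    None when no feasible schedule exists.
    """
    fixed = {frozenset(pair): rnd for pair, rnd in (fixed or {}).items()}
    pairs = list(itertools.combinations(teams, 2))
    busy = [set() for _ in range(n_rounds)]  # teams already playing each round
    assignment = {}

    def place(index):
        if index == len(pairs):
            return True  # every pairing has been given a round
        a, b = pairs[index]
        wanted = fixed.get(frozenset((a, b)))
        candidate_rounds = range(n_rounds) if wanted is None else [wanted]
        for rnd in candidate_rounds:
            if a in busy[rnd] or b in busy[rnd]:
                continue  # a team cannot play two matches in one round
            busy[rnd].update((a, b))
            assignment[(a, b)] = rnd
            if place(index + 1):
                return True
            # Undo the choice and try the next round (backtracking).
            busy[rnd].difference_update((a, b))
            del assignment[(a, b)]
        return False

    return assignment if place(0) else None


def to_indicator_matrix(teams, n_rounds, assignment):
    """Build x[round][home][away], equal to 1 when home hosts away."""
    index = {team: i for i, team in enumerate(teams)}
    x = [[[0] * len(teams) for _ in teams] for _ in range(n_rounds)]
    for (home, away), rnd in assignment.items():
        x[rnd][index[home]][index[away]] = 1
    return x


if __name__ == "__main__":
    teams = ["A", "B", "C", "D"]
    schedule = find_schedule(teams, 3, fixed={("A", "D"): 0})
    for round_matrix in to_indicator_matrix(teams, 3, schedule):
        print(round_matrix)
    # Six matches cannot fit into two rounds of at most two matches each.
    print(find_schedule(teams, 2))
```

The final call illustrates how a tight requirement, here too few rounds, can leave a problem with no feasible solution at all.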

Conclusion

Every presentation was interesting in that it took a unique approach to meeting its given objective.  As a sports fanatic who works as a statistician, I found the conference thoroughly enjoyable.  It was also great to see some similarities between the methods used in the research presented at the conference and my own work at Data Analysis Australia.

There are two regional branches of MathSport: ANZIAM (Australian and New Zealand Industrial and Applied Mathematics) MathSport, which focuses on sporting research in Australia and New Zealand, and the newly created MathSport Asia.  ANZIAM MathSport holds a conference in the years when MathSport International does not, with the next planned for April 2018 on the Sunshine Coast, Queensland.  If you are interested in discovering more about the applications of mathematics, statistics, and computing in sport, then this conference is recommended for you.

For my personal reflections on the conference, see www.daa.com.au/articles/newsletter-articles/conferring-statisticians.

Graeme Ward, Graduate Statistician, Data Analysis Australia

December 2017