Premier League with optimal equations

In this article, I will show how we can predict football outcomes by using optimal equations. The method can be applied to any football division, but we will use the English Premier League as an example.

The method includes three steps:

1. We use the data from past years to build a new dataset containing variables with proven predictive ability. Then we cut each variable (from now on, I’ll refer to them as “predictors”) into small groups and compute the percentages of the various outcomes within each group for the specific division (e.g., Premier League). By doing this, we can assess the predictive ability of each predictor.

2. We fit a third-degree polynomial to our data and extract the coefficients, the correlation coefficient, the standard error, and the R-squared value. Then we find the optimal equation for each outcome.

3. Having the best equation for an outcome and the predictors’ values for the new games, we can estimate, fairly accurately, the probability that this outcome occurs.
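Steps 2 and 3 can be sketched in a few lines. The full article uses R; the sketch below uses plain Python with only the standard library, and it is an illustration under stated assumptions rather than the exact procedure behind the article's equations. The data points `xs` and `ys` are made up (bin midpoints of one hypothetical predictor and the observed outcome rate in each bin), and the function names are my own.

```python
from math import sqrt

# Hypothetical binned data: x = midpoint of each predictor group,
# y = share of games in that group where the outcome occurred.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
ys = [0.22, 0.27, 0.33, 0.41, 0.47, 0.55, 0.60, 0.68]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients in ascending order: c0 + c1*x + c2*x^2 + c3*x^3.
    """
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(coeffs, xs, ys):
    """Share of the variance in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def standard_error(coeffs, xs, ys):
    """Residual standard error with n - (degree + 1) degrees of freedom."""
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(ss_res / (len(xs) - len(coeffs)))


coeffs = fit_polynomial(xs, ys)
print("coefficients:", coeffs)
print("R-squared:", r_squared(coeffs, xs, ys))
print("standard error:", standard_error(coeffs, xs, ys))
# Step 3: plug a new game's predictor value into the fitted equation.
print("estimated probability at x = 0.45:", predict(coeffs, 0.45))
```

One plausible way to pick the "optimal" equation for an outcome is to repeat this fit for every predictor and keep the one with the highest R-squared and the lowest standard error; that selection rule is my assumption, not something the steps above spell out.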

I will skip the first step, which is messy and time-consuming, and instead provide the tables with the X-value and the Y-value for every predictor. I used data from the 2006-2007 to 2019-2020 seasons, so our data sample is big enough (4,463 games).
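For readers who want a feel for what the first step produces, here is a minimal sketch of how one X/Y table could be built. It is plain Python and assumes equal-width groups; the ten games, the group width, and the function name `bin_predictor` are hypothetical and are not taken from the real dataset.

```python
from collections import defaultdict

# Hypothetical past games: (predictor value, 1 if the outcome occurred else 0).
games = [
    (0.10, 0), (0.20, 0), (0.30, 0), (0.40, 1), (0.45, 0),
    (0.55, 1), (0.60, 0), (0.70, 1), (0.80, 1), (0.90, 1),
]


def bin_predictor(games, width):
    """Cut a predictor into equal-width groups.

    Returns one row per non-empty group:
    (group midpoint = X-value, outcome rate = Y-value, number of games).
    """
    counts = defaultdict(lambda: [0, 0])  # group index -> [games, hits]
    for value, hit in games:
        idx = int(value // width)
        counts[idx][0] += 1
        counts[idx][1] += hit
    table = []
    for idx in sorted(counts):
        n, hits = counts[idx]
        table.append(((idx + 0.5) * width, hits / n, n))
    return table


table = bin_predictor(games, width=0.25)
for x_value, y_value, n_games in table:
    print(f"X = {x_value:.3f}  Y = {y_value:.3f}  games = {n_games}")
```

If the Y-values rise or fall steadily across the groups, the predictor carries information about the outcome; a flat column suggests it does not.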

Suppose we define the outcomes as the expected number of goals for the home team and the away team. In that case, we can calculate the probability of any possible combination of outcomes in a football game. Then we can compare our fair odds with the bookmakers’ odds and, if we see a positive expected value, place a bet.
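One common way to go from expected goals to outcome probabilities is to treat the two teams' goal counts as independent Poisson variables. That modelling choice is my assumption for this sketch, not necessarily what the full article does. The expected-goal values, the bookmaker price, and all function names below are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def score_matrix(home_xg, away_xg, max_goals=10):
    """matrix[h][a] = probability of the exact score h:a (independence assumed)."""
    return [
        [poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def match_probabilities(matrix):
    """Sum the score grid into home win, draw, and away win probabilities."""
    home = draw = away = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


def fair_odds(prob):
    """Decimal odds with no bookmaker margin."""
    return 1 / prob


def expected_value(prob, decimal_odds):
    """Expected profit per unit staked at the offered decimal odds."""
    return prob * decimal_odds - 1


# Hypothetical inputs: expected goals from the fitted equations, plus a bookmaker price.
p_home, p_draw, p_away = match_probabilities(score_matrix(1.6, 1.1))
bookmaker_home_odds = 2.10

print("home / draw / away:", p_home, p_draw, p_away)
print("fair odds for a home win:", fair_odds(p_home))
print("expected value of the home bet:", expected_value(p_home, bookmaker_home_odds))
```

A bet is worth considering only when `expected_value` is positive, which is the same as the bookmaker's odds being higher than our fair odds. The same score grid also yields other markets, such as over/under or correct score, by summing different cells.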

You will find the entire article on Kaggle, because it contains some R code that cannot be run here.