This website has a lengthy history of football analysis. However, we have lagged behind on the most important metric in football analysis, one that has even begun to appear on Match of the Day (which is hardly at the forefront of analysis). This metric is 'expected goals', also known as xGoals or xG.
For a brief description of how this metric works, you can click the header below to bring up an explainer. If you are already familiar with expected goals, you'd be better off skipping it, as you will probably have read dozens of explanations elsewhere.
However, not all shots are equal. For example, a tap-in to an unguarded goal from 6 yards is more valuable for a team than a 30 yard speculative effort. xG is simply a way of measuring the quality of a chance. Specifically, it says how likely each chance is to result in a goal.
So, for the two examples above, the 6 yard chance may be scored 95% of the time, whilst the shot from 30 yards may be scored 2% of the time. The xG values for the two shots are therefore 0.95 and 0.02 respectively. Let's say both shots were by the same side: in total we'd expect that side to score 0.97 goals (0.95 + 0.02).
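As a minimal sketch of that arithmetic, the snippet below sums the two example values. The variable names are purely illustrative and not part of any particular xG model.

```python
# xG values taken from the two example shots above
shots_xg = [0.95, 0.02]

# A team's expected goals is the sum of its individual shot values
team_xg = sum(shots_xg)
print(f"Team xG: {team_xg:.2f}")  # Team xG: 0.97
```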
In order to know the xG of a chance, past data is used to give an idea of what we would expect. A simple example is for a penalty. We may look back and find that 75% of penalties are scored, which means that the xG of any penalty would be 0.75.
For non-penalties it is more complicated, as we need to find ways of quantifying how good each chance is. Which metrics are included depends on the model used, and the more sophisticated the model, the more factors it will include. Generally, things such as distance from goal, the number of players in the way of the shot and how the player hit the ball (e.g. header, dominant foot, weaker foot) are the kind of factors that may be used.
However, we have been lucky enough to be given access to some very valuable data by StrataData, who look at various leagues across the world. Crucially, this includes data on the Scottish Premier League (SPL), which isn't covered by the other main providers of detailed football data. This is particularly interesting to me as I live in Scotland. Therefore, we will be using this data to analyse the SPL.
We will be looking at teams and individuals in future articles, but for now we will just go over how our xG model works.
Our model
To add depth to our model, we have data from four leagues in addition to the SPL, all of approximately similar quality.
As mentioned in the xG explainer above, models can be continually refined by taking more and more factors into account, trying to use various metrics to estimate how good a chance is. However, the beauty of the StrataData data is that each shot is assessed and placed into a category describing how good the chance is, and how likely it is that it will result in a goal. The categories are various ratings ranging from Poor to Superb.
We then calculate how often, across our data, chances in each category are converted, and use that conversion rate as the xG value for the category.
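A rough sketch of that calculation is below. It assumes each shot can be reduced to a (category, scored) pair. The function name and the tiny sample are made up for illustration and are not real StrataData figures or their data format.

```python
from collections import defaultdict

def xg_by_category(shots):
    """Return {category: conversion rate} from (category, scored) pairs."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for category, scored in shots:
        attempts[category] += 1
        goals[category] += int(scored)
    # Conversion rate = goals scored / chances taken, per category
    return {category: goals[category] / attempts[category] for category in attempts}

# Made-up sample purely to show the mechanics (not real data)
sample_shots = [
    ("Poor", False), ("Poor", False), ("Poor", False), ("Poor", True),
    ("Superb", True), ("Superb", False),
]
category_xg = xg_by_category(sample_shots)
print(category_xg)
```

With real data, each category would need enough shots behind it for the conversion rate to be a stable estimate.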
With an estimate of how likely each chance is to be scored, we can then calculate a further measure for each match, and a potentially more interesting one: the likelihood of victory.
Likelihood of victory
Because we know the likelihood of each chance resulting in a goal, we can also work out the likelihood of each team winning a given match, given the chances they had.
To take an incredibly simple example, say Team A and Team B played a match in which there was only one chance, and it fell to Team A. If the xG of that chance is 0.25, then there is a 25% chance of Team A winning, and a 75% chance of the match ending in a draw.
For a slightly more complicated example, say Team A had two chances, which had xG of 0.05 and 0.4, whilst Team B had one chance with an xG of 0.65. We can work out the probabilities by taking each chance sequentially (the order doesn't matter).
After chance 1 (Team A; xG of 0.05), the score probabilities are:
- 0-0: 95%
- 1-0: 5%
After chance 2 (Team A; xG of 0.4), the score probabilities are:
- 0-0: 95% x 60% = 57% [the probability that the score was 0-0 and this chance was missed]
- 1-0: (5% x 60%) + (95% x 40%) = 41% [the probability that the score was previously 1-0 and this chance was missed, added to the probability that the score was 0-0 and this chance was scored]
- 2-0: 5% x 40% = 2% [the probability the score was 1-0 and this chance was scored]
And finally, after chance 3 (Team B; xG of 0.65) the score probabilities are:
- 0-0: 57% x 35% = 20% [the probability the score was 0-0 and this chance was missed]
- 1-0: 41% x 35% = 14% [the probability Team A led 1-0 and this chance was missed]
- 2-0: 2% x 35% = 1% [the probability Team A led 2-0 and this chance was missed]
- 0-1: 57% x 65% = 37% [the probability that the score was 0-0 and this chance was scored]
- 1-1: 41% x 65% = 27% [the probability that Team A led 1-0 and this chance was scored]
- 2-1: 2% x 65% = 1% [the probability that Team A led 2-0 and this chance was scored]
To see each outcome's likelihood of happening, we add up the probabilities of each scenario that matches this outcome:
- Team A win: 14% + 1% + 1% = 16% [the probabilities of Team A winning 1-0, 2-0 and 2-1]
- Draw: 20% + 27% = 47% [the probabilities of a 0-0 and a 1-1 draw]
- Team B win: 37% [the probability of a 0-1 Team B win]
So from this, we can see that Team B are more likely to win than Team A, but that a draw is still the most likely outcome. Of course, a real match will have a couple of dozen chances, but the principle remains the same.
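The chance-by-chance procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: the function names and the ("A"/"B", xG) input format are assumptions. Like the worked example, it treats every chance as independent.

```python
from collections import defaultdict

def score_distribution(chances):
    """Return {(goals_a, goals_b): probability} for a list of (team, xg) chances."""
    dist = {(0, 0): 1.0}  # before any chance, the score is certainly 0-0
    for team, xg in chances:
        updated = defaultdict(float)
        for (a, b), p in dist.items():
            # Chance missed: the score stays as it was
            updated[(a, b)] += p * (1 - xg)
            # Chance scored: the shooting team's tally goes up by one
            new_score = (a + 1, b) if team == "A" else (a, b + 1)
            updated[new_score] += p * xg
        dist = dict(updated)
    return dist

def outcome_probabilities(dist):
    """Collapse a score distribution into (Team A win, draw, Team B win)."""
    win_a = sum(p for (a, b), p in dist.items() if a > b)
    draw = sum(p for (a, b), p in dist.items() if a == b)
    win_b = sum(p for (a, b), p in dist.items() if a < b)
    return win_a, draw, win_b

# The three chances from the worked example above
chances = [("A", 0.05), ("A", 0.4), ("B", 0.65)]
dist = score_distribution(chances)
win_a, draw, win_b = outcome_probabilities(dist)
print(f"A win {win_a:.0%}, draw {draw:.0%}, B win {win_b:.0%}")
# A win 16%, draw 47%, B win 37%
```

Because the code carries unrounded probabilities from step to step, it avoids the small rounding drift that builds up when each stage is rounded to the nearest percent by hand.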
After a match, fans and pundits often argue that the result "should" have been something different to the final score. This measure allows us to put an actual number on that claim, and when taken over several games it will be very useful in seeing whether a team deserves the points tally it has, or whether it has been lucky or unlucky.
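One possible way to make that comparison, sketched here as an assumption rather than as a description of what we will publish, is to turn the outcome probabilities into "expected points" using the usual 3 points for a win and 1 for a draw. The function name is hypothetical; the inputs are the unrounded probabilities from the worked example above.

```python
def expected_points(p_win, p_draw):
    """Expected league points from one match: 3 for a win, 1 for a draw."""
    return 3 * p_win + p_draw

# Unrounded outcome probabilities from the worked example above
team_a_points = expected_points(p_win=0.1635, p_draw=0.466)
team_b_points = expected_points(p_win=0.3705, p_draw=0.466)
print(f"Team A: {team_a_points:.2f}, Team B: {team_b_points:.2f}")
```

Summing these values over a run of matches and comparing the total with the points actually won would give a rough indication of whether a side has been lucky or unlucky.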
Summary
Expected goals have been covered in huge detail elsewhere, and our likelihood of victory measure is nothing new, either. However, by combining this with our exciting new data, we should be able to use it to provide new insights, particularly into the realm of Scottish football.
Next time we are going to look at the twelve SPL sides and see how they have been performing thus far this season.