Football has been slow to embrace advanced statistics, but one stat that is gaining popularity with analysts and some pundits is expected goals (also written as xG). Not everyone is a fan, however: Jeff Stelling has been very vocal about it, calling it the “most useless stat in history of football”.
What is Expected Goals?
Expected goals is a statistic that attempts to quantify how many goals a team or player should score. For every shot it takes in a large number of factors (shot location, number of defenders in proximity, whether it’s a header, a volley or on the ground, etc.) and uses historical data to give a number between 0 and 1 for how likely the shot is to be scored. So a shot with an expected goal value of 0.9 would be expected to be scored 9 times out of 10 by the average player. All numbers referenced below come from http://www.understat.com.
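To make the "historical data" idea concrete, here is a deliberately simplified sketch in Python. It assumes a toy model that buckets shots by a single feature and uses the historical conversion rate of each bucket as its xG. Real models, including the one behind the understat numbers, combine many more factors, and the shot types and counts below are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical historical shots as (shot_type, was_goal) pairs.
# These records are made up for illustration, not real match data.
HISTORICAL_SHOTS = [
    ("penalty", True), ("penalty", True), ("penalty", True), ("penalty", False),
    ("long_range", False), ("long_range", False), ("long_range", False),
    ("long_range", False), ("long_range", True),
]


def conversion_rates(shots):
    """Return the share of shots of each type that ended in a goal."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for shot_type, was_goal in shots:
        attempts[shot_type] += 1
        goals[shot_type] += int(was_goal)
    # Each rate is a number between 0 and 1, like an xG value.
    return {shot_type: goals[shot_type] / attempts[shot_type] for shot_type in attempts}


rates = conversion_rates(HISTORICAL_SHOTS)
for shot_type, rate in rates.items():
    print(f"{shot_type}: {rate:.2f}")
```

In this toy data the penalty bucket converts three times out of four and the long-range bucket once in five, so a new shot would be assigned 0.75 or 0.20 depending only on its type, whoever takes it.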
A Good Stat Badly Named
The additional information and context applied to each shot make this a richer statistic for analysing performance than shots or shots on target. However, the stat has a major drawback: it bases these numbers on the shot being taken by an average player. The player taking the shot is a huge factor in whether it would be expected to end in a goal. If you gave Eden Hazard and Alvaro Morata 10 chances each to score a header from 8 yards with a defender challenging, who do you think would score more often? Yet expected goals would tell you that the number of goals scored from that shot is the same for both players. This is where I believe the name of the stat and what it shows are incongruent. It does not show the expected goal figure; it shows the quality of the chance. No matter the player, the chance is still the same chance, but it’s more likely to end in a goal if the player is more adept at scoring from that type of chance.
Worse Than the Sum of Its Parts
Another flaw is that it is an aggregation. Ten shots with an xG of 0.1 each are added together to give a team an xG of 1 for a game, but if the opposition created one really high quality chance (say, 0.9 xG), then anyone watching the game would say that the opposition had the best chance of the game. xG, however, would tell you that the team with the 10 low value shots “should” have won the game. This goes against the fundamental principle of the stat, which is that it should show the quality of the shot and not just which team takes more shots.
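The ten-shots-versus-one-chance comparison can be checked with a few lines of Python. This is only a sketch: it assumes each shot is an independent event, which real shots in a match are not, and the function names are my own rather than anything from an xG provider.

```python
from math import prod


def total_xg(shot_xgs):
    """Aggregate xG: simply the sum of the per-shot values."""
    return sum(shot_xgs)


def prob_at_least_one_goal(shot_xgs):
    """Chance of scoring at least once, ASSUMING the shots are independent."""
    # Probability that every shot misses, subtracted from 1.
    return 1 - prod(1 - xg for xg in shot_xgs)


many_poor_chances = [0.1] * 10  # ten shots worth 0.1 xG each
one_great_chance = [0.9]        # a single 0.9 xG chance

for label, shots in [("10 x 0.1", many_poor_chances), ("1 x 0.9", one_great_chance)]:
    print(label, round(total_xg(shots), 2), round(prob_at_least_one_goal(shots), 2))
```

Under that independence assumption, the team with ten poor chances edges the aggregate (about 1.0 against 0.9) yet scores at least once only around 65% of the time, compared with 90% for the team with the single great chance.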
The aggregation problem shows up if we look at Burnley. Their expected goals would suggest that Burnley are getting very lucky defensively and that their goalkeeper is an unbelievable shot stopper, since they have currently conceded 11 goals fewer than their xG against would suggest. However, when we dig a little deeper, 4.5 of those 11 expected goals are for shots outside the box, which in general are less likely to be scored. Sean Dyche has built the Burnley defensive system to force shots from range rather than from in the box. By playing deep and on the counter attack, Burnley set up knowing that they will give up a high number of shots, but they are OK with that as long as they are not high quality chances. xG misrepresents this because of the aggregation. Burnley have had luck on their side, and their keeper (now Nick Pope, after Tom Heaton was injured in October) is having a cracking season, but not 11 goals’ worth of luck and goalkeeping heroics.
An Opportunity Missed?
The principles behind xG are sound. Not all shots are created equal, so they should not be counted the same when we are analysing a team’s performance. However, there is the potential for this information to be represented in a more useful manner.
Firstly, change the name. As explained above, expected goals is a misnomer. Chance Quality or something similar would be a better representation of what the data actually shows. The name of anything is vital because it creates expectation. Calling the stat expected goals creates the assumption that it will give an entirely accurate representation of how many goals a team should score. The limitations above, though, show this not to be the case.
Now that we have changed the contextual lens through which this information is being viewed, don’t aggregate the information; show it in a different way instead. My suggestion: show the number of high quality chances that a team creates per game. A high quality chance could be anything with an xG of 0.6 or higher, so it’s a chance that is expected to be scored by the average player more than half the time. This would give a better representation of which teams are creating high quality chances and which teams are giving up the fewest.
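As a rough sketch of how that could be calculated, the Python below counts chances at or above the 0.6 threshold and averages the count across games. The function names and the per-shot values are hypothetical and invented for illustration; they are not understat data.

```python
HIGH_QUALITY_THRESHOLD = 0.6  # cut-off suggested in the article


def high_quality_chances(shot_xgs, threshold=HIGH_QUALITY_THRESHOLD):
    """Count the shots in one game whose xG is at or above the threshold."""
    return sum(1 for xg in shot_xgs if xg >= threshold)


def high_quality_chances_per_game(games, threshold=HIGH_QUALITY_THRESHOLD):
    """Average number of high quality chances across a list of games."""
    if not games:
        return 0.0
    return sum(high_quality_chances(game, threshold) for game in games) / len(games)


# Hypothetical per-shot xG values for three games (made-up numbers).
games = [
    [0.05, 0.62, 0.10],
    [0.70, 0.80, 0.03, 0.20],
    [0.10, 0.10],
]

per_game = high_quality_chances_per_game(games)
print(per_game)
```

Running the same count on the shots a team concedes would give the defensive version of the measure, and the threshold can be passed in explicitly to try a different cut-off.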
By Richie Eyres