Soccer Handicapping Analytics: Expected Goals (xG)

by Hollywood Sports

Analytics can be a powerful tool in the handicapper’s toolbox when assessing potential value versus bookmaker odds. While statistical analysis has existed in sports since someone started keeping score, the analytics movement examines data to foster a better understanding of the sports we study and follow rather than relying solely on traditional statistics. Often this data can be more predictive of future activity and results than the conventional statistics. This offers exciting possibilities for sports bettors: the opportunity to deploy more accurate predictive data that a majority of the bettors in the market are not using.

In baseball, the two most prevalent statistics associated with starting pitchers are Win-Loss record and Earned Run Average. A statistical analysis of Win-Loss record determined that those numbers had little predictive value for that starting pitcher’s future performance. Data analysis went even deeper to discover that Fielding Independent Pitching data such as strikeouts, walks, and batted ball activity offer a more accurate perspective of how a starting pitcher will perform in the future. Statistics such as FIP, xFIP, and SIERA are attempts to provide more accurate descriptive and predictive measurements of how many runs a pitcher allows.

In soccer, the idea of expected goals serves a similar vision. Goals scored and goals allowed may be definitive in determining a final score, but that does not mean that those numbers are the most predictive regarding future scores. What other statistics are important in scoring goals? Most goals are scored by shooting at the net in open play (the exceptions being penalty kicks and own goals, which do not appear to be reliable events that can be created without luck and the random behavior of an opponent). The more shots a team takes at the net, the more likely it is to score. And the better the quality of those shots, the more likely they are to get past a keeper. Expected goals is a metric that assigns a statistical probability to every scoring chance a team generates in a match. In this adventure of quantitative analysis, similar scoring situations are logged to determine a scoring probability from a deep data set, in a way similar to measurements that predict the accuracy of an NBA shooter attempting a 22-foot corner 3-pointer. Shot attempts that have an empirical success rate of 35% or higher have been categorized as Big Chances. By reassessing a soccer match through its expected goals (xG) and expected goals allowed (xGA), handicappers can judge a team by the activity and nature of all the shot attempts in the match rather than by the final score alone. If xG analysis offers a better evaluation of how a team is playing, then it could provide a more precise way to measure subsequent action.

For example, Southampton entered match week 32 of the 2019-20 English Premier League season with 38 goals scored. However, their xG of 44.20 suggested that they should have scored about six more goals on the season given the average likelihood of scoring from the opportunities they had created. Bettors who took this as evidence that the Saints would cover the goal-line spread in their match at Watford, or that the final score would finish over the 2.5 total, were rewarded with Southampton’s 3-1 victory. Expected goals and expected goals allowed data for both teams in an upcoming match can offer handicappers a powerful weapon in exposing hidden value against the bookmaker’s posted side and totals numbers.
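
As a rough illustration of the bookkeeping described above, here is a minimal Python sketch. It assumes each shot has already been given a scoring probability by some xG model; the shot descriptions and probabilities are made up for the example, and the names (Shot, expected_goals, big_chances) are hypothetical rather than taken from any xG provider. Only the 35% Big Chance cutoff and the Southampton totals come from the text.

```python
from dataclasses import dataclass

# From the article: shots with a 35% or higher success rate are Big Chances.
BIG_CHANCE_THRESHOLD = 0.35


@dataclass
class Shot:
    description: str
    scoring_probability: float  # hypothetical model output between 0 and 1


def expected_goals(shots):
    """Add up the per-shot scoring probabilities to get an xG total."""
    return sum(shot.scoring_probability for shot in shots)


def big_chances(shots):
    """Keep only the shots at or above the Big Chance threshold."""
    return [s for s in shots if s.scoring_probability >= BIG_CHANCE_THRESHOLD]


# Made-up shot log for one team in one match (illustrative values only).
match_shots = [
    Shot("header from a corner", 0.08),
    Shot("one-on-one with the keeper", 0.40),
    Shot("long-range effort from 30 yards", 0.03),
    Shot("tap-in at the far post", 0.55),
]

match_xg = expected_goals(match_shots)
print(f"Match xG: {match_xg:.2f}")
print("Big Chances:", [s.description for s in big_chances(match_shots)])

# Season-level gap using the Southampton figures quoted in the article.
goals_scored = 38
season_xg = 44.20
print(f"Goals minus xG: {goals_scored - season_xg:+.2f}")
```

A negative gap, as with Southampton here, is the kind of discrepancy a handicapper might read as a team scoring less than its chances would suggest.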

But these potential strengths of using expected goals data do come with some caveats, and there are disadvantages to relying on expected goals data exclusively. For starters, one should not consider this objective data. At the beginning of the statistical endeavor, there is a human being assessing and categorizing shot attempts (even if this analysis is eventually replicated by artificial intelligence). The mathematical formulas deployed in the quantitative analysis are all creations of human beings. As long as we live in a pre-Singularity world, this phenomenon is inevitable. And that is OK! Just remember that with the human eye and the human touch comes the possibility of human error. There are also competing expected goals systems in the marketplace. While ERA and field goal percentage are agreed-upon statistics, xG remains a proprietary activity, with different providers developing and publishing their own numbers.

Second, the concept of overachieving or underachieving can be misused. Expected goals attempts to determine the most likely outcomes. But not all outcomes are created equal. Lionel Messi is going to score more goals than Glenn Murray when dribbling up the left wing and taking a shot from 30 yards out. Ederson is more likely to make a spectacular save in that situation than Tom Heaton. While xG attempts to minimize outlier efforts, some players have earned their outlier status on both ends of the equation. Betting against Barcelona (or taking more Unders) because their number of goals scored seems to be overachieving their expected goals may be foolhardy because they have Lionel Freaking Messi! Similarly, banking on bad teams to start playing closer to their expected points calculation (xPTS, a formula attempting to incorporate xG and xGA to reproduce their expected points for the season) may be foolhardy because that team may truly embody the outlier bad xG and xGA numbers.
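
The article does not say how any particular provider computes xPTS, and methods differ. One common textbook-style approach, shown here purely as an assumption, treats each side’s goals in a match as an independent Poisson count whose mean is that side’s xG, converts the resulting win and draw probabilities into points (3 for a win, 1 for a draw), and would then sum across a season. The function names and the sample xG values below are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals follow a Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_outcome_probabilities(xg_for, xg_against, max_goals=10):
    """Return (win, draw, loss) probabilities under the Poisson assumption.

    Scorelines above max_goals per side are ignored and the result is
    renormalised, which is a negligible approximation for typical xG values.
    """
    win = draw = loss = 0.0
    for goals_for in range(max_goals + 1):
        p_for = poisson_pmf(goals_for, xg_for)
        for goals_against in range(max_goals + 1):
            p = p_for * poisson_pmf(goals_against, xg_against)
            if goals_for > goals_against:
                win += p
            elif goals_for == goals_against:
                draw += p
            else:
                loss += p
    total = win + draw + loss
    return win / total, draw / total, loss / total


def expected_points(xg_for, xg_against):
    """Expected league points from one match: 3 for a win, 1 for a draw."""
    win, draw, _ = match_outcome_probabilities(xg_for, xg_against)
    return 3 * win + 1 * draw


# Hypothetical match in which one side created 2.0 xG and conceded 0.8 xGA.
win, draw, loss = match_outcome_probabilities(2.0, 0.8)
print(f"win {win:.1%}, draw {draw:.1%}, loss {loss:.1%}")
print(f"xPTS for this match: {expected_points(2.0, 0.8):.2f}")
```

Because the model only sees chance quality, it cannot tell an elite finisher or keeper from an average one, which is exactly the caveat raised in this paragraph.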

Third, be careful not to mistake recent results for overachievement (or underachievement) when what may be going on is the in-season improvement (or decline) of a team’s quality of play. Teams do get better (or worse) as the season moves forward. Coaching matters. Players improve. Injuries sometimes have disproportionate impacts. Teams can suffer from a loss of morale. Analytics that describes past results for predictive value moving forward assumes that those past results remain a credible assessment of the team’s quality. Yet team quality can be fluid.

Fourth, regression to the mean is a long-term expectation, so finding discrepancies between current results and expected goals results may not immediately produce dividends. Be patient. And remember what John Maynard Keynes said about the long run (to paraphrase, we are all dead). Waiting for what may seem to be inevitable regression can be Fool’s Gold.

Last, keep in mind that because the margins are thinner in soccer, the impact of expected goals is smaller. In basketball, identifying discrepancies between an expected score and a projected score using Points-Per-Possession analytics can be more fruitful, since a college basketball game averages around 130 combined points and an NBA game averages over 200 combined points. Because soccer generally sees zero to six combined goals scored per match, there are fewer scoring opportunities through which the discrepancies exposed via expected goals analysis can translate into an actual difference in score. Your team can dominate their opponent on the pitch but still settle for a 1-1 draw. Because there are more scoring opportunities in basketball, the expected value identified from Points-Per-Possession analysis has more opportunities to demonstrate itself.
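
A toy calculation can make the thin-margins point concrete. The sketch below is not a model of either sport: it simply assumes a game consists of a fixed number of equally weighted scoring events, each of which goes to the better team with the same probability, and asks how often that team finishes ahead. The 55% share and the event counts (3 for a soccer-like game, 100 for a basketball-like game) are made-up illustrative values, and the function name is hypothetical.

```python
import math


def favorite_outcome_probabilities(scoring_events, favorite_share):
    """Return (win, tie, loss) for the favorite in the toy game.

    Each of the scoring_events goes to the favorite with probability
    favorite_share, independently of the others (binomial model).
    """
    win = tie = 0.0
    for k in range(scoring_events + 1):
        p = (
            math.comb(scoring_events, k)
            * favorite_share ** k
            * (1 - favorite_share) ** (scoring_events - k)
        )
        if 2 * k > scoring_events:
            win += p
        elif 2 * k == scoring_events:
            tie += p
    return win, tie, 1.0 - win - tie


# Same per-event edge, very different number of scoring events.
for label, events in [("soccer-like (3 events)", 3),
                      ("basketball-like (100 events)", 100)]:
    win, tie, loss = favorite_outcome_probabilities(events, 0.55)
    print(f"{label}: win {win:.1%}, tie {tie:.1%}, loss {loss:.1%}")
```

Under these assumptions the same small edge wins noticeably more often when there are many scoring events, which is the intuition behind treating an xG edge in soccer with more patience than a Points-Per-Possession edge in basketball.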

These caveats aside, expected goals is a valuable tool for handicapping soccer. Despite Liverpool winning the 2019-20 English Premier League championship, the xPTS analysis still projects Manchester City to be the better team this season. Those of us who used that information to help decide to side with Man City in their July 2nd meeting were rewarded with a 4-0 victory. Relying on expected goals analysis alone will probably not be profitable. However, adding expected goals to the array of angles from which to determine value relative to the numbers that the bookmakers have posted should make successful soccer handicapping even more lucrative. Best of luck for us - Frank.

© 2023 Al McMordie's All Rights Reserved.