Football Analytics 101: A Beginner’s Guide to the Expected Goals (xG) Metric

New to analytics and data science? Interested in finding out how these metrics work in football? Heard of the term ‘xG’ but aren’t sure how it works? You’ve come to the right place!

Introduction

Analytics is revolutionizing sports as we know it. I’ve been following the analytics trend in American sports for the last decade, thanks largely to Daryl Morey’s “math movement”. It’s awesome to see how teams have changed their strategies, in-game tactics, and even their backroom staff to align themselves with the shifting sands of this revolution.

Football, however, has long fought back against any analytics-driven logic. The usual argument against using or even leaning on analytics is – “that’s not how the beautiful game is played”. It’s about heart and grit, about talent triumphing over what the opposition throws at you, about whiteboard tactics and the “trained eye test”. If we reduced players to numbers on an Excel spreadsheet – what’s to love about sports?

Thankfully, clubs have started to realize how crucial analytics is to staying relevant in today’s quickly evolving world. If you were surprised by Ole Gunnar Solskjaer’s incredible winning start at Manchester United and the hard fall that followed, you didn’t need to be: analytics predicted the regression to the mean well before the losses started piling up.

Source: scisports.com

In this article, we will discuss what Expected Goals (xG) are, why they’re important, and how to calculate them. xG is changing the way we consume football to the extent that even mainstream media outlets are using it. Time to get on board!

 

xG Topics Covered in this Article

  • What are Expected Goals (xG)?
  • The Various Factors that Help us Calculate Expected Goals
  • Why xG is so Important
  • List of In-Depth xG Resources

 

What are Expected Goals (xG)?

xG is easily the most popular analytics metric in football. The Expected Goals metric, or xG, tells us how likely a player is to score based on the situation they are in. It assesses every chance and gives us the probability of the shot finding its way into the back of the net.

The higher the xG, the more likely the shot is to end up as a goal. Since xG is a probability, it always falls between 0 (no chance of scoring) and 1 (a certain goal).

So if a Harry Kane shot from outside the box has an xG of 0.5, it means there’s a 50% likelihood of scoring from that particular situation. Let me explain this using a Premier League match situation:


That’s Kevin De Bruyne with the goal gaping and a chance to put Chelsea to the sword. We would intuitively think of this as a 1.0 xG, right? There’s no defender even close to him, the goalkeeper has been taken out by the pass, and De Bruyne is too skilled to miss such a presentable opportunity.

xG actually gave De Bruyne a 0.92 chance of putting that away. And as it turns out, he missed. He inexplicably hit the crossbar and Chelsea survived.
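To see why a 0.92 chance can still go begging, here is a minimal Python sketch that replays the same chance many times. The function name, the number of attempts and the random seed are my own choices for illustration; this is a toy simulation, not how any provider computes xG.

```python
import random


def simulate_chance(xg: float, attempts: int, seed: int = 0) -> float:
    """Replay one chance `attempts` times and return the share that end in a goal."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each attempt is a coin flip that comes up "goal" with probability xg.
    goals = sum(1 for _ in range(attempts) if rng.random() < xg)
    return goals / attempts


conversion = simulate_chance(xg=0.92, attempts=10_000)
print(f"Scored {conversion:.1%} of the time, missed {1 - conversion:.1%}")
```

Roughly 8 attempts in every 100 should end in a miss, so a miss like this one is unlikely but far from impossible.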

 

The Various Factors that Help us Calculate Expected Goals

This is where things get very interesting. There is no one way to calculate xG. Yep, the xG chart you saw on BBC Match of the Day? Or the xG numbers that pop up across the football sites you frequent?

They all have their own way (or model) of calculating xG.

Think about it. How would you define a certain situation on the pitch? Let’s say Paul Pogba (Manchester United) receives the ball 25 yards from goal and is dead straight in front of the keeper:

This is where the complexity comes in. There are four defenders between Pogba and the goalkeeper. Three of them are aiming to make a last ditch block. How in the world would you calculate the xG for this situation?

Is it as simple as calculating the number of shots on target divided by the total shots taken? Well, not quite.

Football is a deeply complex game with multiple actions happening simultaneously. Take the Pogba example from above. State-of-the-art models factor in multiple variables to calculate xG values:

  • The location of where the shot is being taken from
  • The number of defenders around the shooter
  • The positioning of those defenders
  • The action those defenders are taking
  • The quality of the pass to the shooter
  • The positioning of the goalkeeper

And so on. And there are different shot situations to take in as well:

  • Shots from open play
  • Headers from open play
  • Penalties
  • Direct free kicks
  • Indirect free kicks
  • Corners
  • Throw-ins
  • Rebound from a save by the goalkeeper
  • Rebound from hitting the woodwork
  • Fast Breaks (counter-attacks)

Are you getting how complex the final xG model can potentially become? These 10 different shot types typically have their own individual xG model.
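To make this a little more concrete, here is a toy sketch of how such a model could turn a few of these variables into a probability. It assumes a logistic-regression style formula, which is one common choice in public xG models, and it is not Opta’s or anyone else’s actual method. Every coefficient, the shot type names and the `toy_xg` and `shot_angle` functions are made up for illustration; a real model learns its weights from thousands of historical shots and uses far more variables.

```python
import math

GOAL_WIDTH_M = 7.32  # width of a regulation goal in metres

# Invented coefficients, one set per shot type (illustration only).
COEFFS = {
    "open_play_shot": {"intercept": -1.0, "distance": -0.12, "angle": 1.5, "defenders": -0.3},
    "open_play_header": {"intercept": -1.8, "distance": -0.20, "angle": 1.5, "defenders": -0.3},
}


def shot_angle(x: float, y: float) -> float:
    """Angle (radians) the goal mouth covers as seen from the shot location.

    x is the distance from the goal line, y the sideways offset from the
    centre of the goal, both in metres.
    """
    left_post = math.atan2(y + GOAL_WIDTH_M / 2, x)
    right_post = math.atan2(y - GOAL_WIDTH_M / 2, x)
    return left_post - right_post


def toy_xg(x: float, y: float, defenders_in_path: int,
           shot_type: str = "open_play_shot") -> float:
    """Combine the features linearly and squash the result into (0, 1)."""
    c = COEFFS[shot_type]
    distance = math.hypot(x, y)
    z = (c["intercept"]
         + c["distance"] * distance
         + c["angle"] * shot_angle(x, y)
         + c["defenders"] * defenders_in_path)
    return 1 / (1 + math.exp(-z))  # logistic function


# A Pogba-style shot: about 25 yards (22.9 m) out, central, four defenders in the way.
print(f"Long-range shot: {toy_xg(22.9, 0, 4):.3f}")
# A close-range, unmarked shot from the same central position.
print(f"Close-range shot: {toy_xg(6, 0, 0):.3f}")
```

Keeping a separate set of weights per shot type mirrors the idea that each situation gets its own model. The numbers it prints mean nothing on their own, only the direction matters: closer, wider-angle, less crowded shots come out higher.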

And adding up the xG values of all the shots a team takes in a game gives us that team’s final xG value for the match. There’s a catch here, the role of variance in calculating xG, that we will discuss in another article. It’s an important topic and I want to dedicate a full article to it. Let’s keep things simple for now.
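The summing step is simple enough to sketch in a few lines of Python. The shot log below is entirely hypothetical (the team names and xG values are invented for illustration), and `match_xg` is just a helper name I picked.

```python
from collections import defaultdict

# Hypothetical shot log for one match: (team, xG of that shot).
shots = [
    ("Home", 0.35), ("Home", 0.09), ("Home", 0.05),
    ("Away", 0.92), ("Away", 0.04),
]


def match_xg(shot_log):
    """Add up the xG of every shot, team by team."""
    totals = defaultdict(float)
    for team, xg in shot_log:
        totals[team] += xg
    return dict(totals)


totals = match_xg(shots)
for team, total in totals.items():
    print(f"{team}: {total:.2f} xG")
```

Notice that the away side ends up with the higher total from fewer shots, because one of its chances was far better than anything the home side created.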

Check out Tim Cahill’s goal for Millwall against the perennially suffering Sunderland. I want you to focus on the shot situation and the variables we discussed above:

The xG value of Cahill’s shot, according to OptaPro, was 0.09 (i.e. 9%). There were multiple defenders, plus the goalkeeper, in front of him. It took a shot of precision and quality to place it above all of them into the roof of the net.

Let’s take another example using screenshots. This is a game between Chelsea and Newcastle where Chelsea winger Pedro is put through on goal by an exquisite ball from David Luiz:

He’s in the box and only has the goalkeeper to beat. But the angle is quite narrow (thanks to the keeper’s astuteness in coming out and closing down the angle):

Putting together what we know about xG, Pedro’s chances of scoring are pretty favorable. Opta’s model gives the shot an xG value of 0.35, which means Pedro is expected to score about 35% of the time from this situation. That is a very high value for a single shot. Unfortunately, Pedro missed, but you get the idea.

If you’re interested in building your own model to calculate xG, I have provided a list of resources later in this article to get you started. My aim here is to acquaint you with what xG is and why it has suddenly become the hottest metric in football analytics.

 

Why xG (and Analytics) is so Important in Football

I was initially sceptical about how xG would change the way we see football. But the more I’ve read about it and experimented with it on my own, the more I’ve been drawn to its importance. Now every time I see a player take aim from 30 yards, or plant a header from 2, my head starts swirling with xG values.

We might not see the true value of xG straight up as fans but think of it from a football club’s perspective. As a coach, would you be thrilled to see your defensive midfielder take a shot from 30 yards out to the left of goal? Would you not want to maximize your team’s chances of scoring?

Pay attention next time you’re watching your club play. There are certain positions and situations from which a player will try to shoot again and again over the course of a season. The numbers tell him (and the coach) where the probability of scoring is highest.

Analytics is now pervasive at all the clubs in the top 5 European leagues. The Bundesliga is actually quite far ahead in terms of using analytics to gain an edge on the opposition. And its use isn’t limited to on-pitch actions.

Analytics (of which xG is just one part) is helping clubs redefine their transfer strategy. FC Midtjylland is a Danish club well known for its use of analytics from top to bottom. They invested a lot of money in analytics around 2012-13 to analyze how their players were playing and which players suited their system.

They won the Danish title for the first time in their history in 2014-15, and then again in 2017-18.

Borussia Dortmund are actually one of the leaders in using analytics to improve their scouting system. Players like Shinji Kagawa, Julian Weigl, Nuri Şahin, Mario Götze, Marco Reus and so on were all signed or kept on thanks to their strong analytics numbers. In fact, Dortmund’s system is so revered in Europe that Arsenal actually paid a huge sum of money to sign their head of scouting, Sven Mislintat, to bring his expertise to the Emirates Stadium (he has since left due to disagreements with the new board).

Honestly, I could go on and on about examples of analytics improving football but if there’s one takeaway from this article, please make it this – football analytics is transforming the way we know this beautiful game.

 

List of In-Depth xG Resources

Here’s a list of important resources to get you started on the ‘how xG is calculated’ question:

 

I will be back with another article on football analytics soon. Enjoy your journey into the world of xG!