Where can I find xG data for matches?

FBref provides free xG for major leagues. Understat offers detailed xG breakdowns by player and team. StatsBomb includes xG in the free open-data sets it releases for selected competitions. Most professional analysis uses Opta or StatsBomb data as the foundation.

Why do different xG models give different values?

Different models weight variables differently. One model might put more emphasis on defensive pressure, another on shot type. Differences usually range 5-15%, with larger differences indicating substantive methodological differences.

Is xG perfect at predicting goals?

No. xG is a probability estimate. Randomness remains. A team with 3.2 xG might score 2, 3, or 4 goals. Over many matches, xG and goals converge, but individual match variance exists.
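To make the spread concrete, here is a minimal sketch that turns a list of per-shot xG values into the probability of each goal total. It assumes every shot is an independent event that scores with probability equal to its xG, which is a simplification, and the shot list is hypothetical (chosen only so that it sums to 3.2 xG).

```python
def goal_distribution(shot_xgs):
    """Return probs where probs[k] is the chance of scoring exactly k goals.

    Assumes each shot scores independently with probability equal to its xG.
    """
    probs = [1.0]  # before any shot is taken: 0 goals with certainty
    for p in shot_xgs:
        nxt = [0.0] * (len(probs) + 1)
        for goals, prob in enumerate(probs):
            nxt[goals] += prob * (1.0 - p)  # this shot is missed
            nxt[goals + 1] += prob * p      # this shot is scored
        probs = nxt
    return probs


# Hypothetical shot list summing to 3.2 xG.
shots = [0.76, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.12, 0.10, 0.07, 0.05]
dist = goal_distribution(shots)
for goals, prob in enumerate(dist[:7]):
    print(f"{goals} goals: {prob:.1%}")
```

The same total xG built from a few big chances or from many small ones gives a different spread, which is one reason single-match scorelines can stray from xG.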

How long until actual goals match xG?

As a rule of thumb, 10-15 matches. Most teams drift back towards their xG over that span. If a team is still over- or underperforming its xG by a wide margin (2+ goals) after that, something structural may be happening, such as unusually strong or weak finishing or goalkeeping.

Does xG account for goalkeeper quality?

Basic models don't. Advanced models attempt to with "post-shot xG" (xG values adjusted by where shots go after leaving the shooter's foot). This partially accounts for goalkeeper quality because elite keepers prevent more post-shot goals.
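One simple way to read post-shot xG for a goalkeeper is to compare the total post-shot xG faced with the goals actually conceded. The sketch below assumes the post-shot values are already supplied by a data provider; the function name, the "goals prevented" label, and the numbers are illustrative.

```python
def goals_prevented(post_shot_xgs, goals_conceded):
    """Post-shot xG faced minus goals actually conceded.

    A positive value means the keeper conceded fewer goals than the
    shots faced implied; a negative value means more.
    """
    return sum(post_shot_xgs) - goals_conceded


# Hypothetical post-shot xG values for the shots one keeper faced.
faced = [0.62, 0.08, 0.31, 0.45, 0.12, 0.77, 0.05]
print(f"Post-shot xG faced: {sum(faced):.2f}")
print(f"Goals prevented: {goals_prevented(faced, goals_conceded=2):+.2f}")
```

Over a handful of matches this number is mostly noise, so it is better read across a long run of shots.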

Is xGA (expected goals against) reliable?

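Broadly yes, as a measure of the chances a team allows. A team with 1.5 xGA per match will concede about 1.5 goals per match if luck is normal and its goalkeeper is average. Because basic xGA ignores goalkeeper quality, a team with an elite keeper tends to concede somewhat fewer goals than its xGA, and a team with a weak keeper somewhat more.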

Can I predict xG itself?

Yes. Teams with consistent underlying performance tend to repeat xG output. A team that creates 2.0 xG per match for several seasons typically creates similar xG next season, unless something structural changes (manager change, key player departure).

Expected Goals Models: How AI Calculates xG from Tracking Data

Expected goals (xG) is one of the most important metrics in modern football analysis. Rather than just counting goals scored, xG measures the quality of scoring chances created. Understanding how xG models work reveals why they're valuable for prediction.

The Basic Concept

xG answers the question: "Given the quality of chances a team created, how many goals should they have scored?"

If a team scores three goals from 1.8 xG, they were clinical with their chances. If they score one goal from 3.2 xG, they were wasteful. Over time, actual goals converge on xG. Teams with high xG but few actual goals tend to see their results improve as luck evens out.

This matters for prediction because xG predicts future results better than actual goals do. A team with 2.1 xG for and 0.9 xG against is likely to win more than a team with 1.5 xG for and 1.3 xG against, regardless of actual match results so far.

How xG Models Assign Value

A modern xG model takes shot information and calculates probability: What's the likelihood this particular shot becomes a goal?

The model considers shot characteristics:

Shot location. Shots from 6 yards out are far more likely to go in than shots from 35 yards. The model maps shot distance and angle. A shot straight at goal from 10 yards might have 15% xG value. A 45-degree angle shot from the same distance might have 8% xG.

Shot type. Headers convert less often than shots taken with the foot from the same location. Penalties convert at roughly 75-80%, far above almost any open-play chance. Rebounds tend to convert more often than the initial attempt, because the goalkeeper is often out of position. The model accounts for these differences.

Defensive pressure. A shot taken with defenders nearby has lower conversion than an uncontested shot. If tracking data shows a defender within 2 yards, xG is lower than for a similar uncontested shot.

Goalkeeper positioning. Advanced models incorporate goalkeeper position. If the keeper is off their line, conversion probability increases. If the keeper is well-positioned, it decreases.

Shot quality. Some models incorporate shot power and accuracy. A powerful shot from a dangerous location has higher xG than a weak shot from the same location.

The model combines these factors into a probability. A well-struck shot from 10 yards with no defensive pressure might receive 0.28 xG (28% conversion probability). The same shot contested by a defender might be 0.20 xG.
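As a rough illustration of how such factors can be combined, the sketch below computes shot distance and the angle subtended by the goalposts, then feeds them into a logistic function together with header and pressure flags. The coefficients are invented for this example and are not fitted to any data set; real models are trained on many thousands of shots and usually use more features.

```python
import math

GOAL_WIDTH_YD = 8.0  # distance between the posts, in yards


def shot_geometry(x, y):
    """x: yards from the goal line; y: yards sideways from the goal centre."""
    distance = math.hypot(x, y)
    half = GOAL_WIDTH_YD / 2
    # Angle (radians) between the lines from the shot location to each post.
    angle = math.atan2(GOAL_WIDTH_YD * x, x * x + y * y - half * half)
    return distance, angle


# Illustrative coefficients only: invented for this sketch, not fitted to data.
COEF = {"intercept": -1.2, "distance": -0.10, "angle": 1.6,
        "header": -0.8, "pressure": -0.45}


def toy_xg(x, y, header=False, under_pressure=False):
    """Toy logistic xG estimate for a single shot."""
    distance, angle = shot_geometry(x, y)
    z = (COEF["intercept"]
         + COEF["distance"] * distance
         + COEF["angle"] * angle
         + COEF["header"] * header
         + COEF["pressure"] * under_pressure)
    return 1.0 / (1.0 + math.exp(-z))


print(f"Central, 10 yards, uncontested: {toy_xg(10, 0):.2f}")
print(f"Central, 10 yards, contested:   {toy_xg(10, 0, under_pressure=True):.2f}")
print(f"45-degree angle, 10 yards:      {toy_xg(7.07, 7.07):.2f}")
```

Using the angle between the posts rather than the raw bearing captures how much of the goal the shooter can actually see, which is why wide positions score poorly even at short range.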

Data Sources for xG Calculation

Modern xG models rely on detailed shot-level event data, and increasingly on tracking data, rather than on summary statistics alone.

Opta and StatsBomb provide detailed shot information including location (to the yard), shot type, defensive pressure, and even ball speed for many matches. Services like FBref provide xG numbers calculated from this data.

Computer vision and video analysis increasingly automate data collection. Cameras track player and ball positions throughout the match. AI systems then analyse this video to extract shot locations, goalkeeper positioning, and defensive pressure.

Hawk-Eye and ball-tracking technology in some stadiums provides precise ball location. Advanced models incorporate this technology where available, improving accuracy.

The data quality varies. Premier League matches have high-quality tracking data. Lower divisions have less detailed data. Some xG models work with only basic shot location and type, without pressure information. These simpler models have lower accuracy but still outperform just counting goals.

Calibration and Validation

A good xG model is well-calibrated, meaning predicted probability matches actual frequency. If a model assigns 0.15 xG to 100 different shots, it should see roughly 15 goals scored from those 100 shots.

Poor calibration suggests the model is biased. A model that systematically overestimates xG (for example, assigning 3.0 xG per match to teams that go on to average 2.2 goals) creates false confidence in attacking quality.

Calibration requires large sample sizes to verify. A single season might show poor calibration by luck. Models validated across 5+ seasons and many thousands of shots are more trustworthy.
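A calibration check of this kind can be sketched by grouping shots into bins by predicted xG and comparing the average prediction in each bin with the share of shots that were actually scored. The function below is a minimal version; the shots in the usage example are simulated, not real data.

```python
import random


def calibration_table(predicted, outcomes, n_bins=10):
    """Return rows of (bin_low, bin_high, n_shots, mean_predicted, goal_rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        goal_rate = sum(y for _, y in members) / n
        rows.append((i / n_bins, (i + 1) / n_bins, n, mean_pred, goal_rate))
    return rows


# Simulated shots: outcomes are drawn from the predicted probabilities,
# so this "model" is well calibrated by construction.
random.seed(0)
predicted = [random.uniform(0.02, 0.60) for _ in range(5000)]
outcomes = [1 if random.random() < p else 0 for p in predicted]

for low, high, n, mean_pred, goal_rate in calibration_table(predicted, outcomes, n_bins=5):
    print(f"{low:.1f}-{high:.1f}: n={n:4d}  predicted={mean_pred:.3f}  observed={goal_rate:.3f}")
```

For a well-calibrated model the predicted and observed columns stay close in every bin; a consistent gap in one direction points to bias.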

Using xG for Prediction

xG improves predictions in several ways.

Accounting for luck. A team that won 3-0 from 1.2 xG and a team that lost 0-1 from 2.8 xG both got results their chances did not justify. The win was fortunate, the loss unfortunate. xG reveals actual team quality better than results do.

Form assessment. A team with recent results of W-L-W-L might look inconsistent. But if their xG was high in all four matches, the team is performing consistently and was simply unlucky in the defeats. xG reveals whether form is real or luck-driven.

Relative strength comparison. Comparing teams by xG per game rather than points per game removes much of the luck. A team averaging 1.9 xG per match is creating better chances than a team averaging 1.4, regardless of how many goals were actually scored.

Injury and tactical impact. When injuries affect a team, xG changes before goals do. A key striker missing matches reduces xG immediately. The reduced wins follow later. xG serves as an early indicator of team strength changes.
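The comparisons above can be sketched by putting points per game next to xG difference per game and the gap between goals and xG. The match records and field names below are hypothetical.

```python
def summarise(matches):
    """Summarise results and underlying xG for one team's matches."""
    n = len(matches)
    points = sum(
        3 if m["gf"] > m["ga"] else 1 if m["gf"] == m["ga"] else 0
        for m in matches
    )
    xg_diff = sum(m["xg_for"] - m["xg_against"] for m in matches)
    goals_minus_xg = sum(m["gf"] - m["xg_for"] for m in matches)
    return {
        "points_per_game": points / n,
        "xgd_per_game": xg_diff / n,
        "goals_minus_xg": goals_minus_xg,  # positive: scoring more than chances suggest
    }


# Hypothetical three-match samples for two teams.
fortunate = [
    {"gf": 3, "ga": 0, "xg_for": 1.2, "xg_against": 1.4},
    {"gf": 1, "ga": 0, "xg_for": 0.8, "xg_against": 1.6},
    {"gf": 2, "ga": 2, "xg_for": 1.0, "xg_against": 1.9},
]
unfortunate = [
    {"gf": 0, "ga": 1, "xg_for": 2.8, "xg_against": 0.7},
    {"gf": 1, "ga": 1, "xg_for": 2.1, "xg_against": 0.9},
    {"gf": 1, "ga": 2, "xg_for": 1.9, "xg_against": 1.1},
]

for name, matches in [("fortunate", fortunate), ("unfortunate", unfortunate)]:
    s = summarise(matches)
    print(f"{name:12s} PPG={s['points_per_game']:.2f}  "
          f"xGD/game={s['xgd_per_game']:+.2f}  goals-xG={s['goals_minus_xg']:+.1f}")
```

In this made-up sample the team with more points has the worse xG difference, which is the pattern the text describes as luck-driven form. Three matches is far too few to conclude anything in practice.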

Limitations of xG Models

xG has real limitations worth understanding.

Historical model. xG models are trained on historical data. A novel tactical approach that creates chances in unusual ways might not be reflected in historical xG values. A team using an attacking tactic never seen before still receives xG values based on the most similar historical situations.

Goalkeeper impact. Basic xG models don't account for goalkeeper quality. A team with an elite goalkeeper should concede fewer goals from the same xGA as a team with a poor goalkeeper. Advanced models try to adjust for this, but it remains challenging.

Set-piece handling. xG from set-pieces is harder to calculate than open-play xG. Crossing quality, heading ability, and defensive organisation from set pieces vary more than open-play conversion. Models often overestimate xG from crosses.

Pressure accuracy. Measuring defensive pressure from video is subjective. Was a defender 2 yards or 3 yards away? This distinction matters for xG but is hard to quantify consistently. Human error in data collection affects xG accuracy.

High-variance situations. Some football situations (penalties, one-on-ones, immediate rebounds) have inherent variance. xG models struggle more with these because conversion varies wildly.

Expected Assists (xA)

Just as xG measures shot quality, expected assists (xA) measures chance creation quality.

A pass leading to a high-quality shot (high xG) receives high xA. A pass leading to a low-quality attempt receives low xA. In each case the pass is credited with xA equal to the xG of the resulting shot, whether or not the shot is scored.

This matters because it reveals creative quality independent of outcomes. A midfielder with 6.2 xA but only a few actual assists might still be performing well creatively. xA shows whether chances are being created, independent of whether teammates finish them.
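Following the simple definition above, where a pass is credited with the xG of the shot it sets up, xA can be sketched as a sum over shots grouped by the player who made the final pass. The shot records, field names, and player names below are hypothetical.

```python
from collections import defaultdict


def expected_assists(shots):
    """Return {passer: (total_xA, actual_assists)} from a list of shot records."""
    xa = defaultdict(float)
    assists = defaultdict(int)
    for shot in shots:
        passer = shot["assisted_by"]
        if passer is None:
            continue  # unassisted shot: no xA awarded
        xa[passer] += shot["xg"]  # credit the pass with the shot's xG
        if shot["goal"]:
            assists[passer] += 1
    return {p: (xa[p], assists[p]) for p in xa}


# Hypothetical shot records.
shots = [
    {"assisted_by": "Midfielder A", "xg": 0.35, "goal": False},
    {"assisted_by": "Midfielder A", "xg": 0.22, "goal": False},
    {"assisted_by": "Midfielder A", "xg": 0.08, "goal": True},
    {"assisted_by": "Winger B", "xg": 0.05, "goal": True},
    {"assisted_by": None, "xg": 0.76, "goal": True},
]

for player, (xa_total, n_assists) in expected_assists(shots).items():
    print(f"{player}: xA={xa_total:.2f}, assists={n_assists}")
```

Both players end up with one assist, but the first has created far more by xA, which is the gap between creation and outcome the text describes.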

Why xG Matters for AI Prediction

xG is crucial for prediction models because it reveals true team quality independent of luck.

A model predicting merely from recent goals can be fooled. A team on a lucky run wins more than their xG suggests. A model incorporating xG understands true underlying quality.

Most sophisticated prediction models use xG as a key variable. Rather than predicting "what will the score be," models predict "what will the xG be" and then apply a Poisson distribution to convert xG into goal probability.

This two-step approach is more robust than single-step outcome prediction because xG is more stable and predictable than goals. Goals bounce off posts, keepers make brilliant saves, and other variance occurs. xG, being based on the quality of chances rather than on whether they went in, is less affected by that variance.
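The second step can be sketched as follows: treat each team's predicted xG as the mean of a Poisson distribution, then sum the probabilities of every scoreline. This assumes the two teams' goal counts are independent, which real models often relax, and the cap of 10 goals per side is an arbitrary cut-off for the example.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home_win, draw, away_win) from two independent Poisson models."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win  # renormalise for the truncated tail
    return home_win / total, draw / total, away_win / total


hw, d, aw = match_probabilities(2.1, 0.9)
print(f"Home win {hw:.1%}, draw {d:.1%}, away win {aw:.1%}")
```

Here 2.1 and 0.9 stand in for the predicted xG of each side; in a full model those inputs would come from the first step.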

Expected goals (xG) models calculate shot conversion probability based on shot location, type, defensive pressure, and goalkeeper positioning.
Models combine these factors to assign xG values (0-1 probability) to each shot.
A team's total xG is the sum of their shots' xG values.
xG reveals actual team quality independent of luck better than goals do.
Data sources include Opta, StatsBomb, and computer vision tracking.
xG models need calibration validation across large samples.
xG improves prediction by accounting for luck, revealing true form, and identifying early changes from injuries or tactics.
Limitations include historical bias, goalkeeper-quality variation, set-piece challenges, and measurement error.
Expected assists (xA) applies similar logic to chance creation.
Most sophisticated prediction models use xG as a key variable because its stability outperforms raw goal counts for prediction.