Can I use Poisson regression myself?

Yes. Python libraries like scipy.stats and statsmodels make it accessible. You need historical match data, feature engineering, and understanding of the maths. Start with an article walking through implementation.

Is Poisson regression still used by professionals?

Yes, despite being decades old. Most sophisticated models use Poisson as a component. Hybrid approaches use Poisson for goal prediction combined with other models for broader analysis.

Why don't models use normal distribution instead?

A normal distribution is continuous and allows negative values, so it can assign probability to fractional or negative goals. Poisson restricts outcomes to non-negative integers (0, 1, 2, 3...), matching real football. Additionally, the normal distribution treats variance as a separate parameter, whereas Poisson variance equals the mean, which empirically matches goal counts reasonably well.

How accurate are Poisson probabilities?

Reasonably accurate, within limits. A basic Poisson model picks the correct match outcome 55-58% of the time in top leagues. This isn't amazing but is meaningful. The remaining 42-45% of matches turn on factors the model doesn't capture, including plain randomness. Note that this is a hit rate, not a share of variance explained.

What if my predicted lambdas are wrong?

Then your probabilities are wrong too. The Poisson formula is mathematically correct given lambdas. The challenge is predicting accurate lambdas. Bad lambda prediction is the most common source of Poisson model failure.

Can Poisson handle draws?

Yes, though not explicitly. A draw is any score line where both teams finish level (0-0, 1-1, 2-2 and so on). By calculating probabilities across all score lines and summing the level ones, Poisson produces a draw probability and handles draws reasonably well.

Should I use Poisson for over-under betting?

Excellent fit. Poisson naturally predicts goal totals. Sum Poisson probabilities for all scores with 3+ total goals to get the probability of over 2.5. This is where Poisson shines most.
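As a minimal sketch of the over 2.5 calculation: if the two teams' goals are treated as independent Poisson variables, total goals is itself Poisson with lambda equal to the sum of the two lambdas, and "over 2.5" means three or more total goals. The lambdas of 1.6 and 1.1 and the helper names `poisson_pmf` and `prob_over` are illustrative assumptions, not values from any real model.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k goals) when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_over(line, home_lambda, away_lambda):
    """P(total goals > line), e.g. line=2.5 means 3 or more goals.

    Assumes independent Poisson goals, so the total is Poisson with
    lambda = home_lambda + away_lambda.
    """
    total_lambda = home_lambda + away_lambda
    max_under = math.floor(line)  # 2.5 -> totals of 0, 1, 2 are "under"
    p_under = sum(poisson_pmf(k, total_lambda) for k in range(max_under + 1))
    return 1.0 - p_under

# Illustrative lambdas only (assumption, not fitted values).
p_over_25 = prob_over(2.5, 1.6, 1.1)
print(f"P(over 2.5) = {p_over_25:.3f}")
```

Subtracting the "under" probability from 1 avoids having to sum an open-ended list of high-scoring results.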

Poisson Regression in AI Football Models: The Foundation

Poisson regression is the statistical foundation underneath many football prediction models. Despite sounding technical, the concept is elegant: goal distribution in football follows a mathematical pattern called the Poisson distribution. Understanding this pattern unlocks why many models work.

The Poisson Distribution Explained

The Poisson distribution describes the probability of a certain number of events occurring in a fixed interval when events occur randomly and independently.

In football, we observe that goals happen roughly randomly throughout matches. A team might score 0, 1, 2, 3, or more goals. These outcomes follow a predictable pattern.

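The outcome of any single match is unpredictable, but the pattern across many matches is stable. A team averaging 1.4 goals per match will score 0 in some games and 3 in others, yet the share of games with each goal count stays roughly constant. The Poisson distribution describes those shares: randomness in each match, with underlying consistency across matches.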

The mathematical formula depends on a single parameter: lambda (the average number of events). If a team scores an average of 1.4 goals per match, that number alone gives you the entire probability distribution. The probability they score 0 goals is roughly 25%, 1 goal is roughly 35%, 2 goals is roughly 24%, and 3+ is roughly 17%.

These probabilities are determined entirely by the average (lambda). You don't need to know anything else about the team. This is remarkable. If you know a team's average goals scored and goals conceded, you can calculate goal probabilities.
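The figures for a 1.4-goal average can be reproduced with the standard Poisson formula, P(k) = e^(-lambda) * lambda^k / k!. The snippet below is a small sketch using only the Python standard library; the helper name `poisson_pmf` is an illustrative choice.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events) when the average number of events is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 1.4  # average goals per match, taken from the example in the text
p0, p1, p2 = (poisson_pmf(k, lam) for k in range(3))
p3_plus = 1.0 - (p0 + p1 + p2)  # everything not covered by 0, 1 or 2 goals

for label, p in [("0", p0), ("1", p1), ("2", p2), ("3+", p3_plus)]:
    print(f"{label:>2} goals: {p:.3f}")
```

The four values come out at approximately 0.247, 0.345, 0.242 and 0.167.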

Why Goals Follow Poisson Distribution

Football goals follow Poisson patterns for mathematical reasons.

Goals are roughly independent events. One team's goal doesn't significantly increase or decrease the probability of another goal (though momentum effects exist, they're small). When events are independent and random, the Poisson distribution describes them.

Goals happen roughly equally throughout the match. They're not clustered in first minutes or final moments (though injury time effects exist). When events are distributed uniformly through time, Poisson describes them.

These conditions aren't perfect in football. Momentum exists. Tactical changes mid-match shift probability. Injury time leads to more goals. But these effects are small enough that Poisson remains remarkably accurate. Empirically, goal distributions closely match Poisson predictions.

Testing this is straightforward. Take actual Premier League matches from a season. Calculate the average goals per match. Generate Poisson probabilities from that average. Compare the predicted distribution to the actual one. The two usually line up closely.
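The comparison described above might be sketched as follows. The goal counts here are an invented toy sample, not real Premier League data, and `compare_to_poisson` is a hypothetical helper; with a real season you would pass in the actual goals scored per team per match.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    """P(exactly k goals) when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def compare_to_poisson(goal_counts, max_goals=4):
    """Return (lambda, rows) where each row is (goals, observed share, Poisson share).

    The final row pools every count of max_goals or more.
    """
    n = len(goal_counts)
    lam = sum(goal_counts) / n  # sample mean is the Poisson lambda estimate
    observed = Counter(min(g, max_goals) for g in goal_counts)
    rows = []
    for k in range(max_goals + 1):
        if k < max_goals:
            expected = poisson_pmf(k, lam)
        else:
            expected = 1.0 - sum(poisson_pmf(j, lam) for j in range(max_goals))
        rows.append((k, observed[k] / n, expected))
    return lam, rows

# Invented toy sample of goals by one side in 20 matches (NOT real league data).
toy_goals = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 4, 1, 0, 2, 1, 3, 1, 2]
lam, rows = compare_to_poisson(toy_goals)

print(f"lambda = {lam:.2f}")
for goals, obs, exp in rows:
    print(f"{goals} goals: observed {obs:.2f}, Poisson {exp:.2f}")
```

A sample this small proves nothing either way; the point is only the shape of the check.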

This empirical fit is why Poisson regression became the foundation of football prediction. It works because it accurately describes reality.

How Poisson Regression Works for Football

Poisson regression is a method for predicting the lambda (average) for each team.

Start with historical match data: team A scored 2 goals, team B scored 1 goal. Team A was at home. Team B was on poor form. You want to build a model predicting goals.

Poisson regression finds the relationship between variables (home, form, defensive quality) and goals scored. It models the logarithm of expected goals as a weighted sum of those variables, so each factor acts as a multiplier on the goal rate. Fitting the model learns which factors correlate with higher or lower goal counts.

The model output is a lambda value. For a match between team X at home versus team Y away, the model might predict team X will score an average of 2.1 goals. That doesn't mean team X will definitely score 2.1 (they'll score 0, 1, 2, 3, etc.). But over many similar situations, they average 2.1.
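To make the fitting step concrete, here is a minimal pure-Python sketch of Poisson regression with a log link, fitted by Newton-Raphson. It uses a single feature (a home indicator) and an invented toy dataset; the function names are hypothetical. In practice a library such as statsmodels would do the fitting, so treat this as an illustration of the idea rather than a recommended implementation.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_poisson_regression(X, y, iterations=25):
    """Fit log(lambda_i) = X_i . beta; column 0 of X must be the intercept (all 1s)."""
    n, p = len(X), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(iterations):
        lam = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Gradient of the log-likelihood: X^T (y - lambda)
        grad = [sum((y[i] - lam[i]) * X[i][j] for i in range(n)) for j in range(p)]
        # Fisher information: X^T diag(lambda) X
        info = [[sum(lam[i] * X[i][j] * X[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def predict_lambda(beta, features):
    """Expected goals for one team-match given its feature row."""
    return math.exp(sum(b * x for b, x in zip(beta, features)))

# Invented toy data (NOT real results): goals by home sides and by away sides.
home_goals = [2, 1, 3, 0, 2, 1]
away_goals = [1, 0, 1, 2, 0, 2]
X = [[1.0, 1.0]] * len(home_goals) + [[1.0, 0.0]] * len(away_goals)  # [intercept, is_home]
y = home_goals + away_goals

beta = fit_poisson_regression(X, y)
home_lambda = predict_lambda(beta, [1.0, 1.0])
away_lambda = predict_lambda(beta, [1.0, 0.0])
print(f"home lambda = {home_lambda:.2f}, away lambda = {away_lambda:.2f}")
```

With only an intercept and a home flag, the fitted lambdas simply recover the average home and away goals in the toy data, which is a handy sanity check. Real models add columns for form, opposition strength and so on.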

With predicted lambdas for both teams, you calculate goal probabilities using the Poisson formula. Assuming the two teams' goals are independent, the probability of a score line is the product of each team's individual goal probability. A home lambda of 2.1 and an away lambda of 0.9 give roughly 5% for 0-0, 10% for 1-0, 11% for 2-0, 9% for 1-1, and so on. You sum these to get win/draw/loss probabilities.
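A short sketch of that last step, assuming the two teams' goals are independent so that each score-line probability is the product of two Poisson terms. The grid is cut off at 10 goals per side, which is an arbitrary choice that drops a negligible amount of probability; the function name `outcome_probabilities` is hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k goals) when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probabilities(home_lambda, away_lambda, max_goals=10):
    """Return (score_grid, home_win, draw, away_win) under independent Poisson goals.

    score_grid[h][a] is the probability of the score h-a.
    """
    size = max_goals + 1
    grid = [[poisson_pmf(h, home_lambda) * poisson_pmf(a, away_lambda)
             for a in range(size)] for h in range(size)]
    home_win = sum(grid[h][a] for h in range(size) for a in range(h))
    draw = sum(grid[k][k] for k in range(size))
    away_win = sum(grid[h][a] for h in range(size) for a in range(h + 1, size))
    return grid, home_win, draw, away_win

# Lambdas taken from the example in the text.
grid, home_win, draw, away_win = outcome_probabilities(2.1, 0.9)

for h, a in [(0, 0), (1, 0), (2, 0), (1, 1)]:
    print(f"{h}-{a}: {grid[h][a]:.3f}")
print(f"home {home_win:.3f}, draw {draw:.3f}, away {away_win:.3f}")
```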

Predicting Lambdas: The Real Challenge

Poisson regression is straightforward once you have accurate lambdas. The real work is predicting what lambda will be.

Key variables for predicting goals scored include:

Attacking quality. Recent shots created, xG, historical goals per match. Better attackers create and convert more.

Defensive opposition. Opposition defensive record, xGA conceded recently, historical goals conceded. Playing stronger defence reduces goals.

Home advantage. Home teams score roughly 10-15% more goals, so a team's home lambda is roughly 10-15% higher than its away lambda against the same opposition.

Recent form. Teams in good form score more. Recent goals per match predicts near-future goals better than season-long average.

Player injuries. Missing key attackers reduces goals scored. Missing key defenders increases goals conceded.

Rest days. Teams with more rest days tend to score more goals. Less rest tends to reduce lambda.

Tactical approach. Attacking teams have higher lambdas. Defensive teams have lower lambdas.

Building a good Poisson model means finding the right combination of these variables and their relative importance. A simple model using just attacking and defensive quality might achieve 50% prediction accuracy. Adding home advantage, form, and injuries improves it to 55%. Adding subtle factors like rest and crowd size might push it to 57%.
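One way to picture how these variables combine: in a log-linear Poisson model, each factor effectively multiplies a baseline scoring rate. The sketch below uses invented multipliers and a hypothetical `expected_goals` helper purely to show the arithmetic; none of the numbers are estimates from data.

```python
import math

def expected_goals(base_rate, multipliers):
    """Combine a baseline lambda with multiplicative adjustments.

    Equivalent to exp(log(base_rate) + sum of log(multiplier)),
    which is the form a log-linear Poisson model takes.
    """
    log_lambda = math.log(base_rate) + sum(math.log(m) for m in multipliers.values())
    return math.exp(log_lambda)

# All numbers below are illustrative assumptions, not fitted values.
adjustments = {
    "attack_strength": 1.20,   # better-than-average attack
    "opponent_defence": 0.90,  # facing a stronger-than-average defence
    "home_advantage": 1.12,    # playing at home
    "missing_striker": 0.93,   # key attacker injured
}
lam = expected_goals(1.35, adjustments)
print(f"predicted lambda = {lam:.2f}")
```

A fitted regression would learn the size of each multiplier from historical matches instead of having them typed in by hand.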

Calibration and Accuracy

A properly calibrated Poisson model's probabilities match reality. If the model says a team has 65% win probability, they should win roughly 65% of the time when facing similar situations.

Testing calibration requires large samples. Track all matches where the model gave 65% win probability. Did the team win 65% of them? If yes, the model is well-calibrated at that probability level. Do this across all probability ranges.
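A calibration check along these lines could be sketched as below: group predictions into probability bins, then compare the average predicted probability in each bin with how often the outcome actually happened. The data is an invented toy sample (far too small for a real test) and `calibration_table` is a hypothetical helper.

```python
def calibration_table(predicted, outcomes, n_bins=10):
    """Bin predictions into equal-width ranges and compare to observed frequency.

    Returns rows of (bin_low, bin_high, mean_predicted, observed_rate, count),
    skipping empty bins. outcomes are 1 if the predicted event happened, else 0.
    """
    bins = [[] for _ in range(n_bins)]
    for p, happened in zip(predicted, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, happened))
    table = []
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_pred = sum(p for p, _ in items) / len(items)
        observed = sum(h for _, h in items) / len(items)
        table.append((i / n_bins, (i + 1) / n_bins, mean_pred, observed, len(items)))
    return table

# Invented toy data (NOT real predictions): model win probabilities and results.
predicted = [0.65, 0.62, 0.68, 0.66, 0.31, 0.35, 0.33, 0.38]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

table = calibration_table(predicted, outcomes)
for low, high, mean_pred, observed, count in table:
    print(f"{low:.1f}-{high:.1f}: predicted {mean_pred:.2f}, observed {observed:.2f}, n={count}")
```

A well-calibrated model shows predicted and observed values close together in every bin once each bin holds enough matches.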

Poorly calibrated models are overconfident or underconfident. An overconfident model gives high probability to outcomes that don't materialise. An underconfident model is too cautious.

Recalibration improves models. You can adjust probabilities after observing systematic bias. If your model says 65% win probability but teams actually win 70% in those situations, you adjust the model to account for this.

Limitations of Poisson Regression

Poisson regression assumes goals happen independently and randomly. This isn't entirely true in football.

Momentum effects. After a team scores, they're sometimes stronger (confidence boost) and sometimes weaker (opposition forced to open up). Momentum effects break independence assumptions.

Tactical responses. Teams respond to being behind by attacking more. This changes the distribution of goals across match minutes. Early goals matter differently than late goals because they alter tactics. Poisson treats all goals equally regardless of timing.

Rare outcomes. Poisson struggles with very rare outcomes (5+ goals). In practice, goal counts tend to vary more than Poisson allows, so very high-scoring matches happen more often than it predicts. This suggests outlier events exist that Poisson doesn't capture.

Defensive differences. Poisson assumes defensive quality only affects opponent's goals. But a strong defence affects league-wide patterns. If all defending improves, the relationship between attacking quality and goals changes.

Non-stationarity. Football evolves. The relationship between possession and goals, between xG and goals, between time of year and goals all change over seasons. Poisson models need retraining to stay current.

Extensions Beyond Basic Poisson

To address limitations, researchers extend basic Poisson.

Negative binomial regression allows for more variance than Poisson predicts. Some matches are more volatile than basic Poisson suggests. Negative binomial accommodates this.
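To illustrate the extra variance, the sketch below compares a Poisson distribution with a negative binomial that has the same mean. The dispersion value of 5.0 is an arbitrary assumption chosen for illustration, and the parameterisation (variance = mean + mean^2 / dispersion) is one common convention among several.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k goals) when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def neg_binomial_pmf(k, mean, dispersion):
    """Negative binomial with the given mean.

    Variance is mean + mean**2 / dispersion, so smaller dispersion
    means more spread; very large dispersion approaches Poisson.
    """
    r = dispersion
    p = r / (r + mean)
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log(1.0 - p))

mean_goals = 1.4
dispersion = 5.0  # illustrative assumption, not an estimate from data

# Probability of 5 or more goals under each distribution.
pois_tail = 1.0 - sum(poisson_pmf(k, mean_goals) for k in range(5))
nb_tail = 1.0 - sum(neg_binomial_pmf(k, mean_goals, dispersion) for k in range(5))
print(f"P(5+ goals): Poisson {pois_tail:.4f}, negative binomial {nb_tail:.4f}")
```

With the same average, the negative binomial puts more probability on both zero goals and very high counts.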

Zero-inflated Poisson acknowledges that 0-0 draws happen more often than Poisson predicts. Zero-inflation adds probability mass to the zero outcome.
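As a rough sketch of zero-inflation for a single team's goal count: with some extra probability the outcome is forced to zero, and otherwise it follows an ordinary Poisson distribution. The `extra_zero` value of 0.05 is an arbitrary illustrative assumption, and `zero_inflated_pmf` is a hypothetical helper name.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k goals) when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zero_inflated_pmf(k, lam, extra_zero):
    """Zero-inflated Poisson: extra_zero is the added probability of a forced zero."""
    base = (1.0 - extra_zero) * poisson_pmf(k, lam)
    return extra_zero + base if k == 0 else base

lam = 1.4
extra_zero = 0.05  # illustrative assumption, not an estimate from data

plain_zero = poisson_pmf(0, lam)
inflated_zero = zero_inflated_pmf(0, lam, extra_zero)
print(f"P(0 goals): Poisson {plain_zero:.3f}, zero-inflated {inflated_zero:.3f}")
```

Every non-zero count is scaled down by the same factor, so the probabilities still sum to one.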

Bayesian Poisson incorporates prior beliefs. Rather than just fitting historical data, Bayesian approaches let you include domain knowledge. You might have prior belief that possession matters less than xG does, and Bayesian methods incorporate this.

Mixture models combine multiple Poisson distributions. Different match types (high-scoring derbies versus defensive battles) might follow different distributions. Mixture models accommodate this.

Hurdle models split prediction into two stages: probability of a goal happening (0 vs 1+), then given goals happen, how many. This sometimes fits football better than pure Poisson.

These extensions improve accuracy modestly. The basic Poisson framework works well enough that fancy extensions rarely justify their added complexity for practical prediction.

Poisson regression models goal distribution in football as a random process following the mathematical Poisson distribution.
The distribution depends on a single parameter (lambda, the average goals) which is predicted from variables like attacking quality, defensive opposition, home advantage, form, and injuries.
Once you predict lambdas for both teams, you calculate win/draw/loss probabilities by summing Poisson probabilities across different final scores.
This approach is the foundation for most football prediction models because it's elegant, mathematically sound, and empirically accurate.
Limitations include the assumption of independence (momentum effects violate this), poor handling of rare outcomes, and non-stationarity (football evolves).
Extensions like negative binomial, zero-inflation, and Bayesian approaches address some limitations.
For most practical applications, basic Poisson regression works well enough to justify its simplicity.