The Elo rating system is elegant in its simplicity. Originally created for chess, Elo translates to football surprisingly well. Unlike complex machine learning models, Elo is interpretable and requires minimal data. Yet it competes respectably with sophisticated algorithms.
How Elo Works
Elo is fundamentally a system for updating team strength estimates based on match results.
Each team starts with a rating, typically 1600. After each match, ratings update based on the result and pre-match rating difference.
The core formula is simple. If team A (rating 1800) plays team B (rating 1600):
- If A wins: A gains points and B loses the same number. This was the expected result, so the adjustment is small.
- If A loses: A loses more points and B gains more points. This was the unexpected result, so the adjustment is large.
The exact adjustment depends on the K-factor (how much ratings can change per match) and the rating difference. A team gains more points for beating a much stronger opponent than for beating a weaker one.
For example, with K=32 under the standard formula: if team A (1800) beats team B (1600), A gains about 8 points (1800 becomes 1808) and B loses about 8 (1600 becomes 1592). If A loses to B, A loses about 24 points and B gains about 24.
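The K-factor determines volatility: the higher the K-factor, the more a single result moves the ratings. Chess federations typically use values between 10 and 40. Football sometimes uses K=50-100 because football outcomes are more volatile than chess (more luck involved). The trade-off is that a very high K also makes ratings chase noise, so the value is worth tuning.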
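The update described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard logistic expected-score curve with a 400-point scale and K=32; the function names are illustrative and not taken from any particular library.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the standard 400-point logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 for an A win and 0.0 for an A loss.
    K=32 is an assumed value chosen for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Favourite wins: small adjustment (about 7.7 points with K=32).
print(update(1800, 1600, 1.0))
# Favourite loses: large adjustment (about 24.3 points with K=32).
print(update(1800, 1600, 0.0))
```

Doubling K doubles both adjustments, which is all the K-factor does in this formulation.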
Why Elo Works for Football
Elo's strength is that it naturally adapts.
A newly promoted team starts with a rating appropriate for their division. As they play against new opposition, their rating converges to reflect true strength. A team performing above their rating has their rating increase. A team underperforming has their rating decrease.
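To see the convergence behaviour, here is a small deterministic sketch. It makes a strong simplifying assumption, flagged in the code: instead of random match results, each match is scored at exactly what the team's (hypothetical) true strength would earn on average, and the opponent's rating is held fixed. All numbers are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard 400-point logistic expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_convergence(start=1500.0, true_strength=1700.0,
                         opponent=1600.0, k=20.0, matches=300):
    """Track a rating over repeated matches against a fixed opponent.

    Simplifying assumption: every match scores the long-run average the
    team's true strength would earn, so the path is deterministic.
    Real results are noisy and the opponent's rating would also move.
    """
    rating = start
    history = [rating]
    for _ in range(matches):
        actual = expected_score(true_strength, opponent)
        predicted = expected_score(rating, opponent)
        rating += k * (actual - predicted)
        history.append(rating)
    return history


history = simulate_convergence()
for n in (0, 10, 50, 300):
    print(n, round(history[n], 1))
```

The rating climbs steadily from 1500 towards 1700 and the steps shrink as the gap closes. With real, noisy results the path would wobble around the same trend.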
This adaptive mechanism requires minimal explicit features. You don't need possession data, xG, or player information. You just need match results.
For lower divisions where detailed statistics are unavailable, Elo is attractive because it needs so little data.
Additionally, Elo is stable. A single upset doesn't permanently shift a team's rating. The rating gradually adjusts based on a series of results, resisting noise.
Elo vs Complex Models
How does simple Elo compare to sophisticated machine learning?
In top leagues where detailed statistics are available, good machine learning models (xG-based, ensemble methods) outperform Elo. The additional data provides genuine signal.
In lower divisions where data is sparse, Elo often outperforms or matches complex models. The extra sophistication has little to learn from.
Elo also has an interpretability advantage. You understand exactly why the model makes its predictions. Team A is rated stronger than team B because its Elo is higher. You can see how ratings evolved over time.
The practical finding: for comprehensive predictions, machine learning wins. For interpretability and lower-division football, Elo is competitive.
Variants and Improvements
Researchers have extended basic Elo to address football-specific issues.
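Glicko rating adds uncertainty accounting. Each team carries a rating deviation alongside its rating. Teams that have played many recent matches have more confident ratings than teams with few matches or a long period of inactivity. This refinement improves prediction, especially for teams with little history.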
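Elo with draws. Basic Elo can score a draw as half a win for each team, but it produces a single expected score rather than separate win, draw and loss probabilities. Because draws are common in football, models that handle them explicitly are more appropriate. Methods include treating a draw as 0.5 wins for both teams in the update, or using separate models for win probability and draw probability.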
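The first method, scoring a draw as 0.5, only changes the score fed into the update. A minimal sketch, assuming the standard 400-point curve and an illustrative K=32 (the names `SCORES` and `update_with_draws` are made up for this example):

```python
# Score awarded to team A for each result, from A's point of view.
SCORES = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard 400-point logistic expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_with_draws(rating_a: float, rating_b: float,
                      result_a: str, k: float = 32.0):
    """Return new (rating_a, rating_b); result_a is 'win', 'draw' or 'loss'."""
    delta = k * (SCORES[result_a] - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# A 1800-rated favourite held to a draw by a 1600-rated side.
print(update_with_draws(1800, 1600, "draw"))
```

Because the favourite was expected to score about 0.76, a draw (0.5) counts as underperformance: the favourite drops roughly 8 points and the underdog gains the same amount.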
Separate attack and defence ratings. Rather than a single Elo rating per team, maintain separate ratings for attack and defence. This captures whether a team's strength lies in scoring goals or in defensive solidity.
Elo with home advantage. Account for home advantage separately from team strength. A team's Elo reflects its general strength, and home advantage is applied as an adjustment at prediction time. For example, a team might be treated as 120 rating points stronger at home than away.
Time decay. Old matches matter less than recent matches. Rather than weighting all history equally, apply decay so recent results matter more. A match from five years ago contributes less than a match from one month ago.
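One common way to express decay is an exponential weight with a half-life. The sketch below assumes a one-year half-life purely for illustration; the function name and default are hypothetical, and the right half-life would need to be tuned on real data.

```python
def decay_weight(age_days: float, half_life_days: float = 365.0) -> float:
    """Weight in (0, 1] that halves every half_life_days.

    The 365-day default is an assumed value, not a recommendation.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


print(round(decay_weight(30), 3))       # a match from one month ago
print(round(decay_weight(5 * 365), 3))  # a match from five years ago
```

With this setting a month-old match keeps most of its weight, while a five-year-old match has been halved five times and counts for about 3% of a fresh result. The weight could, for instance, scale each match's contribution when ratings are refitted from historical results.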
These variants gradually turn basic Elo into a complex model resembling machine learning approaches.
Using Elo for Prediction
To predict a match using Elo, calculate expected win probability from rating difference.
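In the standard formulation, the expected score for the higher-rated team is 1 / (1 + 10^(-d/400)), where d is the rating difference. A 200-point difference gives about 0.76. In football that figure blends wins and draws (a draw counts as half a win), so the pure win probability is somewhat lower.
Variants tweak the formula, but simplicity is the point. You don't need complex computation: the rating difference maps directly to an expected result.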
Combining Elo with home advantage: if team A is rated 1750 and plays at home against team B, rated 1650, add the home adjustment (perhaps 120 points) to A's rating, then calculate the probability from the adjusted difference. Here the difference grows from 100 to 220 points, which lifts A's expected score from about 0.64 to about 0.78.
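The prediction step can be sketched as follows, assuming the standard 400-point logistic curve and the illustrative 120-point home adjustment mentioned above. The function name is hypothetical, and the output is an expected score (draws count as half), not a pure win probability.

```python
def expected_score_home(home_rating: float, away_rating: float,
                        home_advantage: float = 120.0) -> float:
    """Expected score for the home team after adding a home adjustment.

    home_advantage is in rating points; 120 is an assumed example value.
    """
    diff = (home_rating + home_advantage) - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


# A plain 200-point gap on neutral ground.
print(round(expected_score_home(1800, 1600, home_advantage=0.0), 3))
# Team A (1750) hosting team B (1650), without and with the adjustment.
print(round(expected_score_home(1750, 1650, home_advantage=0.0), 3))
print(round(expected_score_home(1750, 1650), 3))
```

Keeping the adjustment as a parameter means the stored ratings stay venue-neutral, and the same ratings can be reused for the reverse fixture by swapping the arguments.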
Limitations of Elo
Elo has real limitations.
Slow adaptation. Elo responds to results gradually. If a team suddenly changes manager and tactical approach, their rating catches up only after several matches. Machine learning using detailed statistics can detect changes faster.
Ignores context. Elo doesn't account for injuries, suspensions, or team changes. A team losing their star player sees their Elo unchanged until results decline. A model with explicit injury data adjusts immediately.
Regression to mean. Teams outperforming their Elo are expected to regress. But Elo doesn't distinguish between lucky outperformance and genuine quality improvement. A team beating higher-rated opponents consistently might be genuinely improving, not just lucky. Elo treats both cases identically.
Assumes rating stability. Elo assumes teams' relative strength stays roughly constant. For long-standing clubs this is reasonable. For new teams, recently promoted sides, or teams undergoing major changes, assumptions break down.
Doesn't handle truncation. If you only have recent data (last two seasons), Elo ratings for teams without long histories are unreliable. Machine learning can work with whatever data exists.
When to Use Elo
Use Elo when:
- Simplicity and interpretability are priorities
- Data is limited (lower divisions, recent leagues)
- You want a baseline model to compare others against
- You value understanding over marginal accuracy gains
- Computational resources are limited
Don't use Elo when:
- Rich statistics are available (xG, possession, etc.)
- You need to incorporate real-time news (injuries)
- You're competing in high-stakes prediction environments
- Maximum accuracy is required
Elo as a Component
Sophisticated systems sometimes incorporate Elo as one component among several.
You might combine Elo ratings (capturing long-term relative strength) with recent xG (capturing current form), injury status, and other variables in an ensemble model.
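As a deliberately simple illustration of the idea, the sketch below blends two probability estimates with a fixed weight. Everything here is hypothetical: the function name, the 0.6 weight, and the assumption that a separate form-based model already supplies `p_form`. A real ensemble would usually learn the combination from data instead of fixing it by hand.

```python
def blend_probabilities(p_elo: float, p_form: float,
                        weight_elo: float = 0.6) -> float:
    """Weighted average of an Elo-based and a form-based probability.

    weight_elo is an assumed, hand-picked value for illustration only.
    """
    if not 0.0 <= weight_elo <= 1.0:
        raise ValueError("weight_elo must be between 0 and 1")
    for p in (p_elo, p_form):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return weight_elo * p_elo + (1.0 - weight_elo) * p_form


# Elo says 0.64, a (hypothetical) recent-form model says 0.50.
print(round(blend_probabilities(0.64, 0.50), 3))
```

The higher the weight on Elo, the more the blend leans on long-term strength. Lowering it makes the prediction react faster to recent form.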
Elo provides a stable baseline. Recent form and statistics provide sensitivity to change. Combined, they capture both permanence and change.
In Summary
- The Elo rating system is a simple yet effective approach to ranking football teams and predicting outcomes.
- Each team has a rating that updates based on match results.
- Unexpected results cause larger rating changes.
- Elo works well without detailed statistics, making it valuable for lower divisions.
- Complex models outperform Elo in top leagues where rich data is available.
- Variants exist addressing football-specific issues (draws, home advantage, time decay).
- Elo's advantages include simplicity, interpretability, and stability.
- Elo's limitations include slow adaptation to change, ignoring injuries, and the assumption of rating stability.
- Elo is best used in lower divisions or as a component in ensemble models.
- For top-league prediction with comprehensive data, machine learning models typically outperform basic Elo.
Frequently Asked Questions
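What K-factor should I use for football? As a rough starting point, K=50-100 for top leagues and K=100-150 for more volatile lower divisions. For comparison, the World Football Elo Ratings for national teams use K between 20 and 60 depending on match importance. The higher the K, the more ratings fluctuate, so test different values on your own data.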
How do I start with teams that have no history? Typically use 1600 as default. For promoted teams, you might use the rating they had in their previous league, adjusting for league level difference.
Should I use Elo if I have detailed statistics available? Probably not as your primary model. Machine learning incorporating statistics outperforms Elo. However, Elo as one component in an ensemble can add value.
Can I combine Elo with home advantage? Yes. Calculate Elo normally, then apply home advantage adjustment (typically 50-150 rating points depending on league) when making predictions.
How long until Elo ratings stabilise? New teams' ratings stabilise after 20-30 matches against diverse opposition. Teams with long histories (20+ seasons) are quite stable. Volatile teams with frequent changes to strategy or squad never fully stabilise.
Should I decay old matches? Yes, ideally. A match from 10 years ago is less relevant than one from 10 weeks ago. Apply exponential or linear decay to older matches' influence.
How accurate is Elo for football? Roughly 55-60% on match outcome prediction in top leagues, though the figure depends on the league and on how draws are handled. Good but not exceptional. Machine learning outperforms Elo when rich data is available. In lower divisions or with limited history, Elo performs reasonably well.
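Can Elo predict draws? Not directly. Basic Elo produces a single expected score in which a draw counts as half a win, so it does not give a separate draw probability. The three outcome probabilities must sum to one, so the draw probability is 1 minus the two win probabilities, and it has to be estimated separately, for example as a function of the rating difference, since draws are most likely between evenly matched teams. More sophisticated variants model draws explicitly.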
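For the draw question, one way to get three outcome probabilities is to combine the Elo expected score with a draw probability from elsewhere. The sketch below relies on the identity expected score = P(win) + 0.5 * P(draw). The draw probability passed in (0.26 in the example) is a made-up input standing in for a separate draw model, and the function name is hypothetical.

```python
def three_way(expected_a: float, p_draw: float):
    """Split an expected score into (win, draw, loss) probabilities for A.

    Uses: expected_a = P(win) + 0.5 * P(draw).
    p_draw must be supplied by a separate model; it is capped here so
    that neither the win nor the loss probability can go negative.
    """
    p_draw = max(0.0, min(p_draw, 2.0 * expected_a, 2.0 * (1.0 - expected_a)))
    p_win = expected_a - p_draw / 2.0
    p_loss = 1.0 - p_win - p_draw
    return p_win, p_draw, p_loss


# Expected score 0.64 for team A, with an assumed 26% draw probability.
win, draw, loss = three_way(0.64, 0.26)
print(round(win, 3), round(draw, 3), round(loss, 3))
```

In this example team A's 0.64 expected score splits into roughly 51% win, 26% draw and 23% loss. The split is only as good as the draw estimate fed into it.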

