This project analyzes professional League of Legends match data to understand the factors influencing match outcomes. The dataset includes detailed game statistics, and the analysis focuses on the relationship between gold difference at 25 minutes (`golddiffat25`) and match outcomes (`result`).
The dataset initially had 116,064 rows and 161 columns. The `golddiffat25` column, which is central to the analysis, had 23,520 missing values, while `result` had none. Dropping rows with missing `golddiffat25` reduced the dataset to 92,544 rows. Both columns were then converted to numeric types so that the statistics and models that follow operate on complete, consistently typed data.
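The cleaning step described above could look roughly like the following in pandas. This is a minimal sketch on a tiny made-up frame, not the project's actual code: the helper name `clean_gold_and_result`, the `league` column, and the sample values are assumptions for illustration.

```python
import pandas as pd


def clean_gold_and_result(df: pd.DataFrame) -> pd.DataFrame:
    """Keep golddiffat25 and result, coerce both to numeric, and drop
    rows where golddiffat25 is missing. Hypothetical helper."""
    out = df[["golddiffat25", "result"]].copy()
    # Coerce to numeric; entries that cannot be parsed become NaN.
    out["golddiffat25"] = pd.to_numeric(out["golddiffat25"], errors="coerce")
    out["result"] = pd.to_numeric(out["result"], errors="coerce")
    # Drop only the rows where the gold difference is unavailable.
    return out.dropna(subset=["golddiffat25"])


# Made-up rows standing in for the real match data.
raw = pd.DataFrame(
    {
        "golddiffat25": ["1928.0", None, "-660", "5016.0"],
        "result": [1, 0, 0, 1],
        "league": ["A", "A", "B", "B"],
    }
)
cleaned = clean_gold_and_result(raw)
print(len(raw) - len(cleaned), "row(s) dropped")
```

Using `dropna(subset=...)` rather than a bare `dropna()` keeps rows that are incomplete in other columns, which matters in a wide table like this one.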

| | golddiffat25 | result |
|---:|---:|---:|
| 180 | 1928.0 | 1 |
| 181 | 2943.0 | 1 |
| 182 | 660.0 | 1 |
| 183 | 5016.0 | 1 |
| 184 | 2194.0 | 1 |

A histogram of `golddiffat25` shows a roughly normal distribution centered around zero, with most values between -5,000 and 5,000. This indicates that games are generally balanced at 25 minutes, with extreme gold differences being rare, reflecting the competitive nature of League of Legends.
Scatter and box plots reveal a positive relationship between `golddiffat25` and `result`. Winning teams (`result = 1`) typically have positive gold differences, while losing teams (`result = 0`) have negative or neutral ones, highlighting the importance of early-game gold leads in match outcomes.
The dataset was grouped by match outcome (`result`), and the mean gold difference for each outcome was calculated. Winning teams had an average gold lead of 1,511.96 at 25 minutes, while losing teams had an average deficit of the same size (a mean of -1,511.96). The two means mirror each other, as expected when each lead is recorded as an equal deficit for the opposing side. This gap reinforces the idea that gold advantage at 25 minutes is a key indicator of success in League of Legends matches.

| | result | golddiffat25 |
|---:|---:|---:|
| 0 | 0 | -1511.962828 |
| 1 | 1 | 1511.962828 |

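The grouped means above can be reproduced with a pandas `groupby`. The sketch below uses made-up values and assumes each match contributes one row per side with opposite-signed gold differences, which is consistent with the mirrored means reported above but is not confirmed by the source.

```python
import pandas as pd

# Made-up sample: one row per side of each match, with opposite signs.
df = pd.DataFrame(
    {
        "golddiffat25": [1928.0, -1928.0, -660.0, 660.0, 5016.0, -5016.0],
        "result": [1, 0, 1, 0, 1, 0],
    }
)

# Mean gold difference at 25 minutes for losses (0) and wins (1).
grouped = df.groupby("result")["golddiffat25"].mean().reset_index()
print(grouped)
```

Under that layout the two group means always sum to zero, so a single number (the winners' mean lead) summarizes the whole table.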
Data Processing & Distribution
Missing Values
golddiffat25
Gold Difference Distribution
This scatter plot visualizes the relationship between the gold difference at 25 minutes (`golddiffat25`) and the match outcome (`result`), where the outcome is represented as a binary variable:

- `1` indicates a win.
- `0` indicates a loss.

Wins are plotted at `result = 1` and losses at `result = 0`. Points are concentrated near zero on the `golddiffat25` axis, reflecting that many matches remain relatively close at the 25-minute mark.

The box plot compares the distribution of gold difference at 25 minutes (`golddiffat25`) for winning (`result = 1`) and losing (`result = 0`) teams in League of Legends matches. The calculated correlation between `golddiffat25` and `result` is 0.478, a moderate positive correlation. Here's what the graph tells us:

- The box for winning teams (`result = 1`) is entirely above the zero line, indicating that winners tend to have positive gold differences at the 25-minute mark.
- The box for losing teams (`result = 0`) is centered below zero, showing that losers often have negative or neutral gold differences.

This analysis highlights the importance of gold advantages at 25 minutes in League of Legends. The findings suggest that teams with a higher gold difference at 25 minutes are significantly more likely to win. This has strategic implications for teams and fans alike, emphasizing the importance of early-game performance in professional play.
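A correlation like the 0.478 quoted above is a Pearson correlation between a continuous variable and a 0/1 variable (also called a point-biserial correlation). The sketch below shows the computation with NumPy on made-up values; it does not reproduce the project's figure.

```python
import numpy as np

# Made-up values for illustration only.
golddiff = np.array([1928.0, -1928.0, -660.0, 660.0, 5016.0, -5016.0, 300.0, -300.0])
result = np.array([1, 0, 1, 0, 1, 0, 0, 1])

# Pearson correlation between the gold difference and the binary outcome.
r = np.corrcoef(golddiff, result)[0, 1]
print(round(r, 3))
```

Because the outcome is binary, this value depends only on the gap between the two group means relative to the overall spread of `golddiffat25`, which is why it agrees with what the box plot shows.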
The prediction problem involves determining whether a team will win (`result = 1`) or lose (`result = 0`) a League of Legends match based on in-game metrics available at the 25-minute mark. This is a binary classification problem, as the response variable (`result`) has two possible outcomes: win or loss.
The response variable is `result`, which indicates whether a team won (`1`) or lost (`0`) the match. This variable was chosen because it represents the ultimate outcome of the game, and understanding how early-game metrics impact this outcome can provide valuable insights for strategy development.
The features used to train the model include:

- `golddiffat25`: Gold difference at the 25-minute mark (quantitative).
- `xpdiffat25`: Experience difference at the 25-minute mark (quantitative).
- `killsat25`: Number of kills achieved by the team at the 25-minute mark (quantitative).
- `deathsat25`: Number of deaths suffered by the team at the 25-minute mark (quantitative).

Model Evaluation: F1-score is used over accuracy to handle potential class imbalance in win/loss predictions.
Features: Only data available at 25 minutes is included, to maintain prediction validity and prevent data leakage.
Significance: Helps teams optimize early-game strategies by identifying key performance metrics that influence match outcomes.
Baseline Model: Logistic regression, fitted through a scikit-learn Pipeline, for binary win/loss classification.
Performance

- True Losses (TN): 10,417
- True Wins (TP): 10,562
- False Positives: 3,465
- False Negatives: 3,320

Gold difference at 25 minutes proves to be a strong predictor, though roughly one in four test predictions (6,785 of 27,764) is wrong, which leaves room for improvement.

| True label | Predicted 0 | Predicted 1 |
|---|---:|---:|
| 0 | 10417 | 3465 |
| 1 | 3320 | 10562 |

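The summary metrics follow directly from the four confusion-matrix counts above. As a worked check, the arithmetic can be written out in plain Python, treating a win (`1`) as the positive class:

```python
# Counts reported for the baseline model on the test set.
tn, fp, fn, tp = 10417, 3465, 3320, 10562

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # of predicted wins, the share that were wins
recall = tp / (tp + fn)     # of actual wins, the share that were predicted
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.7556 precision=0.75 recall=0.76 f1=0.76
```

These values agree with the reported baseline accuracy and with the win-class row of the classification report.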
Grouped Statistics:

| result | golddiffat25 |
|---:|---:|
| 0 | -1511.96 |
| 1 | 1511.96 |

Gold Difference Impact
This clear separation confirms gold difference at 25 minutes as a strong predictor of match outcomes.
The model uses the following features:

- `golddiffat25` (quantitative): the gold difference at the 25-minute mark, a measure of the team’s economic advantage or disadvantage during the early to mid-game phase.
- `xpdiffat25` (quantitative): the experience difference at the 25-minute mark, which captures the team’s level advantage or deficit relative to its opponent.

Both features are quantitative and continuous. Since there were no ordinal or nominal features, no encoding was required. The features were standardized with StandardScaler inside the pipeline, which puts the gold and experience differences on a comparable scale before the logistic regression is fit.
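A baseline of this shape (scaling followed by logistic regression inside one Pipeline, evaluated on a 30% hold-out) might be set up as below. This is a sketch under stated assumptions, not the project's code: the data is synthetic, and the step names, `random_state` values, and stratified split are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two correlated "difference" features and a noisy outcome.
rng = np.random.default_rng(0)
gold = rng.normal(0, 3000, size=2000)
xp = 0.5 * gold + rng.normal(0, 1000, size=2000)
X = np.column_stack([gold, xp])  # columns play the role of golddiffat25, xpdiffat25
y = (gold + rng.normal(0, 3000, size=2000) > 0).astype(int)

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

baseline = Pipeline(
    [
        ("scale", StandardScaler()),    # statistics are learned from the training rows only
        ("clf", LogisticRegression()),
    ]
)
baseline.fit(X_train, y_train)
f1 = f1_score(y_test, baseline.predict(X_test))
print("F1 on held-out data:", round(f1, 3))
```

Keeping the scaler inside the Pipeline means its mean and variance are estimated from the training split alone, so no information from the test rows leaks into the fit.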
Model Performance on Test Set (30% of data): the model shows balanced performance across metrics and classes.
Baseline Model Performance: Accuracy: 0.7556187869183115
Classification Report:

| | precision | recall | f1-score | support |
|---|---:|---:|---:|---:|
| 0 | 0.76 | 0.75 | 0.75 | 13882 |
| 1 | 0.75 | 0.76 | 0.76 | 13882 |
| accuracy | | | 0.76 | 27764 |
| macro avg | 0.76 | 0.76 | 0.76 | 27764 |
| weighted avg | 0.76 | 0.76 | 0.76 | 27764 |

Model Quality Assessment

- The features (`golddiffat25`, `xpdiffat25`) align with LoL game mechanics and are available at prediction time.

Model Limitations
Improvements
Model Parameters & Performance
Feature Engineering
Feature Importance
Original features remain strongest predictors, suggesting engineered features may be redundant.
Accuracy: 0.7539979829995678
Classification Report:

| | precision | recall | f1-score | support |
|---|---:|---:|---:|---:|
| 0 | 0.75 | 0.76 | 0.76 | 13882 |
| 1 | 0.76 | 0.75 | 0.75 | 13882 |
| accuracy | | | 0.75 | 27764 |
| macro avg | 0.75 | 0.75 | 0.75 | 27764 |
| weighted avg | 0.75 | 0.75 | 0.75 | 27764 |

golddiffat25: 0.4628
xpdiffat25: 0.3777
killsat25: 0.1595
deathsat25: 0.0000
What went wrong
The original features likely dominate because:
This suggests early-game resource advantages are more predictive than how teams acquired them.
The final model utilized a Random Forest Classifier, chosen for its ability to handle non-linear relationships and feature interactions while being resistant to overfitting through ensemble learning.

- `max_depth`: 10
- `min_samples_leaf`: 1
- `min_samples_split`: 2
- `n_estimators`: 100

These hyperparameters were selected using GridSearchCV with 3-fold cross-validation, which scored every combination in a grid of candidate values and kept the best-scoring configuration.
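The search described above could be set up along these lines. This is a hedged sketch: the data is synthetic, the grid is an assumption that merely contains the reported best values, and scoring by F1 is assumed from the evaluation metric named earlier in the write-up.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the four 25-minute features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=300) > 0).astype(int)

# Assumed grid: it includes the reported best values, but the other
# candidates are illustrative guesses, not the grid used in the project.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,          # 3-fold cross-validation
    scoring="f1",  # assumption: F1 as the selection metric
)
search.fit(X, y)

print(search.best_params_)
# Impurity-based importances of the refit best model; they sum to 1.
print(search.best_estimator_.feature_importances_)
```

After fitting, `best_estimator_` is the forest refit on all of the supplied data with the winning parameters, and its `feature_importances_` attribute is the kind of per-feature score listed in this write-up.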
Final Assessment
Though accuracy remained similar to the baseline (0.754 versus 0.756, so there was no accuracy gain), the final model offers other properties that make it more reliable for practical game prediction.
Feature Importance

- True Negatives (losses correctly predicted): 10,550
- False Positives: 3,332
- False Negatives: 3,472
- True Positives (wins correctly predicted): 10,410

This balanced distribution suggests the model performs similarly well for both winning and losing predictions, without significant bias toward either outcome.

- `kills_deaths_ratio`: While intended to capture team fight efficiency, this feature ultimately did not contribute significantly to the model’s performance, as shown by the feature importance analysis.
- `gold_xp_interaction`: Similarly, this engineered feature did not appear as significant in the final model, suggesting that the raw gold and XP differences were more informative.

Selected hyperparameters: `max_depth = 10`, `min_samples_leaf = 1`, `min_samples_split = 2`, `n_estimators = 100`
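The source does not give formulas for the two engineered features named above, so the sketch below shows one plausible construction in pandas. Both definitions (kills over deaths plus one, and the plain product of the gold and XP differences) are assumptions, and the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame(
    {
        "golddiffat25": [1928.0, -1928.0, 660.0],
        "xpdiffat25": [1200.0, -1200.0, -300.0],
        "killsat25": [8, 3, 5],
        "deathsat25": [3, 8, 0],
    }
)

# Assumed definition: kills divided by deaths, with +1 in the denominator
# so that a team with zero deaths does not cause a division by zero.
df["kills_deaths_ratio"] = df["killsat25"] / (df["deathsat25"] + 1)

# Assumed definition: plain product of the gold and XP differences.
df["gold_xp_interaction"] = df["golddiffat25"] * df["xpdiffat25"]

print(df[["kills_deaths_ratio", "gold_xp_interaction"]])
```

Under this assumed product definition, a team that is ahead in both gold and XP and its opponent that is behind in both receive the same positive value, so the feature cannot tell them apart. That would be one possible reason for its low importance, if the project defined it this way.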
This project explored predicting League of Legends match outcomes using data at the 25-minute mark. While our feature engineering attempts (kills/deaths ratio and gold/XP interaction) were theoretically sound, the empirical results showed that the original features were most predictive.
This analysis suggests that in professional League of Legends, resource advantages at 25 minutes are more reliable predictors of game outcomes than combat statistics or engineered feature combinations. The high ROC-AUC score indicates that the model makes reliable probabilistic predictions, even though the accuracy remained similar to the baseline model.