Barrels Are All You Need – Running on Numbers

In order to buy runs, you need to buy barrels.

People who run ball clubs, they think in terms of buying players. Your goal shouldn’t be to buy players, your goal should be to buy wins. And in order to buy wins, you need to buy runs. - Peter Brand (Jonah Hill portraying Paul DePodesta), Moneyball (2011)

In order to buy runs, you need to buy barrels. Barrels are a highly sought-after batted-ball event in baseball and contribute significantly to a team’s offensive success. A barrel is a batted ball with an exit velocity of at least 98 mph and a launch angle between 26 and 30 degrees; for every mph above 98, the range of qualifying launch angles widens. Barrels are known for their high likelihood of resulting in extra-base hits, including home runs.

Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings

from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import confusion_matrix, log_loss, ConfusionMatrixDisplay, roc_curve, roc_auc_score, RocCurveDisplay
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")
pd.set_option('display.max_columns', None)

df = pd.read_csv("team-games-2023-2025-with-runs.csv")
# convert "Oakland Athletics" to "Athletics"
df["team_name"] = df["team_name"].str.replace("Oakland Athletics", "Athletics")
sns.barplot(data=df, x="barrels_total", y="runs_scored")
plt.xlabel("Total Barrels in Game")
plt.ylabel("Runs Scored in Game")
plt.show()

Figure 1: Runs Scored vs Total Barrels in Game (2023-2025)

To visualize their impact, let’s look at the relationship between the total number of barrels in a game and the runs scored in that game across the 2023 to 2025 MLB seasons (Figure 1). In games where teams hit more barrels, they tend to score more runs. At 0 barrels, teams average around 2 runs; at 2 barrels, the average roughly doubles to around 4 runs. The trend continues as the number of barrels increases. This underscores how much barrels contribute to a team’s offensive output and, ultimately, to winning games.
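
The bar heights in a plot like Figure 1 are per-group averages: games are grouped by barrel count and the runs in each group are averaged. Here is a minimal sketch of that aggregation using only the standard library. The rows are made-up numbers for illustration, not values from the dataset, and `mean_runs_by_barrels` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical team-game rows as (barrels_total, runs_scored).
# These numbers are invented for illustration only.
games = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 3), (2, 5), (3, 6)]


def mean_runs_by_barrels(rows):
    """Group games by barrel count and average the runs in each group."""
    grouped = defaultdict(list)
    for barrels, runs in rows:
        grouped[barrels].append(runs)
    return {barrels: mean(runs) for barrels, runs in sorted(grouped.items())}


runs_by_barrels = mean_runs_by_barrels(games)
for barrels, avg_runs in runs_by_barrels.items():
    print(f"{barrels} barrels: {avg_runs} runs on average")
```

With the DataFrame used in this post, the same aggregation would be `df.groupby("barrels_total")["runs_scored"].mean()`.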

In addition to runs scored, barrels indicate overall offensive performance. In Figure 2, we visualize the expected weighted on-base average (xwOBA) based on contact quality: barrels, solid contact, and poor contact. Barrels (in blue) have significantly higher xwOBA values compared to solid contact (in green) and poor contact (in orange). The three groups overlap somewhat. Season-level xwOBA also includes walks and strikeouts, but the value plotted here is estimated for each batted ball from how it was hit (chiefly exit velocity and launch angle), and the kernel density estimate smooths the edges of each group. Even so, the distinction is clear: barrels lead to much better offensive outcomes than other types of contact.

Code

df_barrels = pd.read_csv("barrels-2023-2025.csv")
df_barrels["barrel"] = 1
df_barrels["contact_quality"] = "barrel"

df_weak = pd.read_csv("poor-contact-2023-2025.csv")
df_weak["barrel"] = 0
df_weak["contact_quality"] = "poor"

df_solid = pd.read_csv("solid-contact-2023-2025.csv")
df_solid["barrel"] = 0
df_solid["contact_quality"] = "solid"

df_bbe = pd.concat([df_barrels, df_weak, df_solid], ignore_index=True)
sns.kdeplot(df_bbe, x="estimated_woba_using_speedangle", hue="contact_quality", fill=True, common_norm=False, alpha=0.5)
# rename legend
plt.legend(title="Contact Quality", labels=["Solid Contact", "Poor Contact", "Barrel"])

Figure 2: xwOBA by Contact Quality (2023-2025)

This raises the question: how can teams increase their barrel counts? One approach is to analyze the factors that contribute to barrels. With machine learning, teams can identify which features influence barrel production and adjust their strategies accordingly.

How to Find Barrels

Statcast defines a barrel as a batted ball with an exit velocity of at least 98 mph and a launch angle between 26 and 30 degrees, with the range of qualifying launch angles widening as exit velocity rises above 98 mph. If we want to find barrels, we first need to understand what differentiates them from other types of batted balls. Let’s visualize batted balls based on their launch speed and launch angle, categorized by contact quality: barrels, solid contact, and poor contact. This will help us see how barrels stand out in terms of these two key metrics.

Code

sns.scatterplot(data=df_bbe, x="launch_speed", y="launch_angle", hue="contact_quality", alpha=0.5)

Figure 3: Batted Balls by Contact Quality (2023-2025)

Figure 3 shows a scatter plot of batted balls categorized by contact quality. Barrels (in blue) are clustered in a specific region characterized by high launch speeds and optimal launch angles, while solid contact and poor contact batted balls are more dispersed across the plot. This visualization highlights the distinct characteristics of barrels compared to other types of contact.
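
The barrel region in a plot like Figure 3 can be approximated with a few linear inequalities. The sketch below uses a rule of thumb that circulates in public analysis code. It is an assumption here, not Statcast's official implementation and not the method used to label the data in this post, and `is_barrel` is a made-up name. At 98 mph it accepts launch angles from 26 to 30 degrees, and the window widens as exit velocity rises.

```python
def is_barrel(launch_speed: float, launch_angle: float) -> bool:
    """Approximate barrel check (community rule of thumb, not official code).

    launch_speed is exit velocity in mph, launch_angle is in degrees.
    """
    return (
        launch_speed >= 98
        and launch_angle <= 50
        # upper edge of the window: rises 1.5 degrees per extra mph
        and launch_speed * 1.5 - launch_angle >= 117
        # lower edge of the window: drops 1 degree per extra mph
        and launch_speed + launch_angle >= 124
    )


# A few probe points around the edges of the window
for speed, angle in [(98, 28), (98, 25), (100, 24), (100, 34), (116, 8)]:
    print(f"{speed} mph at {angle} degrees -> barrel: {is_barrel(speed, angle)}")
```

Because the rule depends only on launch speed and launch angle, it describes the outcome of contact. The model in the next section tries to predict that outcome from swing characteristics instead.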

To predict barrels, we can use machine learning classification. By training a model on features such as bat speed, attack angle, and swing length, we can predict whether a batted ball will be a barrel. Outcome statistics like wOBA are useful, but they are left out of the model because they are results of the batted ball rather than predictors.

# batting features only
features = ["bat_speed", "attack_angle", "swing_length", "attack_direction", 
            "swing_path_tilt", "intercept_ball_minus_batter_pos_x_inches",
            "intercept_ball_minus_batter_pos_y_inches", "stand", "age_bat",
            "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number"]

dataset = df_bbe[features + ["barrel"]].copy()
# "barrel" is already 0/1; keep 1 = barrel (pd.factorize numbers classes by order of appearance)
dataset["barrel"] = dataset["barrel"].astype(int)
dataset["stand"] = pd.factorize(dataset["stand"])[0]
X = dataset[features]
y = dataset["barrel"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# modeling
pip = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", RobustScaler()),
    ("classifier", HistGradientBoostingClassifier(random_state=42, class_weight="balanced"))
])
pip.fit(X_train, y_train)
y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

kf = KFold(n_splits=5, shuffle=True, random_state=42)
# scored with balanced accuracy, i.e. the mean of per-class recall
cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='balanced_accuracy', n_jobs=-1)
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV balanced accuracy: {cv_scores.mean()}")

Validation score (Accuracy): 0.6564
Log Loss: 0.6006
Cross-validation scores: [0.66854507 0.67285075 0.67464693 0.66929524 0.6735115 ]
Mean CV balanced accuracy: 0.6717698965985912

On batting features alone, the model reaches a validation accuracy of about 66%, and the mean cross-validated balanced accuracy (the cross-validation call scores with balanced_accuracy) is about 67%. While far from perfect, the model is clearly better than random at classifying barrels from batting metrics.
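
The two numbers above are not the same metric: `pip.score` reports plain accuracy, while the cross-validation call asks for balanced accuracy, which is the average of the recall on each class. A small sketch of the difference follows. The confusion counts are hypothetical, chosen only to show how the two metrics diverge on imbalanced data, and the function names are illustrative.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def balanced_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Average of recall on the positive class and recall on the negative class."""
    recall_pos = tp / (tp + fn)
    recall_neg = tn / (tn + fp)
    return (recall_pos + recall_neg) / 2


# Hypothetical validation set: 100 barrels and 900 non-barrels
model = dict(tp=70, fn=30, tn=585, fp=315)
# A classifier that never predicts a barrel
always_negative = dict(tp=0, fn=100, tn=900, fp=0)

for name, counts in [("model", model), ("always negative", always_negative)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, "
          f"balanced accuracy={balanced_accuracy(**counts):.3f}")
```

The always-negative classifier looks strong on plain accuracy and no better than chance on balanced accuracy, which is why the balanced version is the more informative number when barrels are rare.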

Now it takes two to tango. We should also consider the pitcher’s influence on barrel outcomes. By incorporating pitching features such as pitch type, velocity, spin rate, and movement, we can enhance our model’s ability to predict barrels. Combining both batting and pitching features provides a more comprehensive view of the factors that contribute to barrel production.

# batting and pitching features (plus batter and pitcher IDs)
features = ["batter", "pitcher", "bat_speed", "attack_angle", "swing_length",
            "attack_direction", "swing_path_tilt",
            "intercept_ball_minus_batter_pos_x_inches",
            "intercept_ball_minus_batter_pos_y_inches", "stand", "age_bat",
            "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number",
            "release_speed", "release_pos_x", "release_pos_z", "p_throws", "zone",
            "vx0", "vy0", "vz0", "ax", "ay", "az", "release_pos_y", "pitch_type",
            "age_pit", "api_break_z_with_gravity", "api_break_x_arm",
            "api_break_x_batter_in", "arm_angle", "effective_speed",
            "release_spin_rate", "release_extension"]

dataset = df_bbe[features + ["barrel"]].copy()
# "barrel" is already 0/1; keep 1 = barrel (pd.factorize numbers classes by order of appearance)
dataset["barrel"] = dataset["barrel"].astype(int)

categorical_features = ["stand", "p_throws", "pitch_type"]
dataset_one_hot = pd.get_dummies(dataset, columns=categorical_features, drop_first=True)

y = dataset_one_hot["barrel"]
X = dataset_one_hot.drop("barrel", axis=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# %% modeling
pip = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", RobustScaler()),
    ("classifier", HistGradientBoostingClassifier(random_state=42, class_weight="balanced"))
])
pip.fit(X_train, y_train)
y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

kf = KFold(n_splits=5, shuffle=True, random_state=42)
# scored with balanced accuracy, i.e. the mean of per-class recall
cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='balanced_accuracy', n_jobs=-1)
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV balanced accuracy: {cv_scores.mean()}")

Validation score (Accuracy): 0.6737
Log Loss: 0.5821
Cross-validation scores: [0.68336082 0.68811475 0.68566822 0.68072519 0.68566854]
Mean CV balanced accuracy: 0.6847075037611322

With both batting and pitching features, the validation accuracy improves to about 67% and the mean cross-validated balanced accuracy to about 68%. That is a gain of roughly 1 to 2 percentage points over batting features alone: some added value, but not a huge leap. It would be interesting to use the entire bat trajectory data to further improve classification, but that is a topic for another day.
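
The log loss values printed above are easier to read against a reference point. A classifier that assigns probability 0.5 to every batted ball has a log loss of ln 2, about 0.693, whatever the class balance, so the reported 0.6006 and 0.5821 both sit below that coin-flip level. The sketch below works this out on a handful of hypothetical labels; `binary_log_loss` is an illustrative helper, not the scikit-learn function.

```python
import math


def binary_log_loss(y_true, p_pos):
    """Mean negative log-likelihood given 0/1 labels and predicted P(class 1)."""
    total = 0.0
    for label, prob in zip(y_true, p_pos):
        # probability the model assigned to the class that actually occurred
        prob_of_truth = prob if label == 1 else 1.0 - prob
        total += -math.log(prob_of_truth)
    return total / len(y_true)


y_true = [1, 0, 0, 1]              # hypothetical labels
coin_flip = [0.5, 0.5, 0.5, 0.5]   # no information at all
sharper = [0.8, 0.2, 0.2, 0.8]     # confident and correct

baseline = binary_log_loss(y_true, coin_flip)
improved = binary_log_loss(y_true, sharper)
print(f"coin-flip log loss: {baseline:.4f}")
print(f"sharper log loss:   {improved:.4f}")
```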

Model Evaluation

I would be remiss if I did not include some model evaluation metrics. First, let’s look at the confusion matrix to see how well our model is classifying barrels versus non-barrels.

Code

y_pred = pip.predict(X_val)
cm = confusion_matrix(y_val, y_pred)

cm_display = ConfusionMatrixDisplay(cm).plot()
plt.grid(False)

Figure 4: Confusion Matrix for Barrel Classification

The majority of predictions are correct, but there are a lot of false negatives, which suggests the model struggles to identify barrels when they do occur. That reading assumes the class encoded as 1 is the barrel class, so it is worth confirming the label encoding before drawing conclusions from the plot. The data are also heavily imbalanced (many more non-barrels than barrels), and in such cases accuracy alone can be misleading. Therefore, let’s also look at the ROC curve to evaluate the model’s performance across different classification thresholds.

Code

y_score = pip.decision_function(X_val)
fpr, tpr, _ = roc_curve(y_val, y_score, pos_label=pip.classes_[1])
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot()
# area under the curve
auc = roc_auc_score(y_val, y_score)
plt.title(f"ROC Curve (AUC = {auc:.4f})");

Figure 5: ROC Curve for Barrel Classification

The ROC curve shows the model performing better than random guessing. The area under the curve (AUC) summarizes performance across all classification thresholds: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. An AUC of 0.75 indicates a moderate ability to distinguish between barrels and non-barrels.
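
AUC can also be read as a ranking statistic: the share of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The sketch below computes it that way on a tiny hypothetical sample. `pairwise_auc` is an illustrative name, and the quadratic loop is meant for clarity, not for large datasets.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = barrel) and model scores
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.1, 0.7]

auc_value = pairwise_auc(y_true, scores)
print(f"AUC = {auc_value:.4f}")
```

Here 8 of the 9 pairs are ordered correctly. The one miss is the barrel scored 0.4 against the non-barrel scored 0.5.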

Who Should Find Barrels?

Finally, let’s look at which teams are generating the most barrels on average per game. This can provide insights into which teams are effectively leveraging barrels to enhance their offensive performance.

Code

from IPython.display import Markdown
from tabulate import tabulate

tbl = df.groupby("team_name")["barrels_total"].mean().sort_values(ascending=False)
tbl_df = pd.DataFrame(tbl).reset_index()
tbl_df.index += 1
tbl_df.columns = ["Team", "Avg Barrels per Game"]
Markdown(tabulate(tbl_df.round(4), headers="keys"))

	Team	Avg Barrels per Game
1	Atlanta Braves	2.5783
2	New York Yankees	2.5678
3	Los Angeles Dodgers	2.3291
4	New York Mets	2.2746
5	Seattle Mariners	2.2129
6	Minnesota Twins	2.1535
7	Chicago Cubs	2.125
8	Philadelphia Phillies	2.1232
9	Baltimore Orioles	2.1185
10	Boston Red Sox	2.1
11	Texas Rangers	2.0541
12	Los Angeles Angels	2.0333
13	Houston Astros	2.0273
14	Toronto Blue Jays	2.0083
15	Detroit Tigers	1.9896
16	Kansas City Royals	1.9771
17	San Francisco Giants	1.9353
18	San Diego Padres	1.931
19	St. Louis Cardinals	1.929
20	Arizona Diamondbacks	1.925
21	Athletics	1.9208
22	Tampa Bay Rays	1.8812
23	Miami Marlins	1.8625
24	Colorado Rockies	1.8264
25	Pittsburgh Pirates	1.815
26	Milwaukee Brewers	1.7635
27	Chicago White Sox	1.7484
28	Cincinnati Reds	1.6674
29	Washington Nationals	1.6286
30	Cleveland Guardians	1.4635

Figure 6: Average Barrels per Game by Team (2023-2025)

Figure 6 makes it apparent why BaseballSavant lists barrels per plate appearance. Averaged per game, the differences between teams are small: the gap between first and last place is only about 1.1 barrels per game. Still, barrels per game shows some trends. The Braves, Yankees, and Dodgers in the top 3 checks out, and the Guardians, Nationals, and Reds in the bottom 3 also makes sense.
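
One way to see the appeal of a per-plate-appearance rate is that it accounts for opportunity. In the sketch below, two hypothetical teams hit the same number of barrels in the same number of games, so they tie on barrels per game, yet they differ once plate appearances are the denominator. All numbers and team names are invented for illustration.

```python
# Hypothetical season totals, invented for illustration only
teams = {
    "Team A": {"barrels": 330, "games": 162, "plate_appearances": 6300},
    "Team B": {"barrels": 330, "games": 162, "plate_appearances": 5900},
}


def barrels_per_game(team: dict) -> float:
    return team["barrels"] / team["games"]


def barrels_per_pa(team: dict) -> float:
    return team["barrels"] / team["plate_appearances"]


for name, totals in teams.items():
    print(f"{name}: {barrels_per_game(totals):.3f} per game, "
          f"{100 * barrels_per_pa(totals):.2f}% per plate appearance")
```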

The Guardians are an interesting case study. Despite having the lowest barrel rate in 2025, they still found themselves in the playoffs. So how did they do it? Good pitching always helps, and exceptional fielding also plays a role. But one thing the Guardians excel at is maximizing their Pull AIR %. In 2025, 21.6% of their batted balls were pulled and in the air (a fly ball, line drive, or pop-up). This was the second highest in MLB, behind the Chicago Cubs.
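
Following the definition above, Pull AIR % is the share of batted balls that are both pulled and hit in the air. The sketch below computes it over a list of batted balls. The `pulled` flag is a hypothetical precomputed field, the `bb_type` labels are modeled on Statcast-style values and should be treated as an assumption, and the sample rows are invented.

```python
# Batted-ball types counted as "in the air" (assumed label spellings)
AIR_TYPES = {"fly_ball", "line_drive", "popup"}


def pull_air_pct(batted_balls) -> float:
    """Percentage of batted balls that are pulled and hit in the air."""
    if not batted_balls:
        return 0.0
    pulled_in_air = sum(
        1 for bb in batted_balls
        if bb["pulled"] and bb["bb_type"] in AIR_TYPES
    )
    return 100 * pulled_in_air / len(batted_balls)


# Invented sample: "pulled" is a hypothetical, precomputed flag
sample = [
    {"bb_type": "fly_ball", "pulled": True},
    {"bb_type": "ground_ball", "pulled": True},
    {"bb_type": "line_drive", "pulled": False},
    {"bb_type": "popup", "pulled": True},
    {"bb_type": "ground_ball", "pulled": False},
]

sample_pct = pull_air_pct(sample)
print(f"Pull AIR %: {sample_pct:.1f}")
```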

This strikes me as interesting because a low barrel rate with a high Pull AIR % seems counterintuitive. However, it just goes to show that there are multiple ways to win in baseball. The Guardians found a way that worked for them. They are not sluggers but opportunists.

Take a look at the Guardians’ leverage-based run value by batted ball type in 2025. All batters produced negative run values. But observe the “+/- Due To Leverage” column. For all but 2 players, the leverage adjustment was positive on swings. This means that the Guardians were able to get the most out of their batted balls in high-leverage situations, even if those batted balls were not barrels. By focusing on situational hitting and maximizing the impact of their contact, the Guardians were able to overcome their low barrel rate and still find success.

Conclusion

Barrels are the gospel of Statcast — a neat, data-driven way to explain why some swings change games and others don’t. Whether you roll your eyes at analytics or worship at its altar, barrels have become the sport’s shorthand for power. They correlate neatly with runs, wins, and the illusion of control. Teams build models around them, tuning every motion in the cage toward the perfect batted-ball outcome.

But Cleveland seems to have missed that memo—or maybe they just ignored it. The Guardians aren’t chasing barrels; they’re chasing moments. They don’t overwhelm you with exit velocity. They outlast you with timing. While other teams obsess over maximizing run differentials, Cleveland’s figured out that you only need enough runs to win this game. Score when it counts, defend when it matters, and trust that efficiency can be its own kind of power.

In an era defined by data, the Guardians are proof that there’s still room for something harder to measure: the knack for being good at the right time.

Note

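Past articles:
- Principal Component Analysis (https://runningonnumbers.com/posts/principal-component-analysis-python-baseball/)
- Support Vector Machine (https://runningonnumbers.com/posts/support-vector-machine/)
- K-Means Clustering (https://runningonnumbers.com/posts/k-means/)

GitHub:
- Running on Numbers (https://github.com/oliverc1623/Running-On-Numbers-Public)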

Author: Oliver Chang. Published: 2025-10-23.