In Order to Buy Runs, You Need to Buy Barrels

Barrels Are All You Need

People who run ball clubs, they think in terms of buying players. Your goal shouldn’t be to buy players, your goal should be to buy wins. And in order to buy wins, you need to buy runs. - Peter Brand (Jonah Hill portraying Paul DePodesta), Moneyball (2011)

In order to buy runs, you need to buy barrels. Barrels are a highly sought-after batted-ball event in baseball because they contribute so much to a team’s offense. Statcast requires an exit velocity of at least 98 mph for a barrel. At exactly 98 mph the launch angle must fall between 26 and 30 degrees, and the range of qualifying launch angles widens as exit velocity rises. Barrels are highly likely to become extra-base hits, including home runs.

Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")
pd.set_option('display.max_columns', None)

df = pd.read_csv("team-games-2023-2025-with-runs.csv")
# convert "Oakland Athletics" to "Athletics"
df["team_name"] = df["team_name"].str.replace("Oakland Athletics", "Athletics")
sns.barplot(data=df, x="barrels_total", y="runs_scored")
plt.xlabel("Total Barrels in Game")
plt.ylabel("Runs Scored in Game")
plt.show()

Figure 1: Runs Scored vs Total Barrels in Game (2023-2025)

To visualize their impact, let’s look at the relationship between the number of barrels a team hits in a game and the runs it scores in that game, using the 2023 to 2025 MLB seasons (Figure 1). Teams that hit more barrels tend to score more runs. With 0 barrels, teams average around 2 runs; with 2 barrels, the average roughly doubles to around 4 runs. The trend continues as the barrel count rises. The figure underscores how much barrels contribute to a team’s offensive output and, ultimately, to winning games.
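The bar heights in a plot like Figure 1 are group means, so the same numbers can be read off with a `groupby`. The sketch below uses a small made-up table rather than the real game logs; only the column names `barrels_total` and `runs_scored` come from the article, and `games` and `runs_by_barrels` are illustrative names.

```python
import pandas as pd

# Hypothetical toy data: one row per team-game (values are made up).
games = pd.DataFrame({
    "barrels_total": [0, 0, 1, 1, 2, 2, 3],
    "runs_scored":   [1, 3, 2, 4, 3, 5, 7],
})

# Mean runs and number of games at each barrel count.
runs_by_barrels = (
    games.groupby("barrels_total")["runs_scored"]
    .agg(mean_runs="mean", games="size")
    .reset_index()
)
print(runs_by_barrels)
```

Reporting the number of games next to each mean helps flag high barrel counts that rest on only a handful of games.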

In addition to runs scored, barrels indicate overall offensive performance. Figure 2 shows the distribution of expected weighted on-base average (xwOBA) by contact quality: barrels, solid contact, and poor contact. Barrels (in blue) have much higher xwOBA values than solid contact (in green) and poor contact (in orange). The three groups overlap somewhat, likely because xwOBA on a batted ball is estimated from its exit velocity and launch angle, so contact near the edge of one category can earn a value similar to contact in the next. However, the distinction is still clear: barrels lead to much better offensive outcomes than other types of contact.

Code

df_barrels = pd.read_csv("barrels-2023-2025.csv")
df_barrels["barrel"] = 1
df_barrels["contact_quality"] = "barrel"

df_weak = pd.read_csv("poor-contact-2023-2025.csv")
df_weak["barrel"] = 0
df_weak["contact_quality"] = "poor"

df_solid = pd.read_csv("solid-contact-2023-2025.csv")
df_solid["barrel"] = 0
df_solid["contact_quality"] = "solid"

df_bbe = pd.concat([df_barrels, df_weak, df_solid], ignore_index=True)
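# Map the raw categories to display names so the legend is labeled from the data
# instead of by re-ordering label strings by hand
df_bbe["Contact Quality"] = df_bbe["contact_quality"].map(
    {"barrel": "Barrel", "poor": "Poor Contact", "solid": "Solid Contact"})
sns.kdeplot(data=df_bbe, x="estimated_woba_using_speedangle", hue="Contact Quality", fill=True, common_norm=False, alpha=0.5)
plt.xlabel("xwOBA")
plt.show()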

Figure 2: xwOBA by Contact Quality (2023-2025)

So this raises the question: how can teams increase their barrel counts? One approach is to analyze the factors that contribute to barrels. With machine learning, teams can identify the features that most influence barrel production and adjust their strategies accordingly.

How to Find Barrels

Statcast defines a barrel by exit velocity and launch angle: the ball must leave the bat at 98 mph or faster, a launch angle between 26 and 30 degrees always qualifies at 98 mph, and the range of qualifying launch angles widens as exit velocity increases. If we want to find barrels, we first need to understand what differentiates them from other types of batted balls. Let’s plot batted balls by launch speed and launch angle, colored by contact quality: barrels, solid contact, and poor contact. This shows how barrels stand out on these two metrics.
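The barrel zone can be sketched as a simple rule. The function below is a linear approximation, not Statcast’s official implementation: the name `is_barrel_approx` and the coefficients are assumptions, chosen so that the qualifying launch angles are 26 to 30 degrees at 98 mph and the window widens with exit velocity, capped at 50 degrees.

```python
import numpy as np

def is_barrel_approx(launch_speed, launch_angle):
    """Approximate barrel flag from exit velocity (mph) and launch angle (degrees).

    Assumption: a linear approximation of the barrel zone, not the official rule.
    """
    launch_speed = np.asarray(launch_speed, dtype=float)
    launch_angle = np.asarray(launch_angle, dtype=float)
    return (
        (launch_speed >= 98)                            # minimum exit velocity
        & (launch_angle <= 50)                          # hard cap on launch angle
        & (launch_speed * 1.5 - launch_angle >= 117)    # upper edge of the window
        & (launch_speed + launch_angle >= 124)          # lower edge of the window
    )

# Three balls at 98 mph (26, 30, 31 degrees), one at 105 mph, one at 97 mph.
print(is_barrel_approx([98, 98, 98, 105, 97], [26, 30, 31, 20, 28]))
```

A flag like this is handy for sanity-checking the labels in a batted-ball file, but the category supplied with the data should remain the source of truth.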

Code

sns.scatterplot(data=df_bbe, x="launch_speed", y="launch_angle", hue="contact_quality", alpha=0.5)

Figure 3: Batted Balls by Contact Quality (2023-2025)

Figure 3 shows a scatter plot of batted balls categorized by contact quality. Barrels (in blue) are clustered in a specific region characterized by high launch speeds and optimal launch angles, while solid contact and poor contact batted balls are more dispersed across the plot. This visualization highlights the distinct characteristics of barrels compared to other types of contact.

To predict barrels, we can frame the problem as classification. By training a model on swing features such as bat speed, attack angle, and swing length, we can predict whether a batted ball will be a barrel. Outcome statistics like wOBA are left out of the model because they are results of the batted ball rather than predictors. Launch speed and launch angle are left out for a similar reason: they define the label.

Code

from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# batting features only
features = ["bat_speed", "attack_angle", "swing_length", "attack_direction", 
            "swing_path_tilt", "intercept_ball_minus_batter_pos_x_inches",
            "intercept_ball_minus_batter_pos_y_inches", "stand", "age_bat",
            "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number"]

dataset = df_bbe[features + ["barrel"]].copy()
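# "barrel" is already coded 0/1; pd.factorize numbers classes by order of appearance
# (barrels come first in df_bbe), which would flip the labels
dataset["barrel"] = dataset["barrel"].astype(int)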
dataset["stand"] = pd.factorize(dataset["stand"])[0]
X = dataset[features]
y = dataset["barrel"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)

# %% modeling
pip = Pipeline([
    ('imputer', IterativeImputer(random_state=42)),
    ('scaler', RobustScaler()),
    ('classifier', HistGradientBoostingClassifier(random_state=42))
])
pip.fit(X_train, y_train)
y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

# kf = KFold(n_splits=5, shuffle=True, random_state=42)
# cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='accuracy', n_jobs=-1)
# print(f"Cross-validation scores: {cv_scores}")
# print(f"Mean CV accuracy: {cv_scores.mean()}")

(58962, 14) (58962,) (14741, 14) (14741,)
Validation score (Accuracy): 0.6909
Log Loss: 0.5611

Now it takes two to tango. We should also consider the pitcher’s influence on barrel outcomes. By incorporating pitching features such as pitch type, velocity, spin rate, and movement, we can enhance our model’s ability to predict barrels. Combining both batting and pitching features provides a more comprehensive view of the factors that contribute to barrel production.

# batting and pitching features
# (note: "pitch_number" and "zone" are each listed twice, so those columns are duplicated in X)
features = ["batter", "pitcher", "bat_speed", "attack_angle", "swing_length", "attack_direction", "swing_path_tilt", "intercept_ball_minus_batter_pos_x_inches",
            "intercept_ball_minus_batter_pos_y_inches", "stand", "age_bat",
            "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number", 
            "release_speed", "release_pos_x", "release_pos_z", "p_throws", "zone", "vx0", "vy0", "vz0", "ax", "ay", "az", "release_pos_y", "pitch_type", "pitch_number", "age_pit", "api_break_z_with_gravity", "api_break_x_arm", "api_break_x_batter_in",
            "arm_angle", "zone", "effective_speed", "release_spin_rate", "release_extension"]

dataset = df_bbe[features + ["barrel"]].copy()
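# "barrel" is already coded 0/1; pd.factorize numbers classes by order of appearance
# (barrels come first in df_bbe), which would flip the labels
dataset["barrel"] = dataset["barrel"].astype(int)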

categorical_features = ["stand", "p_throws", "pitch_type"]
dataset_one_hot = pd.get_dummies(dataset, columns=categorical_features, drop_first=True)

y = dataset_one_hot["barrel"]
X = dataset_one_hot.drop("barrel", axis=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)

# %% modeling
pip = Pipeline([
    ('imputer', SimpleImputer()),
    ('scaler', RobustScaler()),
    ('classifier', HistGradientBoostingClassifier(random_state=42))
])
pip.fit(X_train, y_train)
y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='accuracy', n_jobs=-1)
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV accuracy: {cv_scores.mean()}")

(58962, 53) (58962,) (14741, 53) (14741,)
Validation score (Accuracy): 0.7027
Log Loss: 0.5449
Cross-validation scores: [0.70448571 0.70457051 0.7057327  0.69911805 0.7058175 ]
Mean CV accuracy: 0.7039448938904378

Model Evaluation

from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_pred = pip.predict(X_val)
cm = confusion_matrix(y_val, y_pred)

cm_display = ConfusionMatrixDisplay(cm).plot()
plt.grid(False)

from sklearn.metrics import RocCurveDisplay, roc_auc_score, roc_curve

y_score = pip.decision_function(X_val)
fpr, tpr, _ = roc_curve(y_val, y_score, pos_label=pip.classes_[1])
# Draw on an explicit axes; calling plt.figure() first leaves an empty extra figure
fig, ax = plt.subplots(figsize=(8, 6))
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=roc_auc_score(y_val, y_score)).plot(ax=ax)

Who Should Find Barrels?

Code

from IPython.display import Markdown
from tabulate import tabulate

tbl = df.groupby("team_name")["barrels_total"].mean().sort_values(ascending=False)
tbl_df = pd.DataFrame(tbl).reset_index()
tbl_df.index += 1
tbl_df.columns = ["Team", "Avg Barrels per Game"]
Markdown(tabulate(tbl_df.round(4), headers="keys"))

Rank	Team	Avg Barrels per Game
1	Atlanta Braves	2.5783
2	New York Yankees	2.5678
3	Los Angeles Dodgers	2.3291
4	New York Mets	2.2746
5	Seattle Mariners	2.2129
6	Minnesota Twins	2.1535
7	Chicago Cubs	2.125
8	Philadelphia Phillies	2.1232
9	Baltimore Orioles	2.1185
10	Boston Red Sox	2.1
11	Texas Rangers	2.0541
12	Los Angeles Angels	2.0333
13	Houston Astros	2.0273
14	Toronto Blue Jays	2.0083
15	Detroit Tigers	1.9896
16	Kansas City Royals	1.9771
17	San Francisco Giants	1.9353
18	San Diego Padres	1.931
19	St. Louis Cardinals	1.929
20	Arizona Diamondbacks	1.925
21	Athletics	1.9208
22	Tampa Bay Rays	1.8812
23	Miami Marlins	1.8625
24	Colorado Rockies	1.8264
25	Pittsburgh Pirates	1.815
26	Milwaukee Brewers	1.7635
27	Chicago White Sox	1.7484
28	Cincinnati Reds	1.6674
29	Washington Nationals	1.6286
30	Cleveland Guardians	1.4635

Figure 4: Average Barrels per Game by Team (2023-2025)

Note

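Past articles:
- Principal Component Analysis (https://runningonnumbers.com/posts/principal-component-analysis-python-baseball/)
- Support Vector Machine (https://runningonnumbers.com/posts/support-vector-machine/)
- K-Means Clustering (https://runningonnumbers.com/posts/k-means/)

GitHub:
- Running on Numbers (https://github.com/oliverc1623/Running-On-Numbers-Public)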

Author: Oliver Chang (oliverc1622@gmail.com). Date: 2025-10-19.