In Order to Buy Runs, You Need to Buy Barrels

MLB
graph theory
data visualization
Author

Oliver Chang

Published

October 19, 2025

Barrels Are All You Need

“People who run ball clubs, they think in terms of buying players. Your goal shouldn’t be to buy players, your goal should be to buy wins. And in order to buy wins, you need to buy runs.” - Peter Brand (played by Jonah Hill; the character is based on Paul DePodesta), Moneyball (2011)

In order to buy runs, you need to buy barrels. A barrel is one of the most sought-after batted-ball events in baseball and a major contributor to a team’s offensive success. Statcast classifies a batted ball as a barrel when its exit velocity is at least 98 mph and its launch angle falls between 26 and 30 degrees; for every mph above 98, the qualifying range of launch angles widens. Barrels are very likely to become extra-base hits, including home runs.
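The widening launch-angle window can be written as a short rule. The sketch below uses a widely circulated community approximation of the barrel zone, not an official Statcast formula. The function name `is_barrel_approx` and the constants 50, 117, and 124 are assumptions taken from that approximation, not from the data used in this post.

```python
def is_barrel_approx(launch_speed: float, launch_angle: float) -> bool:
    """Approximate barrel test (community rule of thumb, not the official definition)."""
    return (
        launch_speed >= 98                             # minimum exit velocity, mph
        and launch_angle <= 50                         # assumed cap on launch angle, degrees
        and launch_speed * 1.5 - launch_angle >= 117   # upper angle bound rises with speed
        and launch_speed + launch_angle >= 124         # lower angle bound drops with speed
    )


print(is_barrel_approx(98, 28))    # inside the 26-30 degree window at 98 mph
print(is_barrel_approx(98, 35))    # too steep for 98 mph
print(is_barrel_approx(105, 35))   # the window is wider at 105 mph
```

At exactly 98 mph the two inequalities reduce to a launch angle between 26 and 30 degrees, which matches the definition above.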

Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")
pd.set_option('display.max_columns', None)

df = pd.read_csv("team-games-2023-2025-with-runs.csv")
# convert "Oakland Athletics" to "Athletics"
df["team_name"] = df["team_name"].str.replace("Oakland Athletics", "Athletics")
sns.barplot(data=df, x="barrels_total", y="runs_scored")
plt.xlabel("Total Barrels in Game")
plt.ylabel("Runs Scored in Game")
plt.show()
Figure 1: Runs Scored vs Total Barrels in Game (2023-2025)

To see the impact of barrels, Figure 1 plots the average runs a team scores in a game against the number of barrels it hits in that game, using the 2023 to 2025 MLB seasons. Teams that hit more barrels tend to score more runs: with 0 barrels they average around 2 runs, and with 2 barrels the average roughly doubles to around 4. The trend continues as the barrel count rises, which underscores how much barrels contribute to a team’s offensive output and, ultimately, to winning games.
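The bar heights in a plot like Figure 1 are per-group means, so the same numbers can be read off with a groupby. Here is a minimal sketch on made-up team-game rows. The `toy` frame and its values are hypothetical; only the column names match the data above.

```python
import pandas as pd

# Hypothetical toy data: one row per team-game (values are made up for illustration).
toy = pd.DataFrame({
    "barrels_total": [0, 0, 1, 1, 2, 2, 3, 3],
    "runs_scored":   [1, 3, 2, 4, 3, 5, 5, 7],
})

# Mean runs and number of games for each barrel count.
# sns.barplot uses the mean as its default estimator, so these are the bar heights.
summary = (
    toy.groupby("barrels_total")["runs_scored"]
       .agg(mean_runs="mean", games="count")
       .reset_index()
)
print(summary)
```

Keeping the `games` column next to the mean is useful on real data, because high barrel counts are rare and their averages rest on few games.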

Barrels also signal overall offensive performance, not just runs scored. Figure 2 shows the distribution of expected weighted on-base average (xwOBA) by contact quality: barrels, solid contact, and poor contact. Barrels (in blue) have significantly higher xwOBA values than solid contact (in green) and poor contact (in orange). There’s probably some overlap between the three groups because xwOBA includes walks and strikeouts, which are not directly related to batted ball quality. However, the distinction is still clear: barrels lead to much better offensive outcomes than other types of contact.

Code
df_barrels = pd.read_csv("barrels-2023-2025.csv")
df_barrels["barrel"] = 1
df_barrels["contact_quality"] = "barrel"

df_weak = pd.read_csv("poor-contact-2023-2025.csv")
df_weak["barrel"] = 0
df_weak["contact_quality"] = "poor"

df_solid = pd.read_csv("solid-contact-2023-2025.csv")
df_solid["barrel"] = 0
df_solid["contact_quality"] = "solid"

df_bbe = pd.concat([df_barrels, df_weak, df_solid], ignore_index=True)
sns.kdeplot(df_bbe, x="estimated_woba_using_speedangle", hue="contact_quality", fill=True, common_norm=False, alpha=0.5)
# rename legend
plt.legend(title="Contact Quality", labels=["Solid Contact", "Poor Contact", "Barrel"])
Figure 2: xwOBA by Contact Quality (2023-2025)

This raises the question: how can teams increase their barrel counts? One approach is to analyze the factors that contribute to barrels. With machine learning, teams can identify the features that most influence barrel production and adjust their strategies accordingly.

How to Find Barrels

Statcast defines a barrel as a batted ball with an exit velocity of at least 98 mph and a launch angle between 26 and 30 degrees, with the qualifying launch-angle range widening as exit velocity rises above 98 mph. If we want to find barrels, we first need to understand what differentiates them from other types of batted balls. Let’s plot batted balls by launch speed and launch angle, colored by contact quality: barrels, solid contact, and poor contact. This shows how barrels stand out on these two key metrics.

Code
sns.scatterplot(data=df_bbe, x="launch_speed", y="launch_angle", hue="contact_quality", alpha=0.5)
Figure 3: Batted Balls by Contact Quality (2023-2025)

Figure 3 shows a scatter plot of batted balls categorized by contact quality. Barrels (in blue) are clustered in a specific region characterized by high launch speeds and optimal launch angles, while solid contact and poor contact batted balls are more dispersed across the plot. This visualization highlights the distinct characteristics of barrels compared to other types of contact.

To predict barrels, we can use machine learning classification. By training a model on features such as bat speed, attack angle, swing length, and other swing and game-context metrics, we can predict whether a batted ball will be a barrel. Outcome statistics like wOBA are useful, but they are left out of the model because they are results of the batted ball rather than predictors.

Code
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer, SimpleImputer  
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, log_loss
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import plot_tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import PolynomialFeatures, RobustScaler, StandardScaler
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
from sklearn.metrics import roc_curve, roc_auc_score

# batting features only
features = ["bat_speed", "attack_angle", "swing_length", "attack_direction", 
            "swing_path_tilt", "intercept_ball_minus_batter_pos_x_inches",
            "intercept_ball_minus_batter_pos_y_inches", "stand", "age_bat",
            "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number"]

dataset = df_bbe[features + ["barrel"]].copy()
# barrel is already 0/1 (1 = barrel); cast to int so class 1 stays the positive class
dataset["barrel"] = dataset["barrel"].astype(int)
dataset["stand"] = pd.factorize(dataset["stand"])[0]
X = dataset[features]
y = dataset["barrel"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)

# %% modeling
pip = Pipeline([
    ('imputer', IterativeImputer(random_state=42)),
    ('scaler', RobustScaler()),
    ('classifier', HistGradientBoostingClassifier(random_state=42))
])
pip.fit(X_train, y_train)
y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

# kf = KFold(n_splits=5, shuffle=True, random_state=42)
# cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='accuracy', n_jobs=-1)
# print(f"Cross-validation scores: {cv_scores}")
# print(f"Mean CV accuracy: {cv_scores.mean()}")
(58962, 14) (58962,) (14741, 14) (14741,)
Validation score (Accuracy): 0.6909
Log Loss: 0.5611
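An accuracy figure is easier to judge next to a trivial baseline: the accuracy from always predicting the most frequent class. The post does not report the class balance, so the sketch below uses hypothetical labels. The helper name `majority_baseline_accuracy` and the toy vector `y_toy` are illustrative assumptions.

```python
import numpy as np


def majority_baseline_accuracy(y) -> float:
    """Accuracy obtained by always predicting the most frequent class in y."""
    y = np.asarray(y)
    _, counts = np.unique(y, return_counts=True)
    return float(counts.max() / y.size)


# Hypothetical labels: 3 barrels (1) among 10 batted balls.
y_toy = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(f"Majority-class accuracy: {majority_baseline_accuracy(y_toy):.2f}")
```

Calling `majority_baseline_accuracy(y_val)` on the validation labels above would give the number to compare the model’s accuracy against. If the two are close, the model has learned little beyond the base rate.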

Now it takes two to tango. We should also consider the pitcher’s influence on barrel outcomes. By incorporating pitching features such as pitch type, velocity, spin rate, and movement, we can enhance our model’s ability to predict barrels. Combining both batting and pitching features provides a more comprehensive view of the factors that contribute to barrel production.

# batting and pitching features
features = ["batter", "pitcher", "bat_speed", "attack_angle", "swing_length", "attack_direction", "swing_path_tilt", "intercept_ball_minus_batter_pos_x_inches",
            "intercept_ball_minus_batter_pos_y_inches", "stand", "age_bat",
            "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number", 
            "release_speed", "release_pos_x", "release_pos_z", "p_throws", "zone", "vx0", "vy0", "vz0", "ax", "ay", "az", "release_pos_y", "pitch_type", "pitch_number", "age_pit", "api_break_z_with_gravity", "api_break_x_arm", "api_break_x_batter_in",
            "arm_angle", "zone", "effective_speed", "release_spin_rate", "release_extension"]

dataset = df_bbe[features + ["barrel"]].copy()
# barrel is already 0/1 (1 = barrel); cast to int so class 1 stays the positive class
dataset["barrel"] = dataset["barrel"].astype(int)

categorical_features = ["stand", "p_throws", "pitch_type"]
dataset_one_hot = pd.get_dummies(dataset, columns=categorical_features, drop_first=True)

y = dataset_one_hot["barrel"]
X = dataset_one_hot.drop("barrel", axis=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)

# %% modeling
pip = Pipeline([
    ('imputer', SimpleImputer()),
    ('scaler', RobustScaler()),
    ('classifier', HistGradientBoostingClassifier(random_state=42))
])
pip.fit(X_train, y_train)
y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='accuracy', n_jobs=-1)
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV accuracy: {cv_scores.mean()}")
(58962, 53) (58962,) (14741, 53) (14741,)
Validation score (Accuracy): 0.7027
Log Loss: 0.5449
Cross-validation scores: [0.70448571 0.70457051 0.7057327  0.69911805 0.7058175 ]
Mean CV accuracy: 0.7039448938904378
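Log loss, reported alongside accuracy above, scores the predicted probabilities rather than the hard labels: it is the average negative log of the probability assigned to the true class. The sketch below computes it by hand for the binary case. The function name `binary_log_loss` and the toy inputs are assumptions for illustration.

```python
import numpy as np


def binary_log_loss(y_true, p_pos, eps=1e-15) -> float:
    """Mean negative log-likelihood of binary labels under predicted P(class 1)."""
    y = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for overconfident predictions.
    p = np.clip(np.asarray(p_pos, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Hypothetical labels and predicted probabilities of class 1.
y_toy = [1, 0, 1, 0]
p_toy = [0.9, 0.2, 0.6, 0.4]

print(f"Toy model log loss: {binary_log_loss(y_toy, p_toy):.4f}")
print(f"Always predicting 0.5: {binary_log_loss(y_toy, [0.5] * 4):.4f}")
```

A predictor that always outputs 0.5 scores ln 2, about 0.693, so that value is one reference point for the log losses printed above. A constant prediction at the observed barrel rate would be a stricter one.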
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_pred = pip.predict(X_val)
cm = confusion_matrix(y_val, y_pred)

cm_display = ConfusionMatrixDisplay(cm).plot()
plt.grid(False)
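A confusion matrix is easier to discuss once it is reduced to precision and recall for the positive class. The sketch below assumes scikit-learn’s layout (rows are true classes, columns are predicted classes) and treats class 1 as positive. The helper `precision_recall` and the counts in `cm_toy` are hypothetical.

```python
import numpy as np


def precision_recall(cm):
    """Precision and recall for class 1 from a 2x2 confusion matrix.

    Expects rows = true class, columns = predicted class.
    """
    tn, fp, fn, tp = np.asarray(cm).ravel()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)


# Hypothetical counts: [[TN, FP], [FN, TP]]
cm_toy = [[50, 10], [5, 35]]
precision, recall = precision_recall(cm_toy)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

Passing the `cm` array computed above to `precision_recall` would show how many predicted positives are correct and how many true positives the model catches.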

from sklearn.metrics import RocCurveDisplay, roc_curve

# continuous scores for the positive class (pip.classes_[1])
y_score = pip.decision_function(X_val)
fpr, tpr, _ = roc_curve(y_val, y_score, pos_label=pip.classes_[1])
# draw on an explicit Axes so the curve lands in the 8x6 figure
fig, ax = plt.subplots(figsize=(8, 6))
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot(ax=ax)
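The ROC curve is often summarized by the area under it (AUC). AUC can be read as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative. The sketch below computes it directly from that pairwise definition. The function name `auc_by_pairs` and the toy scores are assumptions for illustration.

```python
import numpy as np


def auc_by_pairs(y_true, scores) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


# Hypothetical labels and model scores.
y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
print(auc_by_pairs(y_toy, s_toy))
```

The pairwise version takes quadratic memory, so for the fitted model `roc_auc_score(y_val, y_score)` from `sklearn.metrics` is the practical choice. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.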

Who Should Find Barrels?

Code
from IPython.display import Markdown
from tabulate import tabulate

tbl = df.groupby("team_name")["barrels_total"].mean().sort_values(ascending=False)
tbl_df = pd.DataFrame(tbl).reset_index()
tbl_df.index += 1
tbl_df.columns = ["Team", "Avg Barrels per Game"]
Markdown(tabulate(tbl_df.round(4), headers="keys"))
    Team                   Avg Barrels per Game
--  ---------------------  --------------------
 1  Atlanta Braves                       2.5783
 2  New York Yankees                     2.5678
 3  Los Angeles Dodgers                  2.3291
 4  New York Mets                        2.2746
 5  Seattle Mariners                     2.2129
 6  Minnesota Twins                      2.1535
 7  Chicago Cubs                          2.125
 8  Philadelphia Phillies                2.1232
 9  Baltimore Orioles                    2.1185
10  Boston Red Sox                          2.1
11  Texas Rangers                        2.0541
12  Los Angeles Angels                   2.0333
13  Houston Astros                       2.0273
14  Toronto Blue Jays                    2.0083
15  Detroit Tigers                       1.9896
16  Kansas City Royals                   1.9771
17  San Francisco Giants                 1.9353
18  San Diego Padres                      1.931
19  St. Louis Cardinals                   1.929
20  Arizona Diamondbacks                  1.925
21  Athletics                            1.9208
22  Tampa Bay Rays                       1.8812
23  Miami Marlins                        1.8625
24  Colorado Rockies                     1.8264
25  Pittsburgh Pirates                    1.815
26  Milwaukee Brewers                    1.7635
27  Chicago White Sox                    1.7484
28  Cincinnati Reds                      1.6674
29  Washington Nationals                 1.6286
30  Cleveland Guardians                  1.4635
Figure 4: Average Barrels per Game by Team (2023-2025)