People who run ball clubs, they think in terms of buying players. Your goal shouldn’t be to buy players, your goal should be to buy wins. And in order to buy wins, you need to buy runs. - Peter Brand (played by Jonah Hill; the character is based on Paul DePodesta), Moneyball (2011)
In order to buy runs, you need to buy barrels. Barrels are a highly sought-after batted ball event in baseball because they contribute so much to a team’s offense. Statcast classifies a batted ball as a barrel when its exit velocity is at least 98 mph and its launch angle falls in a window that spans 26 to 30 degrees at 98 mph and widens as exit velocity climbs. Barrels have a high likelihood of becoming extra-base hits, including home runs.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings

warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")
pd.set_option('display.max_columns', None)

df = pd.read_csv("team-games-2023-2025-with-runs.csv")
# convert "Oakland Athletics" to "Athletics"
df["team_name"] = df["team_name"].str.replace("Oakland Athletics", "Athletics")

sns.barplot(data=df, x="barrels_total", y="runs_scored")
plt.xlabel("Total Barrels in Game")
plt.ylabel("Runs Scored in Game")
plt.show()
Figure 1: Runs Scored vs Total Barrels in Game (2023-2025)
To visualize their impact, let’s look at the relationship between the number of barrels a team hits in a game and the runs it scores in that game, using the 2023 to 2025 MLB seasons. In Figure 1, each bar is the average runs scored across all team-games with that barrel count. Teams that hit more barrels tend to score more runs: at 0 barrels they average around 2 runs, while at 2 barrels the average roughly doubles to around 4 runs. The trend continues as the barrel count rises. This is a correlation rather than proof of cause, but it underscores how closely barrels track a team’s offensive output and, ultimately, its chances of winning.
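The bar heights in a plot like this are per-barrel-count averages. As a minimal sketch of that aggregation, the snippet below groups a handful of made-up team-games by barrel count and averages the runs scored. The toy numbers and the helper name `mean_runs_by_barrels` are illustrative assumptions, not values from the 2023-2025 dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical team-games as (barrels_total, runs_scored) pairs.
toy_games = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 3), (2, 5), (3, 7)]


def mean_runs_by_barrels(games):
    """Average runs scored for each distinct barrel count."""
    runs_by_count = defaultdict(list)
    for barrels, runs in games:
        runs_by_count[barrels].append(runs)
    return {count: mean(runs) for count, runs in sorted(runs_by_count.items())}


averages = mean_runs_by_barrels(toy_games)
for count, avg in averages.items():
    print(f"{count} barrels: {avg:.1f} runs on average")
```

With pandas, the equivalent one-liner is `df.groupby("barrels_total")["runs_scored"].mean()`, which is the quantity the bar plot summarizes.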
In addition to runs scored, barrels indicate overall offensive performance. In Figure 2, we visualize the distribution of expected weighted on-base average (xwOBA) by contact quality: barrels, solid contact, and poor contact. Barrels (in blue) have much higher xwOBA values than solid contact (in green) and poor contact (in orange). Some overlap between the three groups is expected. Season-level xwOBA also folds in walks and strikeouts, but the per-batted-ball estimate plotted here is driven by exit velocity and launch angle, so batted balls near the category boundaries likely receive similar values. The distinction is still clear: barrels lead to much better offensive outcomes than other types of contact.
This raises the question: how can teams increase their barrel counts? One approach is to analyze the factors that contribute to barrels. With machine learning, teams can identify the features that most influence barrel production and adjust their strategies accordingly.
How to Find Barrels
Statcast defines a barrel as a batted ball with an exit velocity of at least 98 mph and a launch angle between 26 and 30 degrees; for each additional mph above 98, the qualifying range of launch angles expands. If we want to find barrels, we first need to understand what differentiates them from other types of batted balls. Let’s plot batted balls by launch speed (Statcast’s column name for exit velocity) and launch angle, categorized by contact quality: barrels, solid contact, and poor contact. This will help us see how barrels stand out on these two key metrics.
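To make the definition concrete, here is a minimal sketch of a barrel check. It uses a linear approximation of the Statcast rule that is widely shared in the public analytics community. Treat both the thresholds and the function name `is_barrel` as assumptions for illustration, not Statcast's official implementation.

```python
def is_barrel(launch_speed: float, launch_angle: float) -> bool:
    """Approximate barrel check from exit velocity (mph) and launch angle (degrees)."""
    return (
        launch_speed >= 98                            # minimum exit velocity
        and launch_angle <= 50                        # cap on the launch angle
        and launch_speed * 1.5 - launch_angle >= 117  # upper edge: 30 degrees at 98 mph
        and launch_speed + launch_angle >= 124        # lower edge: 26 degrees at 98 mph
    )


# A few hand-picked (exit velocity, launch angle) pairs around the boundary.
for speed, angle in [(98, 26), (98, 30), (98, 31), (97, 28), (100, 24), (100, 34)]:
    print(speed, angle, is_barrel(speed, angle))
```

Under this approximation the window is 26 to 30 degrees at 98 mph and grows to 24 to 33 degrees at 100 mph, which matches the idea that harder-hit balls qualify across a wider range of launch angles.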
Figure 3: Batted Balls by Contact Quality (2023-2025)
Figure 3 shows a scatter plot of batted balls categorized by contact quality. Barrels (in blue) are clustered in a specific region characterized by high launch speeds and optimal launch angles, while solid contact and poor contact batted balls are more dispersed across the plot. This visualization highlights the distinct characteristics of barrels compared to other types of contact.
To predict barrels, we can use machine learning classification. By training a model on features such as bat speed, attack angle, and swing length, we can predict whether a batted ball will be a barrel. Outcome statistics like wOBA are useful for evaluation, but we leave them out of the model because they are results of the batted ball rather than predictors of it.
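One practical wrinkle in a barrel classifier is class imbalance. Assuming barrels are a small minority of batted balls, a model that never predicts a barrel can still post a high accuracy. The sketch below uses made-up labels (10 barrels in 100 batted balls, an assumed rate) to compare plain accuracy with balanced accuracy, the mean of per-class recall. The helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in preds_for_cls) / len(preds_for_cls))
    return sum(recalls) / len(recalls)


# Toy labels: 1 = barrel, 0 = not a barrel.
y_true = [1] * 10 + [0] * 90
always_no = [0] * 100  # a "model" that never predicts a barrel

print(f"accuracy:          {accuracy(y_true, always_no):.2f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, always_no):.2f}")
```

The do-nothing model looks strong on accuracy but no better than a coin flip on balanced accuracy, which is why a class-weighted model scored with balanced accuracy is a reasonable choice here. scikit-learn provides the same metric as `balanced_accuracy_score`.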
Now it takes two to tango. We should also consider the pitcher’s influence on barrel outcomes. By incorporating pitching features such as pitch type, velocity, spin rate, and movement, we can enhance our model’s ability to predict barrels. Combining both batting and pitching features provides a more comprehensive view of the factors that contribute to barrel production.
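When batting and pitching feature lists are merged, a column that belongs to both (pitch count, for example) can end up listed twice, which yields duplicate columns once the list is used to index a DataFrame. A minimal sketch of an order-preserving merge follows. The short lists are illustrative stand-ins rather than the full feature set, and `merge_features` is a hypothetical helper.

```python
batting_features = ["bat_speed", "attack_angle", "swing_length", "pitch_number"]
pitching_features = ["release_speed", "release_spin_rate", "pitch_type", "pitch_number"]


def merge_features(*feature_lists):
    """Concatenate feature lists, keeping the first occurrence of each name."""
    merged = [name for features in feature_lists for name in features]
    return list(dict.fromkeys(merged))  # dicts preserve insertion order


features = merge_features(batting_features, pitching_features)
print(features)
```

Keeping the first occurrence means the batting columns stay in front and each shared column appears exactly once.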
---
title: "In Order to Buy Runs, You Need to Buy Barrels"
author: "Oliver Chang"
email: oliverc1622@gmail.com
date: 2025-10-19 # Update this date when you make changes
categories: [MLB, graph theory, data visualization]
toc: true
format:
  html:
    html-math-method: katex
    code-tools: true
image: "main.png"
bibliography: references.bib
title-block-banner: default
---

## Barrels Are All You Need

> People who run ball clubs, they think in terms of buying players. Your goal shouldn't be to buy players, your goal should be to buy wins. And in order to buy wins, you need to buy runs. - Peter Brand (played by Jonah Hill; the character is based on Paul DePodesta), Moneyball (2011)

In order to buy runs, you need to buy barrels. Barrels are a highly sought-after batted ball event in baseball because they contribute so much to a team's offense. Statcast classifies a batted ball as a barrel when its exit velocity is at least 98 mph and its launch angle falls in a window that spans 26 to 30 degrees at 98 mph and widens as exit velocity climbs. Barrels have a high likelihood of becoming extra-base hits, including home runs.

```{python}
#| code-fold: true
#| warning: true
#| label: fig-runs-vs-barrels
#| fig-cap: "Runs Scored vs Total Barrels in Game (2023-2025)"
#| fig-alt: "Runs Scored vs Total Barrels in Game (2023-2025)"
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import warnings

warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")
pd.set_option('display.max_columns', None)

df = pd.read_csv("team-games-2023-2025-with-runs.csv")
# convert "Oakland Athletics" to "Athletics"
df["team_name"] = df["team_name"].str.replace("Oakland Athletics", "Athletics")

sns.barplot(data=df, x="barrels_total", y="runs_scored")
plt.xlabel("Total Barrels in Game")
plt.ylabel("Runs Scored in Game")
plt.show()
```

To visualize their impact, let's look at the relationship between the number of barrels a team hits in a game and the runs it scores in that game, using the 2023 to 2025 MLB seasons. In @fig-runs-vs-barrels, each bar is the average runs scored across all team-games with that barrel count. Teams that hit more barrels tend to score more runs: at 0 barrels they average around 2 runs, while at 2 barrels the average roughly doubles to around 4 runs. The trend continues as the barrel count rises. This is a correlation rather than proof of cause, but it underscores how closely barrels track a team's offensive output and, ultimately, its chances of winning.

In addition to runs scored, barrels indicate overall offensive performance. In @fig-xwoba-by-contact-quality, we visualize the distribution of expected weighted on-base average (xwOBA) by contact quality: barrels, solid contact, and poor contact. Barrels (in blue) have much higher xwOBA values than solid contact (in green) and poor contact (in orange). Some overlap between the three groups is expected. Season-level xwOBA also folds in walks and strikeouts, but the per-batted-ball estimate plotted here is driven by exit velocity and launch angle, so batted balls near the category boundaries likely receive similar values. The distinction is still clear: barrels lead to much better offensive outcomes than other types of contact.

```{python}
#| code-fold: true
#| warning: true
#| label: fig-xwoba-by-contact-quality
#| fig-cap: "xwOBA by Contact Quality (2023-2025)"
#| fig-alt: "xwOBA by Contact Quality (2023-2025)"
df_barrels = pd.read_csv("barrels-2023-2025.csv")
df_barrels["barrel"] = 1
df_barrels["contact_quality"] = "barrel"

df_weak = pd.read_csv("poor-contact-2023-2025.csv")
df_weak["barrel"] = 0
df_weak["contact_quality"] = "poor"

df_solid = pd.read_csv("solid-contact-2023-2025.csv")
df_solid["barrel"] = 0
df_solid["contact_quality"] = "solid"

# one row per batted ball event, labeled by contact quality
df_bbe = pd.concat([df_barrels, df_weak, df_solid], ignore_index=True)

sns.kdeplot(df_bbe, x="estimated_woba_using_speedangle", hue="contact_quality", fill=True, common_norm=False, alpha=0.5)
# rename legend
plt.legend(title="Contact Quality", labels=["Solid Contact", "Poor Contact", "Barrel"])
```

This raises the question: how can teams increase their barrel counts? One approach is to analyze the factors that contribute to barrels. With machine learning, teams can identify the features that most influence barrel production and adjust their strategies accordingly.

## How to Find Barrels

[Statcast](https://www.mlb.com/glossary/statcast/barrel) defines a barrel as a batted ball with an exit velocity of at least 98 mph and a launch angle between 26 and 30 degrees; for each additional mph above 98, the qualifying range of launch angles expands. If we want to find barrels, we first need to understand what differentiates them from other types of batted balls. Let's plot batted balls by launch speed (Statcast's column name for exit velocity) and launch angle, categorized by contact quality: barrels, solid contact, and poor contact. This will help us see how barrels stand out on these two key metrics.

```{python}
#| code-fold: true
#| warning: true
#| label: fig-batted-balls
#| fig-cap: "Batted Balls by Contact Quality (2023-2025)"
#| fig-alt: "Batted Balls by Contact Quality (2023-2025)"
sns.scatterplot(data=df_bbe, x="launch_speed", y="launch_angle", hue="contact_quality", alpha=0.5)
```

@fig-batted-balls shows a scatter plot of batted balls categorized by contact quality. Barrels (in blue) are clustered in a specific region characterized by high launch speeds and optimal launch angles, while solid contact and poor contact batted balls are more dispersed across the plot. This visualization highlights the distinct characteristics of barrels compared to other types of contact.

To predict barrels, we can use machine learning classification. By training a model on features such as bat speed, attack angle, and swing length, we can predict whether a batted ball will be a barrel. Outcome statistics like wOBA are useful for evaluation, but we leave them out of the model because they are results of the batted ball rather than predictors of it.

```{python}
#| code-fold: true
#| warning: true
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, log_loss
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import plot_tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import PolynomialFeatures, RobustScaler, StandardScaler
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
from sklearn.metrics import roc_curve, roc_auc_score

# batting features only
features = ["bat_speed", "attack_angle", "swing_length", "attack_direction", "swing_path_tilt",
            "intercept_ball_minus_batter_pos_x_inches", "intercept_ball_minus_batter_pos_y_inches",
            "stand", "age_bat", "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number"]

dataset = df_bbe[features + ["barrel"]].copy()
# keep 1 = barrel, 0 = not a barrel (pd.factorize would number classes by order
# of appearance, which maps barrels to 0 because they come first in df_bbe)
dataset["barrel"] = dataset["barrel"].astype(int)
dataset["stand"] = pd.factorize(dataset["stand"])[0]

X = dataset[features]
y = dataset["barrel"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)

# %% modeling
pip = Pipeline([
    ("imputer", IterativeImputer(random_state=42)),
    ("scaler", RobustScaler()),
    ("classifier", HistGradientBoostingClassifier(random_state=42, class_weight="balanced"))
])
pip.fit(X_train, y_train)

y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='balanced_accuracy', n_jobs=-1)
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV balanced accuracy: {cv_scores.mean()}")
```

Now it takes two to tango. We should also consider the pitcher's influence on barrel outcomes. By incorporating pitching features such as pitch type, velocity, spin rate, and movement, we can enhance our model's ability to predict barrels. Combining both batting and pitching features provides a more comprehensive view of the factors that contribute to barrel production.

```{python}
# batting and pitching features (each column listed once)
features = ["batter", "pitcher", "bat_speed", "attack_angle", "swing_length", "attack_direction", "swing_path_tilt",
            "intercept_ball_minus_batter_pos_x_inches", "intercept_ball_minus_batter_pos_y_inches",
            "stand", "age_bat", "n_thruorder_pitcher", "inning", "balls", "strikes", "pitch_number",
            "release_speed", "release_pos_x", "release_pos_z", "p_throws", "zone",
            "vx0", "vy0", "vz0", "ax", "ay", "az", "release_pos_y", "pitch_type", "age_pit",
            "api_break_z_with_gravity", "api_break_x_arm", "api_break_x_batter_in",
            "arm_angle", "effective_speed", "release_spin_rate", "release_extension"]

dataset = df_bbe[features + ["barrel"]].copy()
dataset["barrel"] = dataset["barrel"].astype(int)  # 1 = barrel, 0 = not a barrel

categorical_features = ["stand", "p_throws", "pitch_type"]
dataset_one_hot = pd.get_dummies(dataset, columns=categorical_features, drop_first=True)

y = dataset_one_hot["barrel"]
X = dataset_one_hot.drop("barrel", axis=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)

# %% modeling
pip = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", RobustScaler()),
    ("classifier", HistGradientBoostingClassifier(random_state=42, class_weight="balanced"))
])
pip.fit(X_train, y_train)

y_pred = pip.predict(X_val)
score = pip.score(X_val, y_val)
y_proba = pip.predict_proba(X_val)
print(f"Validation score (Accuracy): {score:.4f}")
print(f"Log Loss: {log_loss(y_val, y_proba):.4f}")

kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pip, X_train, y_train, cv=kf, scoring='balanced_accuracy', n_jobs=-1)
print(f"Cross-validation scores: {cv_scores}")
print(f"Mean CV balanced accuracy: {cv_scores.mean()}")
```

### Model Evaluation

```{python}
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_pred = pip.predict(X_val)
cm = confusion_matrix(y_val, y_pred)
cm_display = ConfusionMatrixDisplay(cm).plot()
plt.grid(False)
```

```{python}
from sklearn.metrics import RocCurveDisplay, roc_curve

y_score = pip.decision_function(X_val)
# draw on an explicit axis so the figure size applies to the ROC plot
fig, ax = plt.subplots(figsize=(8, 6))
fpr, tpr, _ = roc_curve(y_val, y_score, pos_label=pip.classes_[1])
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot(ax=ax)
```

```{python}
from sklearn.base import clone
from sklearn.inspection import DecisionBoundaryDisplay

# The fitted pipeline expects every feature, so refit a copy on two swing
# features purely for a 2-D visualization.
plot_features = ["bat_speed", "attack_angle"]
pip_2d = clone(pip).fit(X_train[plot_features], y_train)
DecisionBoundaryDisplay.from_estimator(
    pip_2d,
    X_val[plot_features],
    response_method="predict",
    alpha=0.5,
)
```

## Who Should Find Barrels?

```{python}
#| code-fold: true
#| warning: true
#| label: fig-avg-barrels-by-team
#| fig-cap: "Average Barrels per Game by Team (2023-2025)"
#| fig-alt: "Average Barrels per Game by Team (2023-2025)"
from IPython.display import Markdown
from tabulate import tabulate

# average barrels per game for each team, highest first
tbl = df.groupby("team_name")["barrels_total"].mean().sort_values(ascending=False)
tbl_df = pd.DataFrame(tbl).reset_index()
tbl_df.index += 1
tbl_df.columns = ["Team", "Avg Barrels per Game"]
Markdown(tabulate(tbl_df.round(4), headers="keys"))
```

:::{.callout-note}
Past articles:

- [Principal Component Analysis](https://runningonnumbers.com/posts/principal-component-analysis-python-baseball/)
- [Support Vector Machine](https://runningonnumbers.com/posts/support-vector-machine/)
- [K-Means Clustering](https://runningonnumbers.com/posts/k-means/)

Github:

- [Running on Numbers](https://github.com/oliverc1623/Running-On-Numbers-Public)
:::

<script async data-uid="5d16db9e50" src="https://runningonnumbers.kit.com/5d16db9e50/index.js"></script>