Visualizing MLB Realignment with Graph Theory

Introduction

In this article, we chime in on the current discourse surrounding MLB expansion and realignment. Recently, Rob Manfred mentioned the possibility of adding two more teams to the league. Across social media, fans have expressed their opinions on the matter. I want to explore how graph theory can help visualize the current MLB schedule and how it might change with expansion and realignment.

In college, I majored in math and computer science. While my emphasis was on statistics and machine learning, I always had a keen interest in number theory, graph theory, and combinatorics; in fact, my senior thesis was on fast matrix multiplication. One course I missed out on during my undergrad was a graph theory course that was offered at Harvey Mudd College. Given that this blog serves as a platform for me to explore my interests, I thought it would be fun to apply graph theory concepts to MLB scheduling.

Graph Theory 101

Let’s introduce some formal graph theory, following Wikipedia’s definition. A graph is a pair $G=(V,E)$, where $V$ is a set of vertices and $E$ is a set of edges. Vertices are also called nodes, and edges can be thought of as connections or links between them. In a simple, undirected graph, $E \subseteq \{ (u,v) \mid u,v \in V \text{ and } u \neq v \}$, with $(u,v)$ and $(v,u)$ treated as the same edge. Such a graph has no loops and no multiple edges.

A directed graph is a graph in which edges have orientations, indicating a one-way relationship between vertices. Edges can also have weights, representing the strength or capacity of the connection. A directed graph uses the same notation as an undirected graph, but its edges are ordered pairs. Hence, $E \subseteq \{ (u,v) \mid (u,v) \in V^2 \text{ and } u \neq v \}$, where $V^2$ denotes the Cartesian product $V \times V$, the set of all ordered pairs of vertices. A directed graph can therefore contain the edge $(u,v)$ without containing $(v,u)$. To illustrate, consider four vertices, $V = \{A, B, C, D\}$. The set of all possible edges in the undirected graph is $\{(A,B), (A,C), (A,D), (B,C), (B,D), (C,D)\}$, while the set of all possible edges in the directed graph is $$\{(A,B), (B,A), (A,C), \\ (C,A), (A,D), (D,A), \\ (B,C), (C,B), (B,D), \\ (D,B), (C,D), (D,C)\}.$$
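To make the counting concrete, here is a small sketch that enumerates every possible edge on four vertices labeled A through D. It uses only the standard library, and the variable names are my own.

```python
from itertools import combinations, permutations

vertices = ["A", "B", "C", "D"]

# Undirected: each edge is an unordered pair, so (A, B) and (B, A) are one edge.
undirected_edges = list(combinations(vertices, 2))

# Directed: each edge is an ordered pair, so (A, B) and (B, A) are distinct.
directed_edges = list(permutations(vertices, 2))

print(len(undirected_edges))  # 6, i.e. n(n-1)/2 for n = 4
print(len(directed_edges))    # 12, i.e. n(n-1) for n = 4
```

The directed count is exactly double the undirected count because every unordered pair can be oriented in two ways.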

Breadth First Search (BFS)

BFS is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a ‘search key’) and explores the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level. There are ample resources online to learn about BFS. I want to highlight that BFS uses a queue to keep track of the nodes to be explored next. Here’s a quick Python implementation:

from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex not in visited:
            print("Visiting:", vertex)
            visited.add(vertex)
            for neighbor in graph[vertex]:
                if neighbor not in visited:
                    queue.append(neighbor)

The runtime complexity of BFS is O(V + E), where V is the number of vertices and E is the number of edges in the graph.
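Because BFS explores one depth level at a time, the first time it reaches a vertex is also along a path with the fewest edges. The sketch below records those hop counts. The function name `bfs_distances` and the toy adjacency list are hypothetical, chosen only for illustration.

```python
from collections import deque

def bfs_distances(graph, start):
    """Return the minimum number of edges from start to each reachable vertex."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph[vertex]:
            if neighbor not in distances:  # first visit uses the fewest edges
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances

# Toy undirected graph stored as an adjacency list
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(bfs_distances(toy_graph, "A"))
# {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```

Each vertex enters the queue at most once and each adjacency list is scanned once, which is where the O(V + E) bound comes from.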

Depth First Search (DFS)

The thing I really like about DFS is that we get a different search pattern by using a stack (or recursion) instead of a queue. Starting at the root node, we explore as far as possible along each branch before backtracking. The running time of DFS is also O(V + E). DFS can be more memory efficient than BFS when the graph is wide but shallow, because it mostly needs to remember the current path rather than an entire level of vertices. Recursion should be used with caution, though: in Python, a very deep graph can exceed the interpreter’s recursion limit and raise a RecursionError.

def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    if start not in visited:
        print("Visiting:", start)
        visited.add(start)
        for neighbor in graph[start]:
            if neighbor not in visited:
                dfs(graph, neighbor, visited)
    return visited
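For graphs that are too deep for recursion, the same traversal can be written with an explicit stack. This is a minimal sketch and not part of the original post. The name `dfs_iterative` and the toy graph are my own.

```python
def dfs_iterative(graph, start):
    """Depth-first traversal with an explicit stack; returns the visit order."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
        # Push neighbors in reverse so the first-listed neighbor is explored first,
        # matching the order of the recursive version.
        for neighbor in reversed(graph[vertex]):
            if neighbor not in visited:
                stack.append(neighbor)
    return order

toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(dfs_iterative(toy_graph, "A"))  # ['A', 'B', 'D', 'C', 'E']

# A 5,000-vertex chain is deeper than Python's default recursion limit,
# but the explicit stack handles it without trouble.
chain = {i: [i + 1] for i in range(4999)}
chain[4999] = []
print(len(dfs_iterative(chain, 0)))  # 5000
```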

MLB Schedule Represented as a Graph

The MLB schedule can be represented as a graph where each team is a node and each game played between two teams is an edge connecting those nodes. This representation allows us to analyze the schedule using graph algorithms.

Based on Wikipedia, the 2025 MLB schedule consists of 162 games per team, with the following breakdown:

- 13 games against each of the 4 divisional opponents (52 games)
- 6 or 7 games against each of the 10 other league opponents (62 games)
- 6 games against one “geographic” interleague opponent (6 games)
- 3 games against the remaining 14 interleague opponents (42 games)

Series range from 2 to 4 games, with 3-game series being the most common.
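As a quick sanity check, the breakdown above can be added up in a few lines. The 8-and-2 split of 6-game and 7-game opponents is not stated in the source. It is simply the only split of 10 opponents that totals 62 games. All variable names are my own.

```python
# Games per tier in the 2025 format, taken from the breakdown above
divisional = 13 * 4    # 52
intraleague = 62       # mix of 6- and 7-game season series against 10 opponents
rival = 6 * 1          # 6
interleague = 3 * 14   # 42

# Implied split: 6a + 7b = 62 with a + b = 10 gives a = 8, b = 2
assert 6 * 8 + 7 * 2 == intraleague

games_per_team = divisional + intraleague + rival + interleague
opponents_per_team = 4 + 10 + 1 + 14

num_teams = 30
total_games = num_teams * games_per_team // 2         # each game involves two teams
possible_pairs = num_teams * (num_teams - 1) // 2     # edges in a complete graph

print(games_per_team)      # 162
print(opponents_per_team)  # 29
print(total_games)         # 2430
print(possible_pairs)      # 435
```

Since every team faces all 29 other teams, the schedule graph is complete: all 435 possible edges are present, and the interesting structure lives entirely in the edge weights.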

The visualization below shows the graph representation of the MLB schedule. I used NetworkX to create the graph and Matplotlib to visualize it. The nodes are placed on a circle, ordered by the longitude of each team’s home city. There is not much to take away from this visualization other than the complex web of matchups between teams. Consider this an artistic expression.

Code

import pandas as pd
from collections import Counter, defaultdict
import networkx as nx
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import itertools

team_name_map = {
    'ANA': 'LAA', 'ARI': 'ARI', 'ATL': 'ATL', 'BAL': 'BAL', 'BOS': 'BOS',
    'CHA': 'CWS', 'CHN': 'CHC', 'CIN': 'CIN', 'CLE': 'CLE', 'COL': 'COL',
    'DET': 'DET', 'HOU': 'HOU', 'KCA': 'KC', 'LAN': 'LAD', 'MIA': 'MIA',
    'MIL': 'MIL', 'MIN': 'MIN', 'NYA': 'NYY', 'NYN': 'NYM', 'ATH': 'ATH',
    'PHI': 'PHI', 'PIT': 'PIT', 'SDN': 'SD', 'SEA': 'SEA', 'SFN': 'SF',
    'SLN': 'STL', 'TBA': 'TB', 'TEX': 'TEX', 'TOR': 'TOR', 'WAS': 'WSH'
}

team_logos = {
    'ARI': 'https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Arizona_Diamondbacks_logo_teal.svg/330px-Arizona_Diamondbacks_logo_teal.svg.png',
    'ATL': 'https://upload.wikimedia.org/wikipedia/en/f/f2/Atlanta_Braves.svg',
    'BAL': 'https://upload.wikimedia.org/wikipedia/commons/e/e9/Baltimore_Orioles_Script.svg',
    'BOS': 'https://upload.wikimedia.org/wikipedia/en/6/6d/RedSoxPrimary_HangingSocks.svg',
    'CHC': 'https://upload.wikimedia.org/wikipedia/commons/8/80/Chicago_Cubs_logo.svg',
    'CWS': 'https://upload.wikimedia.org/wikipedia/commons/c/c1/Chicago_White_Sox.svg',
    'CIN': 'https://upload.wikimedia.org/wikipedia/commons/0/01/Cincinnati_Reds_Logo.svg',
    'CLE': 'https://upload.wikimedia.org/wikipedia/en/a/a9/Guardians_winged_%22G%22.svg',
    'COL': 'https://upload.wikimedia.org/wikipedia/en/c/c0/Colorado_Rockies_full_logo.svg',
    'DET': 'https://upload.wikimedia.org/wikipedia/commons/e/e3/Detroit_Tigers_logo.svg',
    'HOU': 'https://upload.wikimedia.org/wikipedia/commons/6/6b/Houston-Astros-Logo.svg',
    'KC': 'https://upload.wikimedia.org/wikipedia/commons/7/78/Kansas_City_Royals_Primary_Logo.svg',
    'LAA': 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Los_Angeles_Angels_of_Anaheim.svg',
    'LAD': 'https://upload.wikimedia.org/wikipedia/commons/0/0e/Los_Angeles_Dodgers_Logo.svg',
    'MIA': 'https://upload.wikimedia.org/wikipedia/en/f/fd/Marlins_team_logo.svg',
    'MIL': 'https://upload.wikimedia.org/wikipedia/en/b/b8/Milwaukee_Brewers_logo.svg',
    'MIN': 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Minnesota_Twins_New_Logo.svg/250px-Minnesota_Twins_New_Logo.svg.png',
    'NYM': 'https://upload.wikimedia.org/wikipedia/en/thumb/7/7b/New_York_Mets.svg/250px-New_York_Mets.svg.png',
    'NYY': 'https://upload.wikimedia.org/wikipedia/commons/f/fe/New_York_Yankees_Primary_Logo.svg',
    'ATH': 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Athletics_logo.svg/330px-Athletics_logo.svg.png',
    'PHI': 'https://upload.wikimedia.org/wikipedia/en/f/f0/Philadelphia_Phillies_%282019%29_logo.svg',
    'PIT': 'https://upload.wikimedia.org/wikipedia/commons/8/81/Pittsburgh_Pirates_logo_2014.svg',
    'SD': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/SD_Logo_Brown.svg/250px-SD_Logo_Brown.svg.png',
    'SF': 'https://upload.wikimedia.org/wikipedia/en/5/58/San_Francisco_Giants_Logo.svg',
    'SEA': 'https://upload.wikimedia.org/wikipedia/en/6/6d/Seattle_Mariners_logo_%28low_res%29.svg',
    'STL': 'https://upload.wikimedia.org/wikipedia/en/9/9d/St._Louis_Cardinals_logo.svg',
    'TB': 'https://upload.wikimedia.org/wikipedia/commons/5/55/Tampa_Bay_Rays_Logo.svg',
    'TEX': 'https://upload.wikimedia.org/wikipedia/commons/c/c7/Texas_Rangers_logo.svg',
    'TOR': 'https://upload.wikimedia.org/wikipedia/en/c/cc/Toronto_Blue_Jay_Primary_Logo.svg',
    'WSH': 'https://upload.wikimedia.org/wikipedia/commons/a/a3/Washington_Nationals_logo.svg',
    'POR': 'https://upload.wikimedia.org/wikipedia/commons/a/a6/Major_League_Baseball_logo.svg',
    'NSH': 'https://upload.wikimedia.org/wikipedia/commons/a/a6/Major_League_Baseball_logo.svg',
}


# Load the schedule data
df = pd.read_csv("2025schedule.csv")
df['Visitor'] = df['Visitor'].map(team_name_map)
df['Home'] = df['Home'].map(team_name_map)
df.dropna(subset=['Visitor', 'Home'], inplace=True)
games_from_df = df[['Visitor', 'Home']].values.tolist()
teams = list(pd.concat([df['Visitor'], df['Home']]).unique())

# Longitude of each team's home city, keyed by the canonical abbreviations
# (the values of team_name_map)
team_longitudes = {
    "BOS": -71.0589, "NYY": -73.92, "NYM": -73.84, "PHI": -75.16, "BAL": -76.61,
    "WSH": -77.03, "PIT": -79.99, "TOR": -79.38, "TB": -82.65, "MIA": -80.19,
    "ATL": -84.38, "CLE": -81.69, "CIN": -84.51, "DET": -83.04, "CWS": -87.62,
    "CHC": -87.62, "STL": -90.19, "MIL": -87.90, "MIN": -93.26, "KC": -94.48,
    "HOU": -95.36, "TEX": -96.79, "COL": -104.99, "ARI": -112.07, "LAD": -118.24,
    "LAA": -117.88, "SD": -117.16, "SF": -122.38, "ATH": -122.27, "SEA": -122.33
}

# Sort teams by longitude (ascending, so the westernmost team comes first)
sorted_teams = sorted(teams, key=lambda team: team_longitudes[team])

# Create a graph
G = nx.Graph()

# Assign circular positions based on the sorted geographical order
team_positions = {}
num_teams = len(sorted_teams)
angle_step = 2 * np.pi / num_teams

for i, team in enumerate(sorted_teams):
    # Calculate angle and position
    angle = i * angle_step
    x = np.cos(angle)
    y = np.sin(angle)
    team_positions[team] = (x, y)
    # Add node with its calculated position
    G.add_node(team, pos=(x, y))

# Add edges
game_counts = Counter(tuple(sorted(game)) for game in games_from_df)
for (team1, team2), count in game_counts.items():
    if team1 in G.nodes and team2 in G.nodes:
        G.add_edge(team1, team2, weight=count)

# --- Visualization ---
plt.figure(figsize=(8, 8))
ax = plt.gca()
ax.set_title("MLB Team Matchups (Geographically Ordered Circle)", fontsize=16)

# Extract positions for drawing
pos = nx.get_node_attributes(G, 'pos')
# pos = nx.spring_layout(G, weight='weight', iterations=50, seed=47)

# Draw the graph
nx.draw_networkx_nodes(G, pos, node_color='skyblue', node_size=750, alpha=0.8)
nx.draw_networkx_edges(G, pos, edge_color='gray', width=1.0, alpha=0.6)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')

# Remove axes for a cleaner look
ax.margins(0.1)
plt.axis("off")
plt.tight_layout()
plt.show()

Graph Network of MLB Teams and Their Matchups in the 2025 Season, Arranged in a Circle.

Interactive Graph

Let’s create an interactive version of the graph using Plotly. Instead of manually arranging each node, we can use a force-directed layout to position the nodes based on their connections. Force-directed graph drawing is a class of algorithms that place nodes in a visually appealing way by simulating physical forces, and NetworkX’s spring_layout is one such method. It treats nodes as mutually repelling charged particles and edges as springs pulling their endpoints together. The algorithm iteratively lowers the energy of the system until it approaches equilibrium, so we specify the number of iterations (the iterations argument) and, optionally, k, which sets the optimal distance between nodes.
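To show the mechanics, here is a toy force-directed layout in pure Python. It is not NetworkX’s implementation, only a Fruchterman-Reingold-style sketch under my own assumptions: edge weight scales the spring pull, a shrinking "temperature" caps each move, and the function name, toy teams, and weights are all hypothetical.

```python
import math
import random

def spring_layout_sketch(nodes, weighted_edges, iterations=200, k=None, seed=47):
    """Toy layout: every pair of nodes repels, every edge pulls its endpoints together."""
    rng = random.Random(seed)
    if k is None:
        k = 1 / math.sqrt(len(nodes))  # target spacing between nodes
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    temperature = 0.1                  # maximum step length per iteration
    cooling = temperature / (iterations + 1)

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsion between every pair of nodes: magnitude k^2 / distance
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction along each edge: magnitude weight * distance^2 / k
        for u, v, weight in weighted_edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = weight * dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node along its net force, capped by the current temperature
        for v in nodes:
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature -= cooling

    return {v: (p[0], p[1]) for v, p in pos.items()}

toy_nodes = ["LAD", "LAA", "SF", "SEA"]
toy_edges = [("LAD", "SF", 13), ("LAA", "SEA", 13), ("LAD", "LAA", 6), ("SF", "SEA", 3)]
layout = spring_layout_sketch(toy_nodes, toy_edges)
for team, (x, y) in layout.items():
    print(f"{team}: ({x:.3f}, {y:.3f})")
```

Fixing the seed makes the layout reproducible, which is presumably why the code in this post passes a seed to spring_layout as well.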

Observe the network structure and how teams are interconnected. The edges show the number of games played between each pair of teams. As expected, intra-division matchups are more frequent than interleague ones. The main takeaway is how MLB places an emphasis on regional interleague matchups. Teams like the LA Dodgers and LA Angels squared off six times this season. Another example is the Chicago Cubs and Chicago White Sox.

Code

pos = nx.spring_layout(G, weight='weight', iterations=10000, seed=47)

# Set the calculated positions as a node attribute
nx.set_node_attributes(G, pos, 'pos')

# --- Visualization with Plotly ---
fig_data = []

# --- Create Edge Traces by Color with Hover Text ---
# Group edges by weight (number of games) to color them
edges_by_weight = defaultdict(lambda: {'x': [], 'y': [], 'text': []})
for edge in G.edges(data=True):
    weight = edge[2].get('weight', 1)
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edges_by_weight[weight]['x'].extend([x0, x1, None])
    edges_by_weight[weight]['y'].extend([y0, y1, None])

# Define a colorscale for the edges
min_weight = min(edges_by_weight.keys()) if edges_by_weight else 1
max_weight = max(edges_by_weight.keys()) if edges_by_weight else 1
base_colorscale = px.colors.sequential.Blues

# Create a discrete color map
unique_weights = sorted(list(edges_by_weight.keys()))
num_unique_weights = len(unique_weights)
if num_unique_weights > 1:
    color_map = {weight: base_colorscale[int(i * (len(base_colorscale) - 1) / (num_unique_weights - 1))] for i, weight in enumerate(unique_weights)}
else:
    color_map = {unique_weights[0]: base_colorscale[5]} if unique_weights else {}

for weight, a in sorted(edges_by_weight.items()):
    color = color_map.get(weight, '#888888') # Default color if weight not in map
    edge_trace = go.Scatter(x=a['x'], y=a['y'], line=dict(width=2, color=color), hoverinfo='none', mode='lines')
    fig_data.append(edge_trace)

edge_hover_x = []
edge_hover_y = []
edge_hover_text = []

for edge in G.edges(data=True):
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    weight = edge[2].get('weight', 1)
    # Position the hover point at the midpoint of the edge
    edge_hover_x.append((x0 + x1) / 2)
    edge_hover_y.append((y0 + y1) / 2)
    edge_hover_text.append(f'{edge[0]} vs {edge[1]}: {weight} games')

# This trace holds the hover text and has invisible markers
edge_hover_trace = go.Scatter(
    x=edge_hover_x,
    y=edge_hover_y,
    mode='markers',
    hoverinfo='text',
    text=edge_hover_text,
    marker=dict(size=20, color='rgba(0,0,0,0)') # Invisible markers with a larger hover area
)
fig_data.append(edge_hover_trace)

discrete_colorscale_for_bar = []
if num_unique_weights > 0:
    for i, weight in enumerate(unique_weights):
        color = color_map.get(weight, '#888888')
        # Define start and end points for this color's block in the bar (on a 0-1 scale)
        start_norm = i / num_unique_weights
        end_norm = (i + 1) / num_unique_weights
        discrete_colorscale_for_bar.append([start_norm, color])
        discrete_colorscale_for_bar.append([end_norm, color])

# The colorbar needs to map values to this new 0-1 scale.
# We'll place the tick labels for our unique_weights in the center of each color block.
tickvals_for_bar = [ (i + 0.5) / num_unique_weights for i in range(num_unique_weights) ]
ticktext_for_bar = [str(w) for w in unique_weights]

colorbar_trace = go.Scatter(x=[None], y=[None], mode='markers', marker=dict(
    # The dummy `color` value is irrelevant because we control the scale with cmin/cmax
    color=[0.5],
    # Here is our custom discrete colorscale
    colorscale=discrete_colorscale_for_bar,
    # We are now working in a normalized 0-1 space for the colorbar
    cmin=0,
    cmax=1,
    showscale=True,
    colorbar=dict(
        thickness=15,
        title='Games Played',
        xanchor='left',
        # Place tick labels at the center of each color block
        tickvals=tickvals_for_bar,
        ticktext=ticktext_for_bar,
        # This makes the color bar look more like distinct bins
        outlinewidth=0
    )),
    hoverinfo='none'
)
fig_data.append(colorbar_trace)
# --- Create Node Trace ---
node_x = []
node_y = []
for node in G.nodes():
    x, y = G.nodes[node]['pos']
    node_x.append(x)
    node_y.append(y)
    

node_adjacencies = []
node_text = []
for node, adjacencies in G.adjacency():
    num_connections = len(adjacencies)
    node_adjacencies.append(num_connections)
    node_text.append(f'Team: {node}')

node_trace = go.Scatter(
    x=node_x, y=node_y,
    mode='markers',
    hoverinfo='text',
    hovertext=node_text,
    marker=dict(
        size=35, # Make the hover area large
        color='rgba(255, 255, 255, 0)' # Alpha 0 makes the markers invisible
    )
)
fig_data.append(node_trace)

layout_images = []
# Calculate the range of coordinates to dynamically size logos
x_range = max(node_x) - min(node_x) if node_x else 1
y_range = max(node_y) - min(node_y) if node_y else 1
logo_size_x = x_range * 0.08  # Adjust the multiplier as needed
logo_size_y = y_range * 0.08

for node in G.nodes():
    x, y = G.nodes[node]['pos']
    logo_url = team_logos.get(node)
    if logo_url:
        layout_images.append(dict(source=logo_url, xref="x", yref="y", x=x, y=y, sizex=logo_size_x, sizey=logo_size_y, xanchor="center", yanchor="middle", layer="above"))

# --- Create the Figure ---
fig = go.Figure(
    data=fig_data,
    layout=go.Layout(
        title='<br>MLB 2025 Schedule Matchups',
        showlegend=False,
        hovermode='closest',
        margin=dict(b=20,l=5,r=5,t=60),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        height=600, width=800,
        images=layout_images,
    ),
)
fig.show()

Interactive Graph Network of MLB Teams and Their Matchups in the 2025 Season

MLB Expansion to 32 Teams as a Graph

Now let’s visualize the expanded league structure as a graph. In this hypothetical scenario, we have 32 teams divided into two leagues, each with four divisions. We introduce a Portland team to the American League. The National League receives a new team in Nashville. In summary, Table 1 shows the new team alignments.

Table 1: MLB Expansion to 32 Teams

League	Division	Teams
AL	East	BAL, BOS, NYY, WSH
AL	Central	CWS, CLE, DET, TOR
AL	South	HOU, KC, COL, TEX
AL	West	LAA, ATH, SEA, POR
NL	East	NYM, PHI, CIN, PIT
NL	Central	CHC, MIL, STL, MIN
NL	South	MIA, TB, NSH, ATL
NL	West	LAD, SD, SF, ARI

Of course, we will redo our artistic rendition under this new structure.

Code

# --- Team Data ---
# A dictionary to map old abbreviations to modern ones
team_name_map = {
    'ANA': 'LAA', 'ARI': 'ARI', 'ATL': 'ATL', 'BAL': 'BAL', 'BOS': 'BOS',
    'CHA': 'CWS', 'CHN': 'CHC', 'CIN': 'CIN', 'CLE': 'CLE', 'COL': 'COL',
    'DET': 'DET', 'HOU': 'HOU', 'KCA': 'KC', 'LAN': 'LAD', 'MIA': 'MIA',
    'MIL': 'MIL', 'MIN': 'MIN', 'NYA': 'NYY', 'NYN': 'NYM', 'ATH': 'ATH',
    'PHI': 'PHI', 'PIT': 'PIT', 'SDN': 'SD', 'SEA': 'SEA', 'SFN': 'SF',
    'SLN': 'STL', 'TBA': 'TB', 'TEX': 'TEX', 'TOR': 'TOR', 'WAS': 'WSH'
}

# Longitudes for geographical sorting (ascending, so west comes first)
team_longitudes = {
    "BOS": -71.0589, "NYY": -73.92, "NYM": -73.84, "PHI": -75.16, "BAL": -76.61,
    "WSH": -77.03, "PIT": -79.99, "TOR": -79.38, "TB": -82.65, "MIA": -80.19,
    "ATL": -84.38, "CLE": -81.69, "CIN": -84.51, "DET": -83.04, "CWS": -87.62,
    "CHC": -87.62, "STL": -90.19, "MIL": -87.90, "MIN": -93.26, "KC": -94.48,
    "HOU": -95.36, "TEX": -96.79, "COL": -104.99, "ARI": -112.07, "LAD": -118.24,
    "LAA": -117.88, "SD": -117.16, "SF": -122.38, "ATH": -122.27, "SEA": -122.33,
    "NSH": -86.78,  # Nashville
    "POR": -122.67 # Portland
}

# --- Simulate 32-Team League with 8 Divisions ---
# 1. Define the new 8-division structure (4 teams per division)
divisions = {
    "AL East": ["BAL", "BOS", "NYY", "WSH"],
    "AL Central": ["CWS", "CLE", "DET", "TOR"],
    "AL South": ["HOU", "KC", "COL", "TEX"],
    "AL West": ["LAA", "ATH", "SEA", "POR"],
    "NL East": ["NYM", "PHI", "CIN", "PIT"],
    "NL Central": ["CHC", "MIL", "STL", "MIN"],
    "NL South": ["MIA", "TB", "NSH", "ATL"],
    "NL West": ["LAD", "SD", "SF", "ARI"]
}

leagues = {
    "AL": divisions["AL East"] + divisions["AL Central"] + divisions["AL South"] + divisions["AL West"],
    "NL": divisions["NL East"] + divisions["NL Central"] + divisions["NL South"] + divisions["NL West"]
}

# 2. Simulate a full schedule based on the new structure
all_games = []
all_teams = leagues["AL"] + leagues["NL"]

# Scheduling formula for 162 games:
# - 14 games vs. 3 divisional opponents (42 games)
# - 6 games vs. 12 other league opponents (72 games)
# - 3 games vs. 16 interleague opponents (48 games)
# Use itertools.combinations to handle each pair only once, which is cleaner
# and avoids the double-counting issue with odd numbers of games.
for team1, team2 in itertools.combinations(all_teams, 2):
    # Determine the relationship between team1 and team2
    team1_league = "AL" if team1 in leagues["AL"] else "NL"
    team2_league = "AL" if team2 in leagues["AL"] else "NL"

    team1_division = next(name for name, teams in divisions.items() if team1 in teams)
    team2_division = next(name for name, teams in divisions.items() if team2 in teams)

    if team1_division == team2_division:
        # Divisional opponents play 14 games
        all_games.extend([(team1, team2)] * 14)
    elif team1_league == team2_league:
        # Other intraleague opponents play 6 games
        all_games.extend([(team1, team2)] * 6)
    else:
        # Interleague opponents play 3 games
        all_games.extend([(team1, team2)] * 3)

# --- Create Graph and Circular Layout ---
sorted_teams = sorted(all_teams, key=lambda team: team_longitudes[team])

G = nx.Graph()
num_teams = len(sorted_teams)
angle_step = 2 * np.pi / num_teams

for i, team in enumerate(sorted_teams):
    angle = i * angle_step
    x, y = np.cos(angle), np.sin(angle)
    G.add_node(team, pos=(x, y))

# Each pair appears in all_games once per game played, so Counter tallies the edge weight
game_counts = Counter(tuple(sorted(game)) for game in all_games)
for (team1, team2), count in game_counts.items():
    if G.has_node(team1) and G.has_node(team2):
        G.add_edge(team1, team2, weight=count)

# --- Visualization with Matplotlib ---
plt.figure(figsize=(8, 8))
ax = plt.gca()
ax.set_title("MLB Network with 32 Teams in 8 Divisions (Simulated Schedule)", fontsize=16)

# Extract positions for drawing
pos = nx.get_node_attributes(G, 'pos')

# Draw the graph components
nx.draw_networkx_nodes(G, pos, node_color='skyblue', node_size=750, alpha=0.9)
nx.draw_networkx_edges(G, pos, edge_color='gray', width=1.0, alpha=0.6)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')

# Remove axes for a cleaner look
plt.axis("off")
plt.tight_layout()
plt.show()

Graph Network of Simulated 32-Team MLB Schedule
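One property worth checking in the simulated schedule is that every team really ends up with 162 games. In graph terms, each node’s weighted degree should be 162. The sketch below recomputes this from the division table alone, without NetworkX. The helper `games_between` is a hypothetical name of mine.

```python
from collections import Counter
from itertools import combinations

divisions = {
    "AL East": ["BAL", "BOS", "NYY", "WSH"],
    "AL Central": ["CWS", "CLE", "DET", "TOR"],
    "AL South": ["HOU", "KC", "COL", "TEX"],
    "AL West": ["LAA", "ATH", "SEA", "POR"],
    "NL East": ["NYM", "PHI", "CIN", "PIT"],
    "NL Central": ["CHC", "MIL", "STL", "MIN"],
    "NL South": ["MIA", "TB", "NSH", "ATL"],
    "NL West": ["LAD", "SD", "SF", "ARI"],
}

# Reverse lookup: team -> division name such as "AL West"
division_of = {team: name for name, teams in divisions.items() for team in teams}

def games_between(team1, team2):
    """Season series length under the 14 / 6 / 3 formula."""
    div1, div2 = division_of[team1], division_of[team2]
    if div1 == div2:
        return 14                            # same division
    if div1.split()[0] == div2.split()[0]:
        return 6                             # same league, different division
    return 3                                 # interleague

# Weighted degree of each node = total games on that team's schedule
games_per_team = Counter()
for team1, team2 in combinations(division_of, 2):
    n = games_between(team1, team2)
    games_per_team[team1] += n
    games_per_team[team2] += n

total_games = sum(games_per_team.values()) // 2  # each game was counted twice

print(len(games_per_team))           # 32
print(set(games_per_team.values()))  # {162}
print(total_games)                   # 2592
```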

Last but not least, we can create an interactive graph to explore the simulated schedule.

Code

# --- Visualization with Plotly ---
pos = nx.spring_layout(G, weight='weight', iterations=10000, seed=47)

# Set the calculated positions as a node attribute
nx.set_node_attributes(G, pos, 'pos')

fig_data = []

# --- Create Edge Traces by Color with Hover Text ---
edges_by_weight = defaultdict(lambda: {'x': [], 'y': []})
for edge in G.edges(data=True):
    weight = edge[2].get('weight', 1)
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edges_by_weight[weight]['x'].extend([x0, x1, None])
    edges_by_weight[weight]['y'].extend([y0, y1, None])

# Define a colorscale for the edges
min_weight = min(edges_by_weight.keys()) if edges_by_weight else 1
max_weight = max(edges_by_weight.keys()) if edges_by_weight else 1
base_colorscale = px.colors.sequential.Blues

# Create a discrete color map
unique_weights = sorted(list(edges_by_weight.keys()))
num_unique_weights = len(unique_weights)
if num_unique_weights > 1:
    color_map = {weight: base_colorscale[int(i * (len(base_colorscale) - 1) / (num_unique_weights - 1))] for i, weight in enumerate(unique_weights)}
else:
    color_map = {unique_weights[0]: base_colorscale[5]} if unique_weights else {}

for weight, a in sorted(edges_by_weight.items()):
    color = color_map.get(weight, '#888888') # Default color if weight not in map
    edge_trace = go.Scatter(x=a['x'], y=a['y'], line=dict(width=2, color=color), hoverinfo='none', mode='lines')
    fig_data.append(edge_trace)

edge_hover_x = []
edge_hover_y = []
edge_hover_text = []

for edge in G.edges(data=True):
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    weight = edge[2].get('weight', 1)
    # Position the hover point at the midpoint of the edge
    edge_hover_x.append((x0 + x1) / 2)
    edge_hover_y.append((y0 + y1) / 2)
    edge_hover_text.append(f'{edge[0]} vs {edge[1]}: {weight} games')

# This trace holds the hover text and has invisible markers
edge_hover_trace = go.Scatter(
    x=edge_hover_x,
    y=edge_hover_y,
    mode='markers',
    hoverinfo='text',
    text=edge_hover_text,
    marker=dict(size=20, color='rgba(0,0,0,0)') # Invisible markers with a larger hover area
)
fig_data.append(edge_hover_trace)

discrete_colorscale_for_bar = []
if num_unique_weights > 0:
    for i, weight in enumerate(unique_weights):
        color = color_map.get(weight, '#888888')
        # Define start and end points for this color's block in the bar (on a 0-1 scale)
        start_norm = i / num_unique_weights
        end_norm = (i + 1) / num_unique_weights
        discrete_colorscale_for_bar.append([start_norm, color])
        discrete_colorscale_for_bar.append([end_norm, color])

# The colorbar needs to map values to this new 0-1 scale.
# We'll place the tick labels for our unique_weights in the center of each color block.
tickvals_for_bar = [ (i + 0.5) / num_unique_weights for i in range(num_unique_weights) ] if num_unique_weights > 0 else []
ticktext_for_bar = [str(w) for w in unique_weights]

colorbar_trace = go.Scatter(
    x=[None], y=[None], mode='markers', 
    marker=dict(
        color=[0.5],
        colorscale=discrete_colorscale_for_bar,
        cmin=0,
        cmax=1,
        showscale=True,
        colorbar=dict(
            thickness=15,
            title='Games Played',
            xanchor='left',
            tickvals=tickvals_for_bar,
            ticktext=ticktext_for_bar,
            outlinewidth=0
    )),
    hoverinfo='none'
)
fig_data.append(colorbar_trace)

# --- Create Node Trace ---
node_x = []
node_y = []
for node in G.nodes():
    x, y = G.nodes[node]['pos']
    node_x.append(x)
    node_y.append(y)

node_text = []
for node in G.nodes():
    node_text.append(f'Team: {node}')

node_trace = go.Scatter(
    x=node_x, y=node_y,
    mode='markers',
    hoverinfo='text',
    hovertext=node_text,
    marker=dict(
        size=35, # Make the hover area large
        color='rgba(255, 255, 255, 0)' # Make the markers invisible
    )
)
fig_data.append(node_trace)

layout_images = []
# Calculate the range of coordinates to dynamically size logos
x_range = max(node_x) - min(node_x) if node_x else 1
y_range = max(node_y) - min(node_y) if node_y else 1
logo_size_x = x_range * 0.08  # Adjust the multiplier as needed
logo_size_y = y_range * 0.08

for node in G.nodes():
    x, y = G.nodes[node]['pos']
    logo_url = team_logos.get(node)
    if logo_url:
        layout_images.append(dict(source=logo_url, xref="x", yref="y", x=x, y=y, sizex=logo_size_x, sizey=logo_size_y, xanchor="center", yanchor="middle", layer="above"))

# --- Create the Figure ---
fig = go.Figure(
    data=fig_data,
    layout=go.Layout(
        title='<br>MLB 32-Team Expansion Simulation',
        showlegend=False,
        hovermode='closest',
        margin=dict(b=20,l=5,r=5,t=60),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        height=600, width=800,
        images=layout_images,
    ),
)
fig.show()

Interactive Graph Network of Simulated 32-Team MLB Schedule

The main advantage to this approach is that it allows for a more flexible and dynamic scheduling system, accommodating the unique needs and rivalries of the expanded league. By leveraging graph theory, we can better understand the relationships between teams and optimize the scheduling process for fairness and competitiveness.

This realignment places a greater emphasis on regional rivalries and travel considerations, potentially leading to a more engaging and balanced competition. I am personally a strong proponent of getting Portland an MLB team. A Portland team would drastically cut down on travel time for the Seattle Mariners. In addition, it would play into the already existing rivalry with Seattle. Just look at Border Clash, a regional cross country race that highlights the competitive spirit between these two cities and, more broadly, their two states.
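As a rough check on the travel claim, the sketch below compares great-circle distances from Seattle to its nearest divisional opponent with and without Portland. The longitudes are the ones used in the code earlier (which place ATH at Oakland). The latitudes are approximate values I am assuming for illustration, and `haversine_km` is my own helper.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# (latitude, longitude); latitudes are approximate assumptions
cities = {
    "SEA": (47.59, -122.33),
    "POR": (45.52, -122.67),
    "ATH": (37.75, -122.27),
}

sea_to_por = haversine_km(*cities["SEA"], *cities["POR"])
sea_to_ath = haversine_km(*cities["SEA"], *cities["ATH"])

print(f"SEA to POR: {sea_to_por:.0f} km")
print(f"SEA to ATH: {sea_to_ath:.0f} km")
print(f"Ratio: {sea_to_ath / sea_to_por:.1f}x")
```

Under these assumed coordinates, Portland is several times closer to Seattle than the next-nearest West Coast club, which is the intuition behind the shorter edge.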

Nashville makes too much sense in terms of breaking up Atlanta’s stranglehold on the Southeast. With a team in Nashville, we could see some exciting matchups and rivalries emerge, particularly with the Atlanta Braves and the Cincinnati Reds. The geographic diversity would also help to balance the league and create a more dynamic schedule. That said, I could see the case for a team in Charlotte as well.

Conclusion

In the end, what is the Major League Baseball schedule? It’s a historical document, a 162-game compromise cobbled together by tradition, television contracts, and the brute force of geography. It works because it’s always worked. But when you strip away the nostalgia and the seventy-five-dollar parking, when you look at it not as a list of games but as a network, you’re left with a simple mathematical object: a graph.

And like any system, once you see its underlying structure, you can’t help but notice the flaws. They’re no longer just quirks; they’re inefficiencies. The Seattle Mariners’ travel woes aren’t just a logistical headache; on the graph, they’re a grotesquely long edge stretching across half the country, a screamingly obvious point of imbalance. Adding a team in Portland doesn’t just create a fun rivalry; it shortens that edge, creating a tighter, more rational cluster of nodes in the Pacific Northwest. The same goes for Nashville, a move that doesn’t just tap a new market but breaks up the Braves’ regional monopoly, forging new, logical connections in the Southeast.

This is the power of looking at the world through a different lens. It transforms the debate from one of pure opinion—“I think this city deserves a team!”—into a problem of optimization. You’re no longer just adding dots to a map; you’re balancing a network, minimizing the distance between nodes, and engineering more compelling matchups by strengthening local connections. The old men who run baseball can spend the next decade debating expansion in conference rooms, arguing over demographics and media rights. Or they could just run the algorithm. After all, the best path forward is often the shortest one… although Mike Piazza would disagree.

Note

Past articles:

- Principal Component Analysis
- Support Vector Machine
- K-Means Clustering

GitHub:

- Running on Numbers

--- title: "Visualizing MLB Realignment with Graph Theory" author: "Oliver Chang" email: oliverc1622@gmail.com date: 2025-09-04 # Update this date when you make changes categories: [MLB, graph theory, data visualization] toc: true format: html: html-math-method: katex code-tools: true image: "main.png" bibliography: references.bib title-block-banner: default --- <iframe src="https://streamable.com/m/shohei-ohtani-k-s-mike-trout-twice-with-nasty-pitches?partnerId=web_video-playback-page_video-share" width="560" height="315"></iframe> ## Introduction In this article, we chime in on the current discourse surrounding MLB expansion and realignment. Recently, [Rob Manfred mentioned](https://www.forbes.com/sites/maurybrown/2025/08/18/as-manfred-touts-geographic-realignment-with-mlb-expansion-heres-how-it-could-look/) the possibility of adding two more teams to the league. Across social media, fans have expressed their opinions on the matter. I want to explore how graph theory can help visualize the current MLB schedule and how it might change with expansion and realignment. In college, I majored in math and computer science. While my emphasis was on statistics and machine learning, I always had a keen interest in number theory, graph theory, and combinatorics; in fact, my senior thesis was on fast matrix multiplication. One course I missed out on during my undergrad was a graph theory course that was offered at Harvey Mudd College. Given that this blog serves as a platform for me to explore my interests, I thought it would be fun to apply graph theory concepts to MLB scheduling. ### Graph Theory 101 Let's introduce some graph theory formality. We'll be taking note from [Wikipedia's](https://en.wikipedia.org/wiki/Graph_theory) definition. Let a graph be defined as $G=(V,E)$ where $V$ is a set of vertices and $E$ is a set of edges. Vertices can also be referred to as nodes, and edges can be thought of as connections or links between these nodes. 
Formally, $E \subseteq \{ \{u,v\} \mid u,v \in V \text{ and } u \neq v \}$, so each edge is an unordered pair of distinct vertices. This is a simple, undirected graph without loops or multiple edges.

A directed graph is a graph in which edges have orientations, indicating a one-way relationship between vertices. Edges can also have weights, representing the strength or capacity of the connection. A directed graph uses the same notation as an undirected graph, but the edges are ordered pairs. Hence, $E \subseteq \{ (u,v) \mid (u,v) \in V^2 \text{ and } u \neq v \}$, where $V^2$ denotes the Cartesian product $V \times V$, the set of all ordered pairs of vertices. A directed graph can therefore contain the edge $(u,v)$ without containing $(v,u)$. To illustrate this point, consider four vertices, $V = \{A, B, C, D\}$. The set of all possible edges in the undirected graph would be $\{\{A,B\}, \{A,C\}, \{A,D\}, \{B,C\}, \{B,D\}, \{C,D\}\}$, while the set of all possible edges in the directed graph would be

$$\{(A,B), (B,A), (A,C), \\ (C,A), (A,D), (D,A), \\ (B,C), (C,B), (B,D), \\ (D,B), (C,D), (D,C)\}.$$

#### Breadth First Search (BFS)

BFS is an algorithm for traversing or searching tree or graph data structures. It starts at the tree root (or some arbitrary node of a graph, sometimes referred to as a 'search key') and explores the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level. There are ample resources online to learn about [BFS](https://en.wikipedia.org/wiki/Breadth-first_search). I want to highlight that BFS uses a queue data structure to keep track of the nodes to be explored next.
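Both traversals in this section index the graph as `graph[vertex]`, so it helps to fix a concrete representation first. The sketch below stores the four-vertex example as an adjacency dictionary. The helper name `build_adjacency` and the three-edge directed example are my own illustrative choices, not part of any library.

```python
def build_adjacency(vertices, edges, directed=False):
    """Return {vertex: [neighbors]} for a list of (u, v) edges."""
    adjacency = {vertex: [] for vertex in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        if not directed:
            # An undirected edge can be walked in both directions.
            adjacency[v].append(u)
    return adjacency


vertices = ["A", "B", "C", "D"]

# All six possible undirected edges on four vertices.
undirected_edges = [("A", "B"), ("A", "C"), ("A", "D"),
                    ("B", "C"), ("B", "D"), ("C", "D")]
undirected_graph = build_adjacency(vertices, undirected_edges)

# A hypothetical directed graph that uses only three of the twelve ordered pairs.
directed_edges = [("A", "B"), ("B", "C"), ("C", "D")]
directed_graph = build_adjacency(vertices, directed_edges, directed=True)
```

In the undirected version every vertex lists the other three as neighbors, while in the directed version `D` has no outgoing edges at all, which is exactly the $(u,v)$ versus $(v,u)$ distinction.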
Here's a quick Python implementation:

```{python}
from collections import deque


def bfs(graph, start):
    """Visit every vertex reachable from `start` in breadth-first order."""
    visited = set()
    queue = deque([start])  # FIFO queue of vertices waiting to be explored
    while queue:
        vertex = queue.popleft()
        if vertex not in visited:
            print("Visiting:", vertex)
            visited.add(vertex)
            # Enqueue unvisited neighbors; they are explored after the current depth
            for neighbor in graph[vertex]:
                if neighbor not in visited:
                    queue.append(neighbor)
```

The runtime complexity of BFS is $O(|V| + |E|)$, where $|V|$ is the number of vertices and $|E|$ is the number of edges in the graph.

#### Depth First Search (DFS)

The thing I really like about DFS is that we get a different search pattern by using a stack (or recursion) instead of a queue. Starting at the root node, we explore as far as possible along each branch before backtracking. The running time of DFS is also $O(|V| + |E|)$, but it can be more memory efficient than BFS in certain cases. Recursion should be used with caution, though, as it can lead to stack overflow for very deep graphs.

```{python}
def dfs(graph, start, visited=None):
    """Visit every vertex reachable from `start` in depth-first order."""
    if visited is None:
        visited = set()
    if start not in visited:
        print("Visiting:", start)
        visited.add(start)
        # Recurse into each unvisited neighbor before moving to the next one
        for neighbor in graph[start]:
            if neighbor not in visited:
                dfs(graph, neighbor, visited)
    return visited
```

## MLB Schedule Represented as a Graph

The MLB schedule can be represented as a graph where each team is a node and each pair of teams that play each other is connected by an edge, weighted by the number of games between them. This representation allows us to analyze the schedule using graph algorithms. Based on [Wikipedia](https://en.wikipedia.org/wiki/Major_League_Baseball_schedule#2025%E2%80%932026), the 2025 MLB schedule consists of 162 games per team, with the following breakdown:

- 13 games against each of the 4 divisional opponents (52 games)
- 6 or 7 games against each of the 10 other league opponents (62 games)
- 6 games against one "geographic" interleague opponent (6 games)
- 3 games against the remaining 14 interleague opponents (42 games)

Series range from 2 to 4 games, with 3-game series being the most common.
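As a quick check on the breakdown above, the sketch below adds up the four tiers. The names `schedule_tiers` and `games_in_tier` are illustrative only. The one inferred detail is how "6 or 7" splits: with ten opponents and 62 games, the only whole-number split is eight opponents at 6 games and two at 7.

```python
# Each tier is a list of (games per opponent, number of opponents) pairs.
schedule_tiers = {
    "division": [(13, 4)],
    # Inferred split: 8 * 6 + 2 * 7 = 62 is the only way ten opponents reach 62.
    "same league, other divisions": [(6, 8), (7, 2)],
    "geographic interleague rival": [(6, 1)],
    "other interleague": [(3, 14)],
}


def games_in_tier(tier):
    """Total games contributed by one tier of the schedule."""
    return sum(games * opponents for games, opponents in tier)


tier_totals = {name: games_in_tier(tier) for name, tier in schedule_tiers.items()}
total_games = sum(tier_totals.values())
total_opponents = sum(
    opponents for tier in schedule_tiers.values() for _, opponents in tier
)

print(tier_totals)
print("Games per team:", total_games)
print("Distinct opponents:", total_opponents)
```

In graph terms, a schedule that follows this format exactly gives every node 29 neighbors (one edge to each other team) and a weighted degree of 162.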
The visualization below shows the graph representation of the MLB schedule. I used `NetworkX` to create the graph and Matplotlib to visualize it. The nodes are placed on a circle and ordered by the longitude of each team's home city. There really is not much to take away from this visualization other than the fact that it shows the complex web of matchups between teams. Consider this an artistic expression.

```{python}
#| code-fold: true
#| warning: true
#| fig-cap: "Graph Network of MLB Teams and Their Matchups in the 2025 Season, Arranged in a Circle."
#| label: fig:mlb-graph
#| fig-alt: "Graph Network of MLB Teams and Their Matchups in the 2025 Season, Arranged in a Circle."

import pandas as pd
from collections import Counter, defaultdict
import networkx as nx
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import itertools

# Map the abbreviations used in the schedule file to modern ones
team_name_map = {
    'ANA': 'LAA', 'ARI': 'ARI', 'ATL': 'ATL', 'BAL': 'BAL', 'BOS': 'BOS',
    'CHA': 'CWS', 'CHN': 'CHC', 'CIN': 'CIN', 'CLE': 'CLE', 'COL': 'COL',
    'DET': 'DET', 'HOU': 'HOU', 'KCA': 'KC', 'LAN': 'LAD', 'MIA': 'MIA',
    'MIL': 'MIL', 'MIN': 'MIN', 'NYA': 'NYY', 'NYN': 'NYM', 'ATH': 'ATH',
    'PHI': 'PHI', 'PIT': 'PIT', 'SDN': 'SD', 'SEA': 'SEA', 'SFN': 'SF',
    'SLN': 'STL', 'TBA': 'TB', 'TEX': 'TEX', 'TOR': 'TOR', 'WAS': 'WSH'
}

team_logos = {
    'ARI': 'https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Arizona_Diamondbacks_logo_teal.svg/330px-Arizona_Diamondbacks_logo_teal.svg.png',
    'ATL': 'https://upload.wikimedia.org/wikipedia/en/f/f2/Atlanta_Braves.svg',
    'BAL': 'https://upload.wikimedia.org/wikipedia/commons/e/e9/Baltimore_Orioles_Script.svg',
    'BOS': 'https://upload.wikimedia.org/wikipedia/en/6/6d/RedSoxPrimary_HangingSocks.svg',
    'CHC': 'https://upload.wikimedia.org/wikipedia/commons/8/80/Chicago_Cubs_logo.svg',
    'CWS': 'https://upload.wikimedia.org/wikipedia/commons/c/c1/Chicago_White_Sox.svg',
    'CIN': 'https://upload.wikimedia.org/wikipedia/commons/0/01/Cincinnati_Reds_Logo.svg',
    'CLE': 'https://upload.wikimedia.org/wikipedia/en/a/a9/Guardians_winged_%22G%22.svg',
    'COL': 'https://upload.wikimedia.org/wikipedia/en/c/c0/Colorado_Rockies_full_logo.svg',
    'DET': 'https://upload.wikimedia.org/wikipedia/commons/e/e3/Detroit_Tigers_logo.svg',
    'HOU': 'https://upload.wikimedia.org/wikipedia/commons/6/6b/Houston-Astros-Logo.svg',
    'KC': 'https://upload.wikimedia.org/wikipedia/commons/7/78/Kansas_City_Royals_Primary_Logo.svg',
    'LAA': 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Los_Angeles_Angels_of_Anaheim.svg',
    'LAD': 'https://upload.wikimedia.org/wikipedia/commons/0/0e/Los_Angeles_Dodgers_Logo.svg',
    'MIA': 'https://upload.wikimedia.org/wikipedia/en/f/fd/Marlins_team_logo.svg',
    'MIL': 'https://upload.wikimedia.org/wikipedia/en/b/b8/Milwaukee_Brewers_logo.svg',
    'MIN': 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Minnesota_Twins_New_Logo.svg/250px-Minnesota_Twins_New_Logo.svg.png',
    'NYM': 'https://upload.wikimedia.org/wikipedia/en/thumb/7/7b/New_York_Mets.svg/250px-New_York_Mets.svg.png',
    'NYY': 'https://upload.wikimedia.org/wikipedia/commons/f/fe/New_York_Yankees_Primary_Logo.svg',
    'ATH': 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Athletics_logo.svg/330px-Athletics_logo.svg.png',
    'PHI': 'https://upload.wikimedia.org/wikipedia/en/f/f0/Philadelphia_Phillies_%282019%29_logo.svg',
    'PIT': 'https://upload.wikimedia.org/wikipedia/commons/8/81/Pittsburgh_Pirates_logo_2014.svg',
    'SD': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/SD_Logo_Brown.svg/250px-SD_Logo_Brown.svg.png',
    'SF': 'https://upload.wikimedia.org/wikipedia/en/5/58/San_Francisco_Giants_Logo.svg',
    'SEA': 'https://upload.wikimedia.org/wikipedia/en/6/6d/Seattle_Mariners_logo_%28low_res%29.svg',
    'STL': 'https://upload.wikimedia.org/wikipedia/en/9/9d/St._Louis_Cardinals_logo.svg',
    'TB': 'https://upload.wikimedia.org/wikipedia/commons/5/55/Tampa_Bay_Rays_Logo.svg',
    'TEX': 'https://upload.wikimedia.org/wikipedia/commons/c/c7/Texas_Rangers_logo.svg',
    'TOR': 'https://upload.wikimedia.org/wikipedia/en/c/cc/Toronto_Blue_Jay_Primary_Logo.svg',
    'WSH': 'https://upload.wikimedia.org/wikipedia/commons/a/a3/Washington_Nationals_logo.svg',
    'POR': 'https://upload.wikimedia.org/wikipedia/commons/a/a6/Major_League_Baseball_logo.svg',
    'NSH': 'https://upload.wikimedia.org/wikipedia/commons/a/a6/Major_League_Baseball_logo.svg',
}

# Load the schedule data
df = pd.read_csv("2025schedule.csv")
df['Visitor'] = df['Visitor'].map(team_name_map)
df['Home'] = df['Home'].map(team_name_map)
df.dropna(subset=['Visitor', 'Home'], inplace=True)

games_from_df = df[['Visitor', 'Home']].values.tolist()
teams = list(pd.concat([df['Visitor'], df['Home']]).unique())

# Use the canonical team names from team_name_map values
team_longitudes = {
    "BOS": -71.0589, "NYY": -73.92, "NYM": -73.84, "PHI": -75.16, "BAL": -76.61,
    "WSH": -77.03, "PIT": -79.99, "TOR": -79.38, "TB": -82.65, "MIA": -80.19,
    "ATL": -84.38, "CLE": -81.69, "CIN": -84.51, "DET": -83.04, "CWS": -87.62,
    "CHC": -87.62, "STL": -90.19, "MIL": -87.90, "MIN": -93.26, "KC": -94.48,
    "HOU": -95.36, "TEX": -96.79, "COL": -104.99, "ARI": -112.07, "LAD": -118.24,
    "LAA": -117.88, "SD": -117.16, "SF": -122.38, "ATH": -122.27, "SEA": -122.33
}

# Sort teams by longitude (ascending, so the most western team comes first)
sorted_teams = sorted(teams, key=lambda team: team_longitudes[team])

# Make a graph where teams are nodes and games are edges
G = nx.Graph()

# Assign circular positions based on the sorted geographical order
team_positions = {}
num_teams = len(sorted_teams)
angle_step = 2 * np.pi / num_teams

for i, team in enumerate(sorted_teams):
    # Calculate angle and position
    angle = i * angle_step
    x = np.cos(angle)
    y = np.sin(angle)
    team_positions[team] = (x, y)
    # Add node with its calculated position
    G.add_node(team, pos=(x, y))

# Add edges, weighted by the number of games between each pair of teams
game_counts = Counter(tuple(sorted(game)) for game in games_from_df)
for (team1, team2), count in game_counts.items():
    if team1 in G.nodes and team2 in G.nodes:
        G.add_edge(team1, team2, weight=count)

# --- Visualization ---
plt.figure(figsize=(8, 8))
ax = plt.gca()
ax.set_title("MLB Team Matchups (Geographically Ordered Circle)", fontsize=16)

# Extract positions for drawing
pos = nx.get_node_attributes(G, 'pos')
# pos = nx.spring_layout(G, weight='weight', iterations=50, seed=47)

# Draw the graph
nx.draw_networkx_nodes(G, pos, node_color='skyblue', node_size=750, alpha=0.8)
nx.draw_networkx_edges(G, pos, edge_color='gray', width=1.0, alpha=0.6)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')

# Remove axes for a cleaner look
ax.margins(0.1)
plt.axis("off")
plt.tight_layout()
plt.show()
```

### Interactive Graph

Let’s create an interactive version of the graph using Plotly. Instead of manually arranging each node, we can use a force-directed layout to position the nodes based on their connections.

`NetworkX` provides a class of algorithms called force-directed graph drawing. They mainly serve to place nodes in a visually appealing way by simulating physical forces. The `spring_layout` algorithm is one such method. It treats nodes as repelling charged particles and edges as springs connecting them, and it iteratively lowers the energy of the system until it approaches an equilibrium. We therefore specify the number of iterations (`iterations`) and can optionally set `k`, the optimal distance between nodes. Because we pass `weight='weight'`, pairs of teams that play more games attract each other more strongly.

Observe the network structure and how teams are interconnected. The edges illustrate the number of games played between each pair of teams. As expected, we see that intra-division matchups are more frequent than interleague ones. The main takeaway is how much emphasis MLB places on interleague regional matchups.
Teams like the [LA Dodgers](https://www.baseball-reference.com/teams/LAD/2025.shtml) and [LA Angels](https://www.baseball-reference.com/teams/LAA/2025.shtml) squared off six times this season. Another example includes the [Chicago Cubs](https://www.baseball-reference.com/teams/CHC/2025.shtml) and [Chicago White Sox](https://www.baseball-reference.com/teams/CHW/2025.shtml).

```{python}
#| code-fold: true
#| warning: true
#| fig-cap: "Interactive Graph Network of MLB Teams and Their Matchups in the 2025 Season"
#| label: fig:mlb-graph-interactive
#| fig-alt: "Interactive Graph Network of MLB Teams and Their Matchups in the 2025 Season"

pos = nx.spring_layout(G, weight='weight', iterations=10000, seed=47)

# Set the calculated positions as a node attribute
nx.set_node_attributes(G, pos, 'pos')

# --- Visualization with Plotly ---
fig_data = []

# --- Create Edge Traces by Color with Hover Text ---
# Group edges by weight (number of games) to color them
edges_by_weight = defaultdict(lambda: {'x': [], 'y': [], 'text': []})
for edge in G.edges(data=True):
    weight = edge[2].get('weight', 1)
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edges_by_weight[weight]['x'].extend([x0, x1, None])
    edges_by_weight[weight]['y'].extend([y0, y1, None])

# Define a colorscale for the edges
min_weight = min(edges_by_weight.keys()) if edges_by_weight else 1
max_weight = max(edges_by_weight.keys()) if edges_by_weight else 1
base_colorscale = px.colors.sequential.Blues

# Create a discrete color map
unique_weights = sorted(list(edges_by_weight.keys()))
num_unique_weights = len(unique_weights)
if num_unique_weights > 1:
    color_map = {
        weight: base_colorscale[int(i * (len(base_colorscale) - 1) / (num_unique_weights - 1))]
        for i, weight in enumerate(unique_weights)
    }
else:
    color_map = {unique_weights[0]: base_colorscale[5]} if unique_weights else {}

for weight, a in sorted(edges_by_weight.items()):
    color = color_map.get(weight, '#888888')  # Default color if weight not in map
    edge_trace = go.Scatter(x=a['x'], y=a['y'], line=dict(width=2, color=color), hoverinfo='none', mode='lines')
    fig_data.append(edge_trace)

edge_hover_x = []
edge_hover_y = []
edge_hover_text = []
for edge in G.edges(data=True):
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    weight = edge[2].get('weight', 1)
    # Position the hover point at the midpoint of the edge
    edge_hover_x.append((x0 + x1) / 2)
    edge_hover_y.append((y0 + y1) / 2)
    edge_hover_text.append(f'{edge[0]} vs {edge[1]}: {weight} games')

# This trace holds the hover text and has invisible markers
edge_hover_trace = go.Scatter(
    x=edge_hover_x,
    y=edge_hover_y,
    mode='markers',
    hoverinfo='text',
    text=edge_hover_text,
    marker=dict(size=20, color='rgba(0,0,0,0)')  # Invisible markers with a larger hover area
)
fig_data.append(edge_hover_trace)

discrete_colorscale_for_bar = []
if num_unique_weights > 0:
    for i, weight in enumerate(unique_weights):
        color = color_map.get(weight, '#888888')
        # Define start and end points for this color's block in the bar (on a 0-1 scale)
        start_norm = i / num_unique_weights
        end_norm = (i + 1) / num_unique_weights
        discrete_colorscale_for_bar.append([start_norm, color])
        discrete_colorscale_for_bar.append([end_norm, color])

# The colorbar needs to map values to this new 0-1 scale.
# We'll place the tick labels for our unique_weights in the center of each color block.
tickvals_for_bar = [(i + 0.5) / num_unique_weights for i in range(num_unique_weights)]
ticktext_for_bar = [str(w) for w in unique_weights]

colorbar_trace = go.Scatter(
    x=[None],
    y=[None],
    mode='markers',
    marker=dict(
        # The dummy `color` value is irrelevant because we control the scale with cmin/cmax
        color=[0.5],
        # Here is our custom discrete colorscale
        colorscale=discrete_colorscale_for_bar,
        # We are now working in a normalized 0-1 space for the colorbar
        cmin=0,
        cmax=1,
        showscale=True,
        colorbar=dict(
            thickness=15,
            title='Games Played',
            xanchor='left',
            # Place tick labels at the center of each color block
            tickvals=tickvals_for_bar,
            ticktext=ticktext_for_bar,
            # This makes the color bar look more like distinct bins
            outlinewidth=0
        )),
    hoverinfo='none'
)
fig_data.append(colorbar_trace)

# --- Create Node Trace ---
node_x = []
node_y = []
for node in G.nodes():
    x, y = G.nodes[node]['pos']
    node_x.append(x)
    node_y.append(y)

node_adjacencies = []
node_text = []
for node, adjacencies in G.adjacency():
    num_connections = len(adjacencies)
    node_adjacencies.append(num_connections)
    node_text.append(f'Team: {node}')

node_trace = go.Scatter(
    x=node_x,
    y=node_y,
    mode='markers',
    hoverinfo='text',
    hovertext=node_text,
    marker=dict(
        size=35,  # Make the hover area large
        color='rgba(255, 255, 255, 1)'  # Opaque white disc so edges do not show through the logos
    )
)
fig_data.append(node_trace)

layout_images = []
# Calculate the range of coordinates to dynamically size logos
x_range = max(node_x) - min(node_x) if node_x else 1
y_range = max(node_y) - min(node_y) if node_y else 1
logo_size_x = x_range * 0.08  # Adjust the multiplier as needed
logo_size_y = y_range * 0.08

for node in G.nodes():
    x, y = G.nodes[node]['pos']
    logo_url = team_logos.get(node)
    if logo_url:
        layout_images.append(dict(source=logo_url, xref="x", yref="y", x=x, y=y, sizex=logo_size_x, sizey=logo_size_y, xanchor="center", yanchor="middle", layer="above"))

# --- Create the Figure ---
fig = go.Figure(
    data=fig_data,
    layout=go.Layout(
        title='<br>MLB 2025 Schedule Matchups',
        showlegend=False,
        hovermode='closest',
        margin=dict(b=20,l=5,r=5,t=60),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        height=600,
        width=800,
        images=layout_images,
    ),
)
fig.show()
```

### MLB Expansion to 32 Teams as a Graph

Now let's visualize the expanded league structure as a graph. In this hypothetical scenario, we have 32 teams divided into two leagues, each with four divisions of four teams. We introduce a Portland team (POR) to the American League, and the National League receives a new team in Nashville (NSH). @tbl-mlb-expansion shows the new team alignments.

| League | Division | Teams              |
|--------|----------|--------------------|
| AL     | East     | BAL, BOS, NYY, WSH |
| AL     | Central  | CWS, CLE, DET, TOR |
| AL     | South    | HOU, KC, COL, TEX  |
| AL     | West     | LAA, ATH, SEA, POR |
| NL     | East     | NYM, PHI, CIN, PIT |
| NL     | Central  | CHC, MIL, STL, MIN |
| NL     | South    | MIA, TB, NSH, ATL  |
| NL     | West     | LAD, SD, SF, ARI   |

: MLB Expansion to 32 Teams {#tbl-mlb-expansion}

Of course, we will redo our artistic rendition under this new structure.
```{python}
#| code-fold: true
#| warning: true
#| fig-cap: "Graph Network of Simulated 32-Team MLB Schedule"
#| label: fig:mlb-graph-32
#| fig-alt: "Graph Network of Simulated 32-Team MLB Schedule"

# --- Team Data ---
# A dictionary to map old abbreviations to modern ones
team_name_map = {
    'ANA': 'LAA', 'ARI': 'ARI', 'ATL': 'ATL', 'BAL': 'BAL', 'BOS': 'BOS',
    'CHA': 'CWS', 'CHN': 'CHC', 'CIN': 'CIN', 'CLE': 'CLE', 'COL': 'COL',
    'DET': 'DET', 'HOU': 'HOU', 'KCA': 'KC', 'LAN': 'LAD', 'MIA': 'MIA',
    'MIL': 'MIL', 'MIN': 'MIN', 'NYA': 'NYY', 'NYN': 'NYM', 'ATH': 'ATH',
    'PHI': 'PHI', 'PIT': 'PIT', 'SDN': 'SD', 'SEA': 'SEA', 'SFN': 'SF',
    'SLN': 'STL', 'TBA': 'TB', 'TEX': 'TEX', 'TOR': 'TOR', 'WAS': 'WSH'
}

# Longitudes for geographical sorting
team_longitudes = {
    "BOS": -71.0589, "NYY": -73.92, "NYM": -73.84, "PHI": -75.16, "BAL": -76.61,
    "WSH": -77.03, "PIT": -79.99, "TOR": -79.38, "TB": -82.65, "MIA": -80.19,
    "ATL": -84.38, "CLE": -81.69, "CIN": -84.51, "DET": -83.04, "CWS": -87.62,
    "CHC": -87.62, "STL": -90.19, "MIL": -87.90, "MIN": -93.26, "KC": -94.48,
    "HOU": -95.36, "TEX": -96.79, "COL": -104.99, "ARI": -112.07, "LAD": -118.24,
    "LAA": -117.88, "SD": -117.16, "SF": -122.38, "ATH": -122.27, "SEA": -122.33,
    "NSH": -86.78,  # Nashville
    "POR": -122.67  # Portland
}

# --- Simulate 32-Team League with 8 Divisions ---
# 1. Define the new 8-division structure (4 teams per division)
divisions = {
    "AL East": ["BAL", "BOS", "NYY", "WSH"],
    "AL Central": ["CWS", "CLE", "DET", "TOR"],
    "AL South": ["HOU", "KC", "COL", "TEX"],
    "AL West": ["LAA", "ATH", "SEA", "POR"],
    "NL East": ["NYM", "PHI", "CIN", "PIT"],
    "NL Central": ["CHC", "MIL", "STL", "MIN"],
    "NL South": ["MIA", "TB", "NSH", "ATL"],
    "NL West": ["LAD", "SD", "SF", "ARI"]
}

leagues = {
    "AL": divisions["AL East"] + divisions["AL Central"] + divisions["AL South"] + divisions["AL West"],
    "NL": divisions["NL East"] + divisions["NL Central"] + divisions["NL South"] + divisions["NL West"]
}

# 2. Simulate a full schedule based on the new structure
all_games = []
all_teams = leagues["AL"] + leagues["NL"]

# Scheduling formula for 162 games:
# - 14 games vs. 3 divisional opponents (42 games)
# - 6 games vs. 12 other league opponents (72 games)
# - 3 games vs. 16 interleague opponents (48 games)

# Use itertools.combinations to handle each pair only once, which is cleaner
# and avoids the double-counting issue with odd numbers of games.
for team1, team2 in itertools.combinations(all_teams, 2):
    # Determine the relationship between team1 and team2
    team1_league = "AL" if team1 in leagues["AL"] else "NL"
    team2_league = "AL" if team2 in leagues["AL"] else "NL"
    team1_division = next(name for name, members in divisions.items() if team1 in members)
    team2_division = next(name for name, members in divisions.items() if team2 in members)

    if team1_division == team2_division:
        # Divisional opponents play 14 games
        all_games.extend([(team1, team2)] * 14)
    elif team1_league == team2_league:
        # Other intraleague opponents play 6 games
        all_games.extend([(team1, team2)] * 6)
    else:
        # Interleague opponents play 3 games
        all_games.extend([(team1, team2)] * 3)

# --- Create Graph and Circular Layout ---
sorted_teams = sorted(all_teams, key=lambda team: team_longitudes[team])
G = nx.Graph()
num_teams = len(sorted_teams)
angle_step = 2 * np.pi / num_teams

for i, team in enumerate(sorted_teams):
    angle = i * angle_step
    x, y = np.cos(angle), np.sin(angle)
    G.add_node(team, pos=(x, y))

# Each pair appears once per game; sorting the tuple gives a canonical key for Counter
game_counts = Counter(tuple(sorted(game)) for game in all_games)
for (team1, team2), count in game_counts.items():
    if G.has_node(team1) and G.has_node(team2):
        G.add_edge(team1, team2, weight=count)

# --- Visualization with Matplotlib ---
plt.figure(figsize=(8, 8))
ax = plt.gca()
ax.set_title("MLB Network with 32 Teams in 8 Divisions (Simulated Schedule)", fontsize=16)

# Extract positions for drawing
pos = nx.get_node_attributes(G, 'pos')

# Draw the graph components
nx.draw_networkx_nodes(G, pos, node_color='skyblue', node_size=750, alpha=0.9)
nx.draw_networkx_edges(G, pos, edge_color='gray', width=1.0, alpha=0.6)
nx.draw_networkx_labels(G, pos, font_size=10, font_weight='bold')

# Remove axes for a cleaner look
plt.axis("off")
plt.tight_layout()
plt.show()
```

Last but not least, we can create an interactive graph to explore the simulated schedule.

```{python}
#| code-fold: true
#| warning: true
#| fig-cap: "Interactive Graph Network of Simulated 32-Team MLB Schedule"
#| label: fig:mlb-graph-interactive-32
#| fig-alt: "Interactive Graph Network of Simulated 32-Team MLB Schedule"

# --- Visualization with Plotly ---
pos = nx.spring_layout(G, weight='weight', iterations=10000, seed=47)

# Set the calculated positions as a node attribute
nx.set_node_attributes(G, pos, 'pos')

fig_data = []

# --- Create Edge Traces by Color with Hover Text ---
edges_by_weight = defaultdict(lambda: {'x': [], 'y': []})
for edge in G.edges(data=True):
    weight = edge[2].get('weight', 1)
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edges_by_weight[weight]['x'].extend([x0, x1, None])
    edges_by_weight[weight]['y'].extend([y0, y1, None])

# Define a colorscale for the edges
min_weight = min(edges_by_weight.keys()) if edges_by_weight else 1
max_weight = max(edges_by_weight.keys()) if edges_by_weight else 1
base_colorscale = px.colors.sequential.Blues

# Create a discrete color map
unique_weights = sorted(list(edges_by_weight.keys()))
num_unique_weights = len(unique_weights)
if num_unique_weights > 1:
    color_map = {
        weight: base_colorscale[int(i * (len(base_colorscale) - 1) / (num_unique_weights - 1))]
        for i, weight in enumerate(unique_weights)
    }
else:
    color_map = {unique_weights[0]: base_colorscale[5]} if unique_weights else {}

for weight, a in sorted(edges_by_weight.items()):
    color = color_map.get(weight, '#888888')  # Default color if weight not in map
    edge_trace = go.Scatter(x=a['x'], y=a['y'], line=dict(width=2, color=color), hoverinfo='none', mode='lines')
    fig_data.append(edge_trace)

edge_hover_x = []
edge_hover_y = []
edge_hover_text = []
for edge in G.edges(data=True):
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    weight = edge[2].get('weight', 1)
    # Position the hover point at the midpoint of the edge
    edge_hover_x.append((x0 + x1) / 2)
    edge_hover_y.append((y0 + y1) / 2)
    edge_hover_text.append(f'{edge[0]} vs {edge[1]}: {weight} games')

# This trace holds the hover text and has invisible markers
edge_hover_trace = go.Scatter(
    x=edge_hover_x,
    y=edge_hover_y,
    mode='markers',
    hoverinfo='text',
    text=edge_hover_text,
    marker=dict(size=20, color='rgba(0,0,0,0)')  # Invisible markers with a larger hover area
)
fig_data.append(edge_hover_trace)

discrete_colorscale_for_bar = []
if num_unique_weights > 0:
    for i, weight in enumerate(unique_weights):
        color = color_map.get(weight, '#888888')
        # Define start and end points for this color's block in the bar (on a 0-1 scale)
        start_norm = i / num_unique_weights
        end_norm = (i + 1) / num_unique_weights
        discrete_colorscale_for_bar.append([start_norm, color])
        discrete_colorscale_for_bar.append([end_norm, color])

# The colorbar needs to map values to this new 0-1 scale.
# We'll place the tick labels for our unique_weights in the center of each color block.
tickvals_for_bar = [(i + 0.5) / num_unique_weights for i in range(num_unique_weights)] if num_unique_weights > 0 else []
ticktext_for_bar = [str(w) for w in unique_weights]

colorbar_trace = go.Scatter(
    x=[None],
    y=[None],
    mode='markers',
    marker=dict(
        color=[0.5],
        colorscale=discrete_colorscale_for_bar,
        cmin=0,
        cmax=1,
        showscale=True,
        colorbar=dict(
            thickness=15,
            title='Games Played',
            xanchor='left',
            tickvals=tickvals_for_bar,
            ticktext=ticktext_for_bar,
            outlinewidth=0
        )),
    hoverinfo='none'
)
fig_data.append(colorbar_trace)

# --- Create Node Trace ---
node_x = []
node_y = []
for node in G.nodes():
    x, y = G.nodes[node]['pos']
    node_x.append(x)
    node_y.append(y)

node_text = []
for node in G.nodes():
    node_text.append(f'Team: {node}')

node_trace = go.Scatter(
    x=node_x,
    y=node_y,
    mode='markers',
    hoverinfo='text',
    hovertext=node_text,
    marker=dict(
        size=35,  # Make the hover area large
        color='rgba(255, 255, 255, 0)'  # Make the markers invisible
    )
)
fig_data.append(node_trace)

layout_images = []
# Calculate the range of coordinates to dynamically size logos
x_range = max(node_x) - min(node_x) if node_x else 1
y_range = max(node_y) - min(node_y) if node_y else 1
logo_size_x = x_range * 0.08  # Adjust the multiplier as needed
logo_size_y = y_range * 0.08

for node in G.nodes():
    x, y = G.nodes[node]['pos']
    logo_url = team_logos.get(node)
    if logo_url:
        layout_images.append(dict(source=logo_url, xref="x", yref="y", x=x, y=y, sizex=logo_size_x, sizey=logo_size_y, xanchor="center", yanchor="middle", layer="above"))

# --- Create the Figure ---
fig = go.Figure(
    data=fig_data,
    layout=go.Layout(
        title='<br>MLB 32-Team Expansion Simulation',
        showlegend=False,
        hovermode='closest',
        margin=dict(b=20,l=5,r=5,t=60),
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        height=600,
        width=800,
        images=layout_images,
    ),
)
fig.show()
```

The main advantage of this alignment is a simpler, more uniform schedule: every team plays 14 games against each of its three divisional opponents, 6 against each of the other 12 teams in its league, and 3 against each of the 16 interleague opponents. Viewing the schedule as a weighted graph makes the relationships between teams easy to inspect, and the four-team divisions place a greater emphasis on regional rivalries and travel considerations, potentially leading to a more engaging and balanced competition.

I am personally a strong proponent of getting Portland an MLB team. Portland having a team would drastically cut down on the travel time for the Seattle Mariners. In addition, a Portland team would play into the already existing rivalry with Seattle. Just look at [Border Clash](https://nikeborderclash.runnerspace.com/), a regional cross country race that highlights the competitive spirit between these two cities and, more broadly, their two states.

Nashville makes too much sense in terms of breaking up Atlanta's stranglehold on the Southeast. With a team in Nashville, we could see some exciting matchups and rivalries emerge, particularly with the Atlanta Braves and the Cincinnati Reds. The geographic diversity would also help to balance the league and create a more dynamic schedule. That said, I could see the case for a team in Charlotte as well.

## Conclusion

In the end, what is the Major League Baseball schedule? It’s a historical document, a 162-game compromise cobbled together by tradition, television contracts, and the brute force of geography. It works because it’s always worked. But when you strip away the nostalgia and the seventy-five-dollar parking, when you look at it not as a list of games but as a network, you’re left with a simple mathematical object: a graph.

And like any system, once you see its underlying structure, you can't help but notice the flaws. They’re no longer just quirks; they’re inefficiencies.
The Seattle Mariners’ travel woes aren’t just a logistical headache; on the graph, they’re a grotesquely long edge stretching across half the country, a screamingly obvious point of imbalance. Adding a team in Portland doesn’t just create a fun rivalry; it shortens that edge, creating a tighter, more rational cluster of nodes in the Pacific Northwest. The same goes for Nashville, a move that doesn't just tap a new market but breaks up the Braves' regional monopoly, forging new, logical connections in the Southeast.

This is the power of looking at the world through a different lens. It transforms the debate from one of pure opinion ("I think this city deserves a team!") into a problem of optimization. You're no longer just adding dots to a map; you're balancing a network, minimizing the distance between nodes, and engineering more compelling matchups by strengthening local connections.

The old men who run baseball can spend the next decade debating expansion in conference rooms, arguing over demographics and media rights. Or they could just run the algorithm. After all, the best path forward is often the shortest one... although Mike Piazza would disagree.

<iframe src="https://streamable.com/m/clemens-throws-bat-c20053773?partnerId=web_video-playback-page_video-share" width="560" height="315"></iframe>

:::{.callout-note}
Past articles:

- [Principal Component Analysis](https://runningonnumbers.com/posts/principal-component-analysis-python-baseball/)
- [Support Vector Machine](https://runningonnumbers.com/posts/support-vector-machine/)
- [K-Means Clustering](https://runningonnumbers.com/posts/k-means/)

GitHub:

- [Running on Numbers](https://github.com/oliverc1623/Running-On-Numbers-Public)
:::

<script async data-uid="5d16db9e50" src="https://runningonnumbers.kit.com/5d16db9e50/index.js"></script>