Fantasy Baseball Player Analysis

This Jupyter Notebook performs an exploratory data analysis of fantasy baseball player statistics. The goal is to identify top performers, potential sleepers, and statistical anomalies using offensive metrics such as OPS, wOBA, wRC+, BABIP, and strikeout and walk rates. We visualize key statistics, examine correlations, and draw out insights that may help with fantasy drafts and trade decisions.

1. Import Necessary Libraries

In [1]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings

# Ignore all warnings for cleaner output
warnings.filterwarnings('ignore')

2. Load CSV Data

In [2]:

try:
    df = pd.read_csv('fantasy_bbl.csv')
    print("CSV data loaded successfully.")
except FileNotFoundError:
    print("Error: 'fantasy_bbl.csv' not found. Please ensure the file is in the correct directory.")
    # Create a dummy DataFrame for demonstration if the file is not found
    # This dummy data is based on the provided CSV content for consistency
    data = {
        '#': [1, 2, 3, 4, 5, 13, 126, 436],
        'Name': ['Aaron Judge', 'Shohei Ohtani', 'Bobby Witt Jr.', 'Cal Raleigh', 'Juan Soto', 'Ronald Acuña Jr.', 'Mike Trout', 'Austin Hedges'],
        'Team': ['NYY', 'LAD', 'KCR', 'SEA', 'NYM', 'ATL', 'LAA', 'CLE'],
        'G': [56, 56, 56, 53, 56, 41, 34, 30],
        'PA': [245, 253, 244, 214, 246, 182, 143, 83],
        'AB': [200, 217, 223, 186, 195, 154, 120, 74],
        'H': [61, 62, 64, 44, 52, 46, 30, 13],
        '2B': [11, 10, 15, 8, 9, 8, 5, 2],
        '3B': [0, 3, 3, 0, 0, 1, 1, 0],
        'HR': [19, 18, 10, 14, 13, 9, 8, 2],
        'R': [43, 49, 37, 29, 40, 36, 20, 7],
        'RBI': [47, 47, 36, 36, 36, 24, 21, 6],
        'BB': [41, 32, 17, 25, 48, 25, 20, 6],
        'SO': [65, 60, 43, 56, 42, 38, 39, 24],
        'HBP': [2, 2, 2, 2, 1, 2, 2, 1],
        'SB': [3, 13, 12, 3, 4, 10, 1, 1],
        'CS': [1, 2, 4, 1, 1, 2, 0, 0],
        'BB%': ['16.80%', '12.70%', '7.10%', '11.50%', '19.70%', '13.60%', '14.10%', '6.90%'],
        'K%': ['26.40%', '23.90%', '17.70%', '26.20%', '17.00%', '20.70%', '27.50%', '29.40%'],
        'ISO': [0.336, 0.318, 0.221, 0.278, 0.244, 0.242, 0.248, 0.099],
        'BABIP': [0.355, 0.313, 0.318, 0.256, 0.279, 0.343, 0.301, 0.226],
        'AVG': [0.303, 0.285, 0.289, 0.238, 0.268, 0.300, 0.251, 0.172],
        'OBP': [0.423, 0.379, 0.343, 0.33, 0.413, 0.403, 0.365, 0.239],
        'SLG': [0.639, 0.603, 0.51, 0.517, 0.512, 0.542, 0.500, 0.272],
        'OPS': [1.062, 0.981, 0.853, 0.847, 0.925, 0.945, 0.865, 0.511],
        'wOBA': [0.439, 0.410, 0.362, 0.358, 0.397, 0.406, 0.368, 0.231],
        'wRC+': [187, 167, 130, 138, 162, 163, 136, 45],
        'ADP': [3.1, 1.5, 1.6, 85.6, 7.1, 37.6, 111.9, 748.0]
    }
    df = pd.DataFrame(data)
    print("Using dummy data for demonstration.")

CSV data loaded successfully.
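Because ISO, OPS, AVG, OBP and SLG arrive precomputed in the CSV, a cheap sanity check after loading is to confirm that they satisfy their definitions: ISO = SLG minus AVG, and OPS = OBP + SLG. The sketch below assumes the column names used above; the helper name `check_rate_identities` and the 0.002 tolerance (meant to absorb three-decimal rounding) are illustrative choices, not part of the original notebook. The sample rows are copied from the fallback data.

```python
import pandas as pd


def check_rate_identities(df: pd.DataFrame, tol: float = 0.002) -> pd.DataFrame:
    """Return rows where ISO != SLG - AVG or OPS != OBP + SLG by more than `tol`.

    `tol` is an assumed allowance for three-decimal rounding in the source data.
    """
    iso_gap = (df['ISO'] - (df['SLG'] - df['AVG'])).abs()
    ops_gap = (df['OPS'] - (df['OBP'] + df['SLG'])).abs()
    return df[(iso_gap > tol) | (ops_gap > tol)]


# Three rows copied from the fallback data above
sample = pd.DataFrame({
    'Name': ['Aaron Judge', 'Shohei Ohtani', 'Austin Hedges'],
    'AVG': [0.303, 0.285, 0.172],
    'OBP': [0.423, 0.379, 0.239],
    'SLG': [0.639, 0.603, 0.272],
    'ISO': [0.336, 0.318, 0.099],
    'OPS': [1.062, 0.981, 0.511],
})
bad_rows = check_rate_identities(sample)
print(f"{len(bad_rows)} row(s) violate the identities")
```

Run against the full `df`, a non-empty result would point to rows worth inspecting before any ranking is done.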

3. Initial Data Exploration and Cleaning

In [3]:

print("\nDataFrame Head:")
print(df.head())

print("\nDataFrame Info (before cleaning):")
df.info()

print("\nDataFrame Description:")
print(df.describe())

# Clean percentage columns (BB%, K%) by removing '%' and converting to float
for col in ['BB%', 'K%']:
    if col in df.columns:
        df[col] = df[col].astype(str).str.replace('%', '').astype(float) / 100

# Handle 'ADP' values of 999 (likely means undrafted or very late). Replace with NaN for better statistical handling.
df['ADP'] = df['ADP'].replace(999, np.nan)

print("\nData types after cleaning:")
df.info()

DataFrame Head:
   #            Name Team   G   PA   AB   H  2B  3B  HR  ...      K%    ISO  \
0  1     Aaron Judge  NYY  56  245  200  61  11   0  19  ...  26.40%  0.336   
1  2   Shohei Ohtani  LAD  56  253  217  62  10   3  18  ...  23.90%  0.318   
2  3  Bobby Witt Jr.  KCR  56  244  223  64  15   3  10  ...  17.70%  0.221   
3  4     Cal Raleigh  SEA  53  214  186  44   8   0  14  ...  26.20%  0.278   
4  5       Juan Soto  NYM  56  246  195  52   9   0  13  ...  17.00%  0.244   

   BABIP    AVG    OBP    SLG    OPS   wOBA wRC+   ADP  
0  0.355  0.303  0.423  0.639  1.062  0.439  187   3.1  
1  0.313  0.285  0.379  0.603  0.981  0.410  167   1.5  
2  0.318  0.289  0.343  0.510  0.853  0.362  130   1.6  
3  0.256  0.238  0.330  0.517  0.847  0.358  138  85.6  
4  0.279  0.268  0.413  0.512  0.925  0.397  162   7.1  

[5 rows x 28 columns]

DataFrame Info (before cleaning):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 606 entries, 0 to 605
Data columns (total 28 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   #       606 non-null    int64  
 1   Name    606 non-null    object 
 2   Team    590 non-null    object 
 3   G       606 non-null    int64  
 4   PA      606 non-null    int64  
 5   AB      606 non-null    int64  
 6   H       606 non-null    int64  
 7   2B      606 non-null    int64  
 8   3B      606 non-null    int64  
 9   HR      606 non-null    int64  
 10  R       606 non-null    int64  
 11  RBI     606 non-null    int64  
 12  BB      606 non-null    int64  
 13  SO      606 non-null    int64  
 14  HBP     606 non-null    int64  
 15  SB      606 non-null    int64  
 16  CS      606 non-null    int64  
 17  BB%     606 non-null    object 
 18  K%      606 non-null    object 
 19  ISO     606 non-null    float64
 20  BABIP   606 non-null    float64
 21  AVG     606 non-null    float64
 22  OBP     606 non-null    float64
 23  SLG     606 non-null    float64
 24  OPS     606 non-null    float64
 25  wOBA    606 non-null    float64
 26  wRC+    606 non-null    int64  
 27  ADP     606 non-null    float64
dtypes: float64(8), int64(16), object(4)
memory usage: 132.7+ KB

DataFrame Description:
               #           G          PA          AB           H          2B  \
count  606.00000  606.000000  606.000000  606.000000  606.000000  606.000000   
mean   303.50000   39.326733  150.749175  134.881188   32.976898    6.529703   
std    175.08141    9.611057   45.820055   40.798790   11.836886    2.489305   
min      1.00000   12.000000   41.000000   36.000000    8.000000    1.000000   
25%    152.25000   32.000000  117.000000  105.000000   24.000000    5.000000   
50%    303.50000   39.000000  145.000000  130.000000   31.000000    6.000000   
75%    454.75000   47.000000  184.750000  165.000000   41.000000    8.000000   
max    606.00000   58.000000  253.000000  228.000000   67.000000   15.000000   

               3B          HR           R         RBI  ...          CS  \
count  606.000000  606.000000  606.000000  606.000000  ...  606.000000   
mean     0.605611    4.297030   18.110561   17.693069  ...    0.831683   
std      0.671425    2.751569    7.032110    7.247788  ...    0.811927   
min      0.000000    0.000000    4.000000    4.000000  ...    0.000000   
25%      0.000000    2.000000   13.000000   13.000000  ...    0.000000   
50%      1.000000    4.000000   17.000000   16.000000  ...    1.000000   
75%      1.000000    6.000000   22.000000   22.000000  ...    1.000000   
max      5.000000   19.000000   49.000000   47.000000  ...    4.000000   

              ISO       BABIP         AVG         OBP         SLG         OPS  \
count  606.000000  606.000000  606.000000  606.000000  606.000000  606.000000   
mean     0.148525    0.294822    0.240870    0.312190    0.389408    0.701594   
std      0.040727    0.022822    0.021165    0.024806    0.050077    0.067495   
min      0.046000    0.217000    0.171000    0.232000    0.241000    0.509000   
25%      0.121000    0.280000    0.227000    0.297000    0.358500    0.659000   
50%      0.144000    0.295000    0.240000    0.311000    0.385000    0.697500   
75%      0.174000    0.310000    0.255000    0.327000    0.416000    0.738000   
max      0.336000    0.374000    0.303000    0.423000    0.639000    1.062000   

             wOBA        wRC+         ADP  
count  606.000000  606.000000  606.000000  
mean     0.308239   96.064356  543.642904  
std      0.026186   18.294311  323.721651  
min      0.231000   42.000000    1.500000  
25%      0.292000   84.000000  253.625000  
50%      0.307000   95.000000  557.350000  
75%      0.323750  106.000000  750.000000  
max      0.439000  187.000000  999.000000  

[8 rows x 24 columns]

Data types after cleaning:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 606 entries, 0 to 605
Data columns (total 28 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   #       606 non-null    int64  
 1   Name    606 non-null    object 
 2   Team    590 non-null    object 
 3   G       606 non-null    int64  
 4   PA      606 non-null    int64  
 5   AB      606 non-null    int64  
 6   H       606 non-null    int64  
 7   2B      606 non-null    int64  
 8   3B      606 non-null    int64  
 9   HR      606 non-null    int64  
 10  R       606 non-null    int64  
 11  RBI     606 non-null    int64  
 12  BB      606 non-null    int64  
 13  SO      606 non-null    int64  
 14  HBP     606 non-null    int64  
 15  SB      606 non-null    int64  
 16  CS      606 non-null    int64  
 17  BB%     606 non-null    float64
 18  K%      606 non-null    float64
 19  ISO     606 non-null    float64
 20  BABIP   606 non-null    float64
 21  AVG     606 non-null    float64
 22  OBP     606 non-null    float64
 23  SLG     606 non-null    float64
 24  OPS     606 non-null    float64
 25  wOBA    606 non-null    float64
 26  wRC+    606 non-null    int64  
 27  ADP     475 non-null    float64
dtypes: float64(10), int64(16), object(2)
memory usage: 132.7+ KB
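The cleaning step above calls `.astype(float)` on the stripped strings, which raises a `ValueError` if any cell is empty or otherwise malformed. A more forgiving sketch uses `pd.to_numeric(..., errors='coerce')`, which turns unparseable cells into NaN instead. It also treats any ADP at or above the sentinel as missing; the assumption that 999 means "undrafted" comes from the comment in the cleaning cell, and the helper names `clean_percent` and `clean_adp` are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_percent(s: pd.Series) -> pd.Series:
    """Convert strings like '16.80%' to fractions (0.168); unparseable cells become NaN."""
    stripped = s.astype(str).str.strip().str.rstrip('%')
    return pd.to_numeric(stripped, errors='coerce') / 100


def clean_adp(s: pd.Series, sentinel: float = 999) -> pd.Series:
    """Treat ADP values at or above the sentinel (assumed to mean 'undrafted') as NaN."""
    return s.mask(s >= sentinel)


pct = clean_percent(pd.Series(['16.80%', '7.10%', '', 'n/a']))
adp = clean_adp(pd.Series([3.1, 748.0, 999.0, np.nan]))
print(pct.tolist())           # the last two cells come back as NaN
print(int(adp.isna().sum()))  # the 999 sentinel plus the value that was already missing
```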

4. Key Performance Indicators (KPIs) Analysis

In [4]:

print("\nTop 10 Players by OPS:")
print(df.sort_values(by='OPS', ascending=False).head(10)[['Name', 'Team', 'OPS', 'ADP']])

print("\nTop 10 Players by wOBA:")
print(df.sort_values(by='wOBA', ascending=False).head(10)[['Name', 'Team', 'wOBA', 'ADP']])

print("\nTop 10 Players by wRC+:")
print(df.sort_values(by='wRC+', ascending=False).head(10)[['Name', 'Team', 'wRC+', 'ADP']])

print("\nTop 10 Home Run Hitters:")
print(df.sort_values(by='HR', ascending=False).head(10)[['Name', 'Team', 'HR', 'ADP']])

print("\nTop 10 Stolen Base Threats:")
print(df.sort_values(by='SB', ascending=False).head(10)[['Name', 'Team', 'SB', 'ADP']])

print("\nTop 10 Players by Batting Average (min 100 PA):")
print(df[df['PA'] >= 100].sort_values(by='AVG', ascending=False).head(10)[['Name', 'Team', 'AVG', 'ADP']])

Top 10 Players by OPS:
                 Name Team    OPS    ADP
0         Aaron Judge  NYY  1.062    3.1
1       Shohei Ohtani  LAD  0.981    1.5
12   Ronald Acuña Jr.  ATL  0.945   37.6
4           Juan Soto  NYM  0.925    7.1
46     Yordan Alvarez  HOU  0.919   17.1
36       Bryce Harper  PHI  0.890   22.3
57       Brent Rooker  ATH  0.874   62.2
14       Corey Seager  TEX  0.869   46.4
19        Ketel Marte  ARI  0.866   30.7
125        Mike Trout  LAA  0.865  111.9

Top 10 Players by wOBA:
                Name Team   wOBA   ADP
0        Aaron Judge  NYY  0.439   3.1
1      Shohei Ohtani  LAD  0.410   1.5
12  Ronald Acuña Jr.  ATL  0.406  37.6
4          Juan Soto  NYM  0.397   7.1
46    Yordan Alvarez  HOU  0.388  17.1
36      Bryce Harper  PHI  0.380  22.3
57      Brent Rooker  ATH  0.373  62.2
19       Ketel Marte  ARI  0.371  30.7
14      Corey Seager  TEX  0.371  46.4
31   Freddie Freeman  LAD  0.371  26.9

Top 10 Players by wRC+:
                     Name Team  wRC+   ADP
0             Aaron Judge  NYY   187   3.1
1           Shohei Ohtani  LAD   167   1.5
12       Ronald Acuña Jr.  ATL   163  37.6
4               Juan Soto  NYM   162   7.1
46         Yordan Alvarez  HOU   152  17.1
36           Bryce Harper  PHI   144  22.3
14           Corey Seager  TEX   141  46.4
31        Freddie Freeman  LAD   140  26.9
25  Vladimir Guerrero Jr.  TOR   139  13.2
19            Ketel Marte  ARI   139  30.7

Top 10 Home Run Hitters:
              Name Team  HR    ADP
0      Aaron Judge  NYY  19    3.1
1    Shohei Ohtani  LAD  18    1.5
60  Kyle Schwarber  PHI  15   75.9
3      Cal Raleigh  SEA  14   85.6
4        Juan Soto  NYM  13    7.1
57    Brent Rooker  ATH  13   62.2
47  Eugenio Suárez  ARI  12  168.7
45     Pete Alonso  NYM  12   47.6
35      Matt Olson  ATL  11   34.0
5     José Ramírez  CLE  11    4.9

Top 10 Stolen Base Threats:
                    Name Team  SB    ADP
243     Chandler Simpson  TBR  19    NaN
10       Elly De La Cruz  CIN  18    4.3
244       José Caballero  TBR  15  327.1
1          Shohei Ohtani  LAD  13    1.5
13   Pete Crow-Armstrong  CHC  13  127.4
170      Victor Scott II  STL  13  321.9
84          Brice Turang  MIL  12  145.1
5           José Ramírez  CLE  12    4.9
2         Bobby Witt Jr.  KCR  12    1.6
63             CJ Abrams  WSN  11   49.6

Top 10 Players by Batting Average (min 100 PA):
                      Name Team    AVG    ADP
0              Aaron Judge  NYY  0.303    3.1
192            Luis Arraez  SDP  0.301  185.1
110           Jacob Wilson  ATH  0.300  327.1
12        Ronald Acuña Jr.  ATL  0.300   37.6
243       Chandler Simpson  TBR  0.296    NaN
31         Freddie Freeman  LAD  0.296   26.9
102         Xavier Edwards  MIA  0.292  139.8
25   Vladimir Guerrero Jr.  TOR  0.289   13.2
2           Bobby Witt Jr.  KCR  0.289    1.6
142             Yandy Díaz  TBR  0.287  195.7
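The six cells above repeat the same sort, slice, and select pattern. A small helper keeps them consistent and makes the plate-appearance floor explicit; pandas' `DataFrame.nlargest` is a shorthand for sorting in descending order and taking the first rows. The function name `top_n` and its defaults are hypothetical, and the sample rows come from the fallback data.

```python
import pandas as pd


def top_n(df: pd.DataFrame, stat: str, n: int = 10, min_pa: int = 0) -> pd.DataFrame:
    """Return the n players with the highest `stat`, among those with PA >= min_pa."""
    pool = df[df['PA'] >= min_pa]
    return pool.nlargest(n, stat)[['Name', 'Team', stat, 'ADP']]


# Four rows from the fallback data above
sample = pd.DataFrame({
    'Name': ['Aaron Judge', 'Shohei Ohtani', 'Ronald Acuña Jr.', 'Austin Hedges'],
    'Team': ['NYY', 'LAD', 'ATL', 'CLE'],
    'PA': [245, 253, 182, 83],
    'AVG': [0.303, 0.285, 0.300, 0.172],
    'OPS': [1.062, 0.981, 0.945, 0.511],
    'ADP': [3.1, 1.5, 37.6, 748.0],
})

for stat, min_pa in [('OPS', 0), ('AVG', 100)]:
    print(f"\nTop 3 by {stat} (min {min_pa} PA):")
    print(top_n(sample, stat, n=3, min_pa=min_pa))
```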

5. Data Visualization

Distribution of Key Offensive Stats

In [5]:

fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('Distribution of Key Offensive Statistics', fontsize=16)

sns.histplot(df['HR'], bins=10, kde=True, ax=axes[0, 0])
axes[0, 0].set_title('Home Runs (HR)')

sns.histplot(df['SB'], bins=10, kde=True, ax=axes[0, 1])
axes[0, 1].set_title('Stolen Bases (SB)')

sns.histplot(df['AVG'], bins=10, kde=True, ax=axes[0, 2])
axes[0, 2].set_title('Batting Average (AVG)')

sns.histplot(df['OPS'], bins=10, kde=True, ax=axes[1, 0])
axes[1, 0].set_title('On-base Plus Slugging (OPS)')

sns.histplot(df['wOBA'], bins=10, kde=True, ax=axes[1, 1])
axes[1, 1].set_title('Weighted On-Base Average (wOBA)')

# Filter out NaN ADP values for plotting to avoid errors
sns.histplot(df['ADP'].dropna(), bins=20, kde=True, ax=axes[1, 2])
axes[1, 2].set_title('Average Draft Position (ADP)')

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()

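The histograms plot raw counts, and in this dataset PA ranges from 41 to 253 (see the description table above), so HR and SB partly measure playing time. One way to separate production from opportunity is to scale counting stats to a common number of plate appearances. The 600-PA baseline (a common full-season benchmark) and the helper name `per_pa_rate` are assumptions for illustration; the sample rows come from the fallback data.

```python
import pandas as pd


def per_pa_rate(df: pd.DataFrame, stats, per: int = 600) -> pd.DataFrame:
    """Scale counting stats to `per` plate appearances, e.g. HR per 600 PA."""
    rates = df[list(stats)].div(df['PA'], axis=0) * per
    return rates.add_suffix(f'_per{per}')


sample = pd.DataFrame({
    'Name': ['Aaron Judge', 'Mike Trout', 'Austin Hedges'],
    'PA': [245, 143, 83],
    'HR': [19, 8, 2],
    'SB': [3, 1, 1],
})
rates = per_pa_rate(sample, ['HR', 'SB'])
print(pd.concat([sample['Name'], rates.round(1)], axis=1))
```

Rates built on small samples, such as Austin Hedges's 83 PA, are noisy, so a PA floor like the one used for batting average above still makes sense.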

Performance vs. ADP

In [6]:

# Filter out players with NaN ADP for this plot to avoid errors
df_filtered_adp = df.dropna(subset=['ADP']).copy()

plt.figure(figsize=(12, 8))
sns.scatterplot(data=df_filtered_adp, x='ADP', y='wOBA', hue='Team', size='PA', sizes=(20, 400), alpha=0.7)
plt.title('wOBA vs. Average Draft Position (ADP)', fontsize=16)
plt.xlabel('Average Draft Position (Lower is Better)', fontsize=12)
plt.ylabel('Weighted On-Base Average (wOBA)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.gca().invert_xaxis() # Lower ADP is better, so invert x-axis
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
plt.show()
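The scatter plot can also be summarized by a single number. Since ADP is essentially an ordering of players and its relationship with wOBA need not be linear, a rank-based Spearman correlation is a natural summary. Spearman's rho is the Pearson correlation of the ranks, which is exactly what this sketch computes; the `spearman` helper is hypothetical and the five sample rows are taken from the fallback data.

```python
import pandas as pd


def spearman(a: pd.Series, b: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    return a.rank().corr(b.rank())


drafted = pd.DataFrame({
    'Name': ['Shohei Ohtani', 'Aaron Judge', 'Ronald Acuña Jr.', 'Mike Trout', 'Austin Hedges'],
    'ADP': [1.5, 3.1, 37.6, 111.9, 748.0],
    'wOBA': [0.410, 0.439, 0.406, 0.368, 0.231],
})
rho = spearman(drafted['ADP'], drafted['wOBA'])
print(f"Spearman rho (ADP vs wOBA): {rho:.2f}")
```

A negative rho means earlier picks (lower ADP) tend to post a higher wOBA; applied to `df_filtered_adp`, it would quantify how closely draft position has tracked production so far.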

Correlation Heatmap of Offensive Stats

In [7]:

correlation_cols = ['G', 'PA', 'AB', 'H', '2B', '3B', 'HR', 'R', 'RBI', 'BB', 'SO', 'HBP', 'SB', 'CS', 'BB%', 'K%', 'ISO', 'BABIP', 'AVG', 'OBP', 'SLG', 'OPS', 'wOBA', 'wRC+', 'ADP']
plt.figure(figsize=(16, 12))
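# annot=False, so fmt is not needed; pinning vmin/vmax to [-1, 1] keeps zero correlation at the neutral midpoint of 'coolwarm'
sns.heatmap(df[correlation_cols].corr(), annot=False, cmap='coolwarm', vmin=-1, vmax=1, linewidths=.5)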
plt.title('Correlation Matrix of Offensive Statistics', fontsize=16)
plt.show()
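With `annot=False` and 25 columns, the heatmap is easier to scan than to read precisely. A short sketch that lists the most strongly correlated column pairs complements it; the helper name `strongest_pairs` is hypothetical, and the sample columns come from the fallback data. Applied to `df[correlation_cols]`, it would rank all 300 distinct pairs.

```python
from itertools import combinations

import pandas as pd


def strongest_pairs(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.corr()
    # combinations() yields each unordered pair once and skips the diagonal
    pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})
    return pairs.dropna().sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# A few columns from the fallback data above
sample = pd.DataFrame({
    'HR': [19, 18, 10, 14, 13, 9, 8, 2],
    'SB': [3, 13, 12, 3, 4, 10, 1, 1],
    'ISO': [0.336, 0.318, 0.221, 0.278, 0.244, 0.242, 0.248, 0.099],
    'SLG': [0.639, 0.603, 0.51, 0.517, 0.512, 0.542, 0.500, 0.272],
})
top_pairs = strongest_pairs(sample, n=3)
print(top_pairs)
```

On the full column list, expect pairs of closely related composites (such as OPS and wOBA) near the top, since several of these metrics are built from the same inputs.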

6. Insights and Anomalies

Identifying Potential Sleepers (High Performance, Late ADP)

In [8]:

# Define thresholds for 'high performance' and 'late ADP' (a higher ADP number means a later pick)
# For demonstration: wOBA at or above the 75th percentile (top quarter) and ADP at or above the median (later half of drafts),
# both computed among drafted players only
woba_threshold = df_filtered_adp['wOBA'].quantile(0.75)
adp_threshold = df_filtered_adp['ADP'].quantile(0.50)

sleepers = df_filtered_adp[(df_filtered_adp['wOBA'] >= woba_threshold) & (df_filtered_adp['ADP'] >= adp_threshold)]
print(f"\nPotential Sleepers (wOBA >= {woba_threshold:.3f} and ADP >= {adp_threshold:.1f}):")
print(sleepers[['Name', 'Team', 'wOBA', 'ADP', 'HR', 'SB', 'AVG']].sort_values(by='wOBA', ascending=False).head(10))

print("\nThese players are performing well relative to their average draft position, indicating potential value.")

Potential Sleepers (wOBA >= 0.330 and ADP >= 417.1):
                 Name Team   wOBA    ADP  HR  SB    AVG
90       Kyle Stowers  MIA  0.348  552.8   9   1  0.256
302     Rob Refsnyder  BOS  0.343  735.4   3   1  0.263
344     Mickey Moniak  COL  0.337  590.3   7   3  0.259
384  Masataka Yoshida  BOS  0.336  487.2   3   1  0.282
211     Mike Tauchman  CHW  0.335  519.6   3   1  0.256
409     Tyler Freeman  COL  0.335  603.0   2   5  0.276
328     Romy Gonzalez  BOS  0.334  672.9   3   4  0.270
501      Keston Hiura  COL  0.334  749.6   4   1  0.250
54    Geraldo Perdomo  ARI  0.333  430.3   4   6  0.262
79      Trent Grisham  NYY  0.333  728.7   7   2  0.227

These players are performing well relative to their average draft position, indicating potential value.
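The sleeper filter runs on `df_filtered_adp`, so undrafted players (missing ADP) are excluded, even though they are arguably the purest sleepers; Chandler Simpson, who leads the stolen-base table above, has no ADP. One hypothetical variant ranks undrafted players after every drafted player and adds a minimum-PA floor, since wOBA over a few dozen plate appearances is noisy. The function name `find_sleepers`, the 100-PA default, and the toy rows (players A to F are made up) are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def find_sleepers(df: pd.DataFrame, woba_q: float = 0.75, adp_q: float = 0.50,
                  min_pa: int = 100) -> pd.DataFrame:
    """Sleeper filter that keeps undrafted players instead of dropping them.

    Missing ADP is filled with one past the largest observed ADP, so undrafted
    players count as 'later' than every drafted player.
    """
    pool = df[df['PA'] >= min_pa].copy()
    pool['ADP_filled'] = pool['ADP'].fillna(pool['ADP'].max() + 1)
    woba_cut = pool['wOBA'].quantile(woba_q)
    adp_cut = pool['ADP_filled'].quantile(adp_q)
    mask = (pool['wOBA'] >= woba_cut) & (pool['ADP_filled'] >= adp_cut)
    return pool.loc[mask, ['Name', 'wOBA', 'ADP', 'PA']].sort_values('wOBA', ascending=False)


# Made-up toy rows: E is undrafted, F falls below the PA floor
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'wOBA': [0.400, 0.380, 0.300, 0.290, 0.395, 0.350],
    'ADP': [5.0, 20.0, 150.0, 300.0, np.nan, 500.0],
    'PA': [250, 240, 200, 180, 210, 60],
})
sleepers_toy = find_sleepers(toy)
print(sleepers_toy)
```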

Identifying Overvalued Players (Low Performance, Early ADP)

In [9]:

# Define thresholds for 'low performance' and 'early ADP' (a lower ADP number means an earlier pick)
# For demonstration: wOBA at or below the 25th percentile (bottom quarter) and ADP at or below the 25th percentile (earliest-drafted quarter)
woba_low_threshold = df_filtered_adp['wOBA'].quantile(0.25)
adp_high_threshold = df_filtered_adp['ADP'].quantile(0.25)

overvalued = df_filtered_adp[(df_filtered_adp['wOBA'] <= woba_low_threshold) & (df_filtered_adp['ADP'] <= adp_high_threshold)]
print(f"\nPotentially Overvalued Players (wOBA <= {woba_low_threshold:.3f} and ADP <= {adp_high_threshold:.1f}):")
print(overvalued[['Name', 'Team', 'wOBA', 'ADP', 'HR', 'SB', 'AVG']].sort_values(by='ADP', ascending=True).head(10))

print("\nThese players are performing below expectations relative to their average draft position, indicating they might be overvalued.")

Potentially Overvalued Players (wOBA <= 0.296 and ADP <= 200.1):
             Name Team   wOBA    ADP  HR  SB    AVG
351  Luis Rengifo  LAA  0.294  177.6   4   4  0.251

These players are performing below expectations relative to their average draft position, indicating they might be overvalued.
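Only one player clears both cut-offs, because requiring a player to sit in two extreme quartiles at once is strict. A rank-based alternative scores every drafted player on one scale: ADP rank minus wOBA rank, with both ranked so that 1 is best. Positive values mean a player is producing above their draft slot; negative values mean below. The `value_gap` name is hypothetical, and the sample values come from the fallback data.

```python
import pandas as pd


def value_gap(df: pd.DataFrame) -> pd.Series:
    """ADP rank minus wOBA rank; positive means outperforming the draft slot."""
    adp_rank = df['ADP'].rank(ascending=True)     # 1 = earliest pick
    woba_rank = df['wOBA'].rank(ascending=False)  # 1 = best wOBA
    return adp_rank - woba_rank


sample = pd.DataFrame({
    'ADP': [1.5, 1.6, 3.1, 7.1, 37.6, 85.6, 111.9, 748.0],
    'wOBA': [0.410, 0.362, 0.439, 0.397, 0.406, 0.358, 0.368, 0.231],
}, index=['Shohei Ohtani', 'Bobby Witt Jr.', 'Aaron Judge', 'Juan Soto',
          'Ronald Acuña Jr.', 'Cal Raleigh', 'Mike Trout', 'Austin Hedges'])
gaps = value_gap(sample).sort_values()
print(gaps)
```

Gaps are bounded near the top of the draft (the first pick cannot outperform its slot by rank), so this works better as a sorting aid than as an absolute measure.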

Anomalies: High K% / Low BB% Players

In [10]:

# Players with very high K% (e.g., top 10%) and low BB% (e.g., bottom 10%)
high_k_threshold = df['K%'].quantile(0.90)
low_bb_threshold = df['BB%'].quantile(0.10)

anomalies_k_bb = df[(df['K%'] >= high_k_threshold) & (df['BB%'] <= low_bb_threshold)]
print(f"\nPlayers with High K% ({high_k_threshold:.2%}) and Low BB% ({low_bb_threshold:.2%}):")
print(anomalies_k_bb[['Name', 'Team', 'K%', 'BB%', 'OPS']].sort_values(by='K%', ascending=False).head(10))

print("\nThese players might be prone to slumps due to poor plate discipline, or they are high-risk, high-reward power hitters.")

Players with High K% (30.40%) and Low BB% (5.60%):
              Name Team     K%    BB%    OPS
589  Aramis Garcia  ARI  0.373  0.031  0.519
555       Tim Elko  CHW  0.353  0.053  0.673

These players might be prone to slumps due to poor plate discipline, or they are high-risk, high-reward power hitters.
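The two-threshold filter only catches players who are extreme on both rates at once. A single differential, K% minus BB%, ranks every hitter on one continuous scale, with higher values indicating weaker plate discipline. This is a swapped-in metric rather than something the notebook computes; the sketch assumes K% and BB% have already been converted to fractions by the cleaning step, and the sample values come from the output above and the fallback data.

```python
import pandas as pd


def k_minus_bb(df: pd.DataFrame) -> pd.Series:
    """K% minus BB% (both as fractions); higher values mean weaker plate discipline."""
    return df['K%'] - df['BB%']


sample = pd.DataFrame({
    'Name': ['Aramis Garcia', 'Tim Elko', 'Juan Soto'],
    'K%': [0.373, 0.353, 0.170],
    'BB%': [0.031, 0.053, 0.197],
})
sample['K-BB%'] = k_minus_bb(sample)
print(sample.sort_values('K-BB%', ascending=False))
```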

Anomalies: BABIP Outliers (Potential Regression/Progression Candidates)

In [11]:

# Players with unusually high BABIP (e.g., top 5%) - potential regression candidates
high_babip_threshold = df['BABIP'].quantile(0.95)
high_babip_players = df[df['BABIP'] >= high_babip_threshold]
print(f"\nPlayers with Unusually High BABIP ({high_babip_threshold:.3f}) - Potential Regression Candidates:")
print(high_babip_players[['Name', 'Team', 'BABIP', 'AVG', 'OPS']].sort_values(by='BABIP', ascending=False).head(10))

# Players with unusually low BABIP (e.g., bottom 5%) - potential positive regression candidates
low_babip_threshold = df['BABIP'].quantile(0.05)
low_babip_players = df[df['BABIP'] <= low_babip_threshold]
print(f"\nPlayers with Unusually Low BABIP ({low_babip_threshold:.3f}) - Potential Positive Regression Candidates:")
print(low_babip_players[['Name', 'Team', 'BABIP', 'AVG', 'OPS']].sort_values(by='BABIP', ascending=True).head(10))

print("\nBABIP can indicate luck. High BABIP might mean a player is due for a batting average dip, while low BABIP might mean they are due for a rise.")

Players with Unusually High BABIP (0.333) - Potential Regression Candidates:
                 Name Team  BABIP    AVG    OPS
408        Greg Jones  HOU  0.374  0.218  0.632
96    Jonathan Aranda  TBR  0.359  0.282  0.807
0         Aaron Judge  NYY  0.355  0.303  1.062
470   Carson McCusker  MIN  0.348  0.234  0.698
10    Elly De La Cruz  CIN  0.346  0.270  0.826
462      Zach Dezenzo  HOU  0.346  0.252  0.719
328     Romy Gonzalez  BOS  0.344  0.270  0.783
12   Ronald Acuña Jr.  ATL  0.343  0.300  0.945
33       Riley Greene  DET  0.343  0.271  0.823
555          Tim Elko  CHW  0.342  0.238  0.673

Players with Unusually Low BABIP (0.258) - Potential Positive Regression Candidates:
                Name Team  BABIP    AVG    OPS
600   Jac Caglianone  KCR  0.217  0.193  0.609
435    Austin Hedges  CLE  0.226  0.172  0.511
601  Anthony Seigler  MIL  0.238  0.194  0.606
568    Jason Heyward  NaN  0.238  0.206  0.614
130     Danny Jansen  TBR  0.242  0.221  0.737
131        Bo Naylor  CLE  0.242  0.204  0.681
286   Carlos Santana  CLE  0.243  0.221  0.682
466       Luis Matos  SFG  0.245  0.236  0.668
590   Andrew Knizner  SFG  0.245  0.205  0.568
314        JJ Bleday  ATH  0.249  0.220  0.723

BABIP can indicate luck. High BABIP might mean a player is due for a batting average dip, while low BABIP might mean they are due for a rise.
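BABIP also reflects contact quality and speed, not only luck, so it helps to see exactly what it measures: hits on balls in play divided by balls in play, (H - HR) / (AB - SO - HR + SF). This dataset has no sacrifice-fly column, so the sketch below assumes SF = 0 unless one is supplied, which slightly inflates the result. The function name `babip` is illustrative.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int = 0) -> float:
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF).

    SF defaults to 0 because sacrifice flies are not a column in this dataset.
    """
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        return float('nan')  # no balls in play, so BABIP is undefined
    return (h - hr) / balls_in_play


# Aaron Judge's counting stats from the fallback data: 61 H, 19 HR, 200 AB, 65 SO
print(round(babip(61, 19, 200, 65), 3))  # 0.362 (42 / 116)
```

The recomputed 0.362 sits above the reported 0.355; the missing SF term (and any rounding in the counting stats) plausibly accounts for the gap, so treat recomputed values as approximate.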

7. Simple Prediction/Outlook

Identifying Players with High Power and Speed Potential

In [12]:

# Define thresholds for power (HR) and speed (SB) (e.g., top 20% for both)
hr_threshold = df['HR'].quantile(0.80)
sb_threshold = df['SB'].quantile(0.80)

power_speed_threats = df[(df['HR'] >= hr_threshold) & (df['SB'] >= sb_threshold)]
print(f"\nPlayers with High Power (HR >= {hr_threshold:.0f}) and High Speed (SB >= {sb_threshold:.0f}):")
print(power_speed_threats[['Name', 'Team', 'HR', 'SB', 'OPS', 'ADP']].sort_values(by=['HR', 'SB'], ascending=False).head(10))

print("\nThese players offer a valuable combination of power and speed, which is highly sought after in fantasy baseball.")

Players with High Power (HR >= 7) and High Speed (SB >= 5):
                    Name Team  HR  SB    OPS    ADP
1          Shohei Ohtani  LAD  18  13  0.981    1.5
5           José Ramírez  CLE  11  12  0.853    4.9
10       Elly De La Cruz  CIN  10  18  0.826    4.3
2         Bobby Witt Jr.  KCR  10  12  0.853    1.6
7        Julio Rodríguez  SEA  10  10  0.783   13.7
6       Francisco Lindor  NYM  10   8  0.778   14.1
8     Fernando Tatis Jr.  SDP  10   8  0.843   11.6
29            James Wood  WSN  10   7  0.840   53.3
146        Adolis García  TEX  10   5  0.726  151.5
13   Pete Crow-Armstrong  CHC   9  13  0.777  127.4

These players offer a valuable combination of power and speed, which is highly sought after in fantasy baseball.
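The percentile filter is all-or-nothing: a player one steal below the cut drops out entirely. Bill James's Power-Speed Number, 2 × HR × SB / (HR + SB), is a continuous alternative; as the harmonic mean of HR and SB, it rewards balance and scores zero if either category is zero. The (HR, SB) pairs below come from the table above, and the function name is illustrative.

```python
def power_speed_number(hr: float, sb: float) -> float:
    """Bill James's Power-Speed Number: the harmonic mean of HR and SB."""
    if hr + sb == 0:
        return 0.0  # avoid dividing by zero for players with neither
    return 2 * hr * sb / (hr + sb)


# (HR, SB) pairs taken from the power/speed output above
for name, hr, sb in [('Shohei Ohtani', 18, 13), ('Elly De La Cruz', 10, 18), ('Adolis García', 10, 5)]:
    print(f"{name}: {power_speed_number(hr, sb):.1f}")
```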

Overall Fantasy Value Score (Example)

In [13]:

# Create a simple fantasy value score based on weighted sum of common fantasy categories
# Weights can be adjusted based on fantasy league scoring and preferences
df['Fantasy_Score'] = (
    df['HR'] * 1.5 +
    df['RBI'] * 1.0 +
    df['R'] * 1.0 +
    df['SB'] * 2.0 +
    df['AVG'] * 100  # Scale AVG (e.g. .300 -> 30); hitters differ by only a few points here
)

print("\nTop 10 Players by Custom Fantasy Score:")
print(df.sort_values(by='Fantasy_Score', ascending=False).head(10)[['Name', 'Team', 'Fantasy_Score', 'HR', 'SB', 'AVG', 'RBI', 'R', 'ADP']])

print("\nThis custom score provides a consolidated view of player value based on common fantasy categories.")

Top 10 Players by Custom Fantasy Score:
                   Name Team  Fantasy_Score  HR  SB    AVG  RBI   R    ADP
1         Shohei Ohtani  LAD          177.5  18  13  0.285   47  49    1.5
0           Aaron Judge  NYY          154.8  19   3  0.303   47  43    3.1
10      Elly De La Cruz  CIN          152.0  10  18  0.270   34  40    4.3
2        Bobby Witt Jr.  KCR          140.9  10  12  0.289   36  37    1.6
5          José Ramírez  CLE          140.2  11  12  0.277   36  36    4.9
18       Corbin Carroll  ARI          131.2   9  11  0.257   31  39    7.9
13  Pete Crow-Armstrong  CHC          130.4   9  13  0.259   32  33  127.4
4             Juan Soto  NYM          130.3  13   4  0.268   36  40    7.1
7       Julio Rodríguez  SEA          130.1  10  10  0.271   33  35   13.7
38      Jackson Chourio  MIL          128.2   8  11  0.272   34  33   16.4

This custom score provides a consolidated view of player value based on common fantasy categories.
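The fixed weights mix units: multiplying AVG by 100 means a .270 hitter and a .300 hitter differ by only 3 points, the same as three RBI. A common alternative in fantasy analysis is to convert each category to a z-score (standard deviations from the pool mean) before summing, so every category contributes on a comparable scale. This sketch assumes equal weights by default; the function name `zscore_value` and the five sample rows (taken from the outputs above and the fallback data) are illustrative, and a fuller version might also weight AVG by at-bats.

```python
import pandas as pd


def zscore_value(df: pd.DataFrame, cats=('HR', 'R', 'RBI', 'SB', 'AVG'),
                 weights=None) -> pd.Series:
    """Sum of per-category z-scores; each weight defaults to 1.0."""
    weights = weights or {}
    total = pd.Series(0.0, index=df.index)
    for cat in cats:
        col = df[cat]
        z = (col - col.mean()) / col.std(ddof=0)  # population z-score within this pool
        total += weights.get(cat, 1.0) * z
    return total


sample = pd.DataFrame({
    'HR': [18, 19, 10, 13, 2],
    'R': [49, 43, 40, 40, 7],
    'RBI': [47, 47, 34, 36, 6],
    'SB': [13, 3, 18, 4, 1],
    'AVG': [0.285, 0.303, 0.270, 0.268, 0.172],
}, index=['Shohei Ohtani', 'Aaron Judge', 'Elly De La Cruz', 'Juan Soto', 'Austin Hedges'])
scores = zscore_value(sample)
print(scores.sort_values(ascending=False).round(2))
```

Because z-scores are relative to the pool, the same player scores differently against the full 606-player dataset than against these five rows; computing them over a draftable subset is one reasonable choice.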

Conclusion

This analysis provides an overview of the fantasy baseball player data. We identified top performers across several categories, flagged potential sleepers and overvalued players by comparing current performance with ADP, and pinpointed anomalies (extreme K%/BB% profiles and BABIP outliers) that may signal future regression or progression. The visualizations summarize the distributions of key metrics and the relationships between them. Because the thresholds used here are percentile-based and chosen for demonstration, the resulting lists are starting points for further review rather than recommendations. Further analysis could involve more sophisticated predictive modeling, time-series analysis of player trends, or clustering to group similar player profiles.