Fantasy Baseball Player Analysis
This Jupyter Notebook performs an exploratory data analysis on fantasy baseball player statistics. The primary purpose is to identify top-performing players, potential sleepers, and anomalies based on various offensive metrics. We will visualize key statistics, analyze correlations, and derive insights that could be valuable for fantasy baseball drafting or trade decisions.
1. Import Necessary Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
# Ignore all warnings for cleaner output
warnings.filterwarnings('ignore')
2. Load CSV Data
try:
    df = pd.read_csv('fantasy_bbl.csv')
    print("CSV data loaded successfully.")
except FileNotFoundError:
    print("Error: 'fantasy_bbl.csv' not found. Please ensure the file is in the correct directory.")
    # Create a dummy DataFrame for demonstration if the file is not found
    # This dummy data is based on the provided CSV content for consistency
    data = {
        '#': [1, 2, 3, 4, 5, 13, 126, 436],
        'Name': ['Aaron Judge', 'Shohei Ohtani', 'Bobby Witt Jr.', 'Cal Raleigh', 'Juan Soto', 'Ronald Acuña Jr.', 'Mike Trout', 'Austin Hedges'],
        'Team': ['NYY', 'LAD', 'KCR', 'SEA', 'NYM', 'ATL', 'LAA', 'CLE'],
        'G': [56, 56, 56, 53, 56, 41, 34, 30],
        'PA': [245, 253, 244, 214, 246, 182, 143, 83],
        'AB': [200, 217, 223, 186, 195, 154, 120, 74],
        'H': [61, 62, 64, 44, 52, 46, 30, 13],
        '2B': [11, 10, 15, 8, 9, 8, 5, 2],
        '3B': [0, 3, 3, 0, 0, 1, 1, 0],
        'HR': [19, 18, 10, 14, 13, 9, 8, 2],
        'R': [43, 49, 37, 29, 40, 36, 20, 7],
        'RBI': [47, 47, 36, 36, 36, 24, 21, 6],
        'BB': [41, 32, 17, 25, 48, 25, 20, 6],
        'SO': [65, 60, 43, 56, 42, 38, 39, 24],
        'HBP': [2, 2, 2, 2, 1, 2, 2, 1],
        'SB': [3, 13, 12, 3, 4, 10, 1, 1],
        'CS': [1, 2, 4, 1, 1, 2, 0, 0],
        'BB%': ['16.80%', '12.70%', '7.10%', '11.50%', '19.70%', '13.60%', '14.10%', '6.90%'],
        'K%': ['26.40%', '23.90%', '17.70%', '26.20%', '17.00%', '20.70%', '27.50%', '29.40%'],
        'ISO': [0.336, 0.318, 0.221, 0.278, 0.244, 0.242, 0.248, 0.099],
        'BABIP': [0.355, 0.313, 0.318, 0.256, 0.279, 0.343, 0.301, 0.226],
        'AVG': [0.303, 0.285, 0.289, 0.238, 0.268, 0.300, 0.251, 0.172],
        'OBP': [0.423, 0.379, 0.343, 0.33, 0.413, 0.403, 0.365, 0.239],
        'SLG': [0.639, 0.603, 0.51, 0.517, 0.512, 0.542, 0.500, 0.272],
        'OPS': [1.062, 0.981, 0.853, 0.847, 0.925, 0.945, 0.865, 0.511],
        'wOBA': [0.439, 0.410, 0.362, 0.358, 0.397, 0.406, 0.368, 0.231],
        'wRC+': [187, 167, 130, 138, 162, 163, 136, 45],
        'ADP': [3.1, 1.5, 1.6, 85.6, 7.1, 37.6, 111.9, 748.0]
    }
    df = pd.DataFrame(data)
    print("Using dummy data for demonstration.")
CSV data loaded successfully.
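Because rate stats such as OPS (OBP + SLG) and ISO (SLG - AVG) are derived from other columns, a quick consistency check right after loading can catch corrupted rows. The sketch below is an illustration, not part of the original notebook: `derived_stat_mismatches` is a hypothetical helper, the 0.002 tolerance is an assumed allowance for three-decimal rounding, and the `Typo Row` record is made up to trigger the check (the other two rows come from the dummy data above).

```python
import pandas as pd


def derived_stat_mismatches(frame: pd.DataFrame, tol: float = 0.002) -> pd.DataFrame:
    """Return rows whose OPS or ISO disagree with their components by more than tol."""
    ops_gap = (frame['OBP'] + frame['SLG'] - frame['OPS']).abs()
    iso_gap = (frame['SLG'] - frame['AVG'] - frame['ISO']).abs()
    mask = (ops_gap > tol) | (iso_gap > tol)
    return frame.loc[mask, ['Name', 'AVG', 'OBP', 'SLG', 'OPS', 'ISO']]


sample = pd.DataFrame({
    'Name': ['Aaron Judge', 'Cal Raleigh', 'Typo Row'],
    'AVG': [0.303, 0.238, 0.250],
    'OBP': [0.423, 0.33, 0.320],
    'SLG': [0.639, 0.517, 0.400],
    'OPS': [1.062, 0.847, 0.820],  # Typo Row: 0.320 + 0.400 = 0.720, not 0.820
    'ISO': [0.336, 0.278, 0.150],
})
print(derived_stat_mismatches(sample)['Name'].tolist())
```

Cal Raleigh's ISO is off by 0.001 (0.517 - 0.238 = 0.279 vs. 0.278), which the tolerance treats as rounding; on the notebook's data the call would simply be `derived_stat_mismatches(df)`.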
3. Initial Data Exploration and Cleaning
print("\nDataFrame Head:")
print(df.head())
print("\nDataFrame Info (before cleaning):")
df.info()
print("\nDataFrame Description:")
print(df.describe())
# Clean percentage columns (BB%, K%): strip the '%' sign and convert to a float fraction (e.g. '16.80%' -> 0.168)
for col in ['BB%', 'K%']:
    if col in df.columns:
        df[col] = df[col].astype(str).str.replace('%', '').astype(float) / 100
# ADP values of 999 appear to be a placeholder for undrafted (or very late) players.
# Replace them with NaN so they are excluded from means, quantiles, and plots instead of skewing them.
df['ADP'] = df['ADP'].replace(999, np.nan)
print("\nData types after cleaning:")
df.info()
DataFrame Head:
   #            Name Team   G   PA   AB   H  2B  3B  HR  ...      K%    ISO  \
0  1     Aaron Judge  NYY  56  245  200  61  11   0  19  ...  26.40%  0.336
1  2   Shohei Ohtani  LAD  56  253  217  62  10   3  18  ...  23.90%  0.318
2  3  Bobby Witt Jr.  KCR  56  244  223  64  15   3  10  ...  17.70%  0.221
3  4     Cal Raleigh  SEA  53  214  186  44   8   0  14  ...  26.20%  0.278
4  5       Juan Soto  NYM  56  246  195  52   9   0  13  ...  17.00%  0.244

   BABIP    AVG    OBP    SLG    OPS   wOBA  wRC+   ADP
0  0.355  0.303  0.423  0.639  1.062  0.439   187   3.1
1  0.313  0.285  0.379  0.603  0.981  0.410   167   1.5
2  0.318  0.289  0.343  0.510  0.853  0.362   130   1.6
3  0.256  0.238  0.330  0.517  0.847  0.358   138  85.6
4  0.279  0.268  0.413  0.512  0.925  0.397   162   7.1

[5 rows x 28 columns]

DataFrame Info (before cleaning):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 606 entries, 0 to 605
Data columns (total 28 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   #       606 non-null    int64
 1   Name    606 non-null    object
 2   Team    590 non-null    object
 3   G       606 non-null    int64
 4   PA      606 non-null    int64
 5   AB      606 non-null    int64
 6   H       606 non-null    int64
 7   2B      606 non-null    int64
 8   3B      606 non-null    int64
 9   HR      606 non-null    int64
 10  R       606 non-null    int64
 11  RBI     606 non-null    int64
 12  BB      606 non-null    int64
 13  SO      606 non-null    int64
 14  HBP     606 non-null    int64
 15  SB      606 non-null    int64
 16  CS      606 non-null    int64
 17  BB%     606 non-null    object
 18  K%      606 non-null    object
 19  ISO     606 non-null    float64
 20  BABIP   606 non-null    float64
 21  AVG     606 non-null    float64
 22  OBP     606 non-null    float64
 23  SLG     606 non-null    float64
 24  OPS     606 non-null    float64
 25  wOBA    606 non-null    float64
 26  wRC+    606 non-null    int64
 27  ADP     606 non-null    float64
dtypes: float64(8), int64(16), object(4)
memory usage: 132.7+ KB

DataFrame Description:
               #           G          PA          AB           H          2B  \
count  606.00000  606.000000  606.000000  606.000000  606.000000  606.000000
mean   303.50000   39.326733  150.749175  134.881188   32.976898    6.529703
std    175.08141    9.611057   45.820055   40.798790   11.836886    2.489305
min      1.00000   12.000000   41.000000   36.000000    8.000000    1.000000
25%    152.25000   32.000000  117.000000  105.000000   24.000000    5.000000
50%    303.50000   39.000000  145.000000  130.000000   31.000000    6.000000
75%    454.75000   47.000000  184.750000  165.000000   41.000000    8.000000
max    606.00000   58.000000  253.000000  228.000000   67.000000   15.000000

               3B          HR           R         RBI  ...          CS  \
count  606.000000  606.000000  606.000000  606.000000  ...  606.000000
mean     0.605611    4.297030   18.110561   17.693069  ...    0.831683
std      0.671425    2.751569    7.032110    7.247788  ...    0.811927
min      0.000000    0.000000    4.000000    4.000000  ...    0.000000
25%      0.000000    2.000000   13.000000   13.000000  ...    0.000000
50%      1.000000    4.000000   17.000000   16.000000  ...    1.000000
75%      1.000000    6.000000   22.000000   22.000000  ...    1.000000
max      5.000000   19.000000   49.000000   47.000000  ...    4.000000

              ISO       BABIP         AVG         OBP         SLG         OPS  \
count  606.000000  606.000000  606.000000  606.000000  606.000000  606.000000
mean     0.148525    0.294822    0.240870    0.312190    0.389408    0.701594
std      0.040727    0.022822    0.021165    0.024806    0.050077    0.067495
min      0.046000    0.217000    0.171000    0.232000    0.241000    0.509000
25%      0.121000    0.280000    0.227000    0.297000    0.358500    0.659000
50%      0.144000    0.295000    0.240000    0.311000    0.385000    0.697500
75%      0.174000    0.310000    0.255000    0.327000    0.416000    0.738000
max      0.336000    0.374000    0.303000    0.423000    0.639000    1.062000

             wOBA        wRC+         ADP
count  606.000000  606.000000  606.000000
mean     0.308239   96.064356  543.642904
std      0.026186   18.294311  323.721651
min      0.231000   42.000000    1.500000
25%      0.292000   84.000000  253.625000
50%      0.307000   95.000000  557.350000
75%      0.323750  106.000000  750.000000
max      0.439000  187.000000  999.000000

[8 rows x 24 columns]

Data types after cleaning:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 606 entries, 0 to 605
Data columns (total 28 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   #       606 non-null    int64
 1   Name    606 non-null    object
 2   Team    590 non-null    object
 3   G       606 non-null    int64
 4   PA      606 non-null    int64
 5   AB      606 non-null    int64
 6   H       606 non-null    int64
 7   2B      606 non-null    int64
 8   3B      606 non-null    int64
 9   HR      606 non-null    int64
 10  R       606 non-null    int64
 11  RBI     606 non-null    int64
 12  BB      606 non-null    int64
 13  SO      606 non-null    int64
 14  HBP     606 non-null    int64
 15  SB      606 non-null    int64
 16  CS      606 non-null    int64
 17  BB%     606 non-null    float64
 18  K%      606 non-null    float64
 19  ISO     606 non-null    float64
 20  BABIP   606 non-null    float64
 21  AVG     606 non-null    float64
 22  OBP     606 non-null    float64
 23  SLG     606 non-null    float64
 24  OPS     606 non-null    float64
 25  wOBA    606 non-null    float64
 26  wRC+    606 non-null    int64
 27  ADP     475 non-null    float64
dtypes: float64(10), int64(16), object(2)
memory usage: 132.7+ KB
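The cleaning step above calls `astype(float)`, which raises a `ValueError` if any `BB%` or `K%` entry is not a clean percentage string. A slightly more defensive variant is sketched below, under the assumption that malformed entries should become NaN rather than stop the notebook: it uses `pd.to_numeric(..., errors='coerce')` and works on a copy. `clean_players` is a hypothetical name, and the `'n/a'` entry is invented to show the coercion.

```python
import numpy as np
import pandas as pd


def clean_players(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: percent strings -> fractions, ADP 999 -> NaN."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    for col in ['BB%', 'K%']:
        if col in out.columns:
            stripped = out[col].astype(str).str.rstrip('%')
            # errors='coerce' turns unparseable strings into NaN instead of raising
            out[col] = pd.to_numeric(stripped, errors='coerce') / 100
    # Assumed sentinel: 999 marks an undrafted player, as in the notebook
    out['ADP'] = out['ADP'].replace(999, np.nan)
    return out


raw = pd.DataFrame({'BB%': ['16.80%', '7.10%', 'n/a'],
                    'K%': ['26.40%', '17.70%', '20.00%'],
                    'ADP': [3.1, 999.0, 85.6]})
cleaned = clean_players(raw)
print(cleaned)
```

Returning a copy instead of mutating `df` in place also makes the step safe to re-run in a notebook: running the original loop twice would fail, because the second pass divides already-converted fractions by 100 again.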
4. Key Performance Indicators (KPIs) Analysis
print("\nTop 10 Players by OPS:")
print(df.sort_values(by='OPS', ascending=False).head(10)[['Name', 'Team', 'OPS', 'ADP']])
print("\nTop 10 Players by wOBA:")
print(df.sort_values(by='wOBA', ascending=False).head(10)[['Name', 'Team', 'wOBA', 'ADP']])
print("\nTop 10 Players by wRC+:")
print(df.sort_values(by='wRC+', ascending=False).head(10)[['Name', 'Team', 'wRC+', 'ADP']])
print("\nTop 10 Home Run Hitters:")
print(df.sort_values(by='HR', ascending=False).head(10)[['Name', 'Team', 'HR', 'ADP']])
print("\nTop 10 Stolen Base Threats:")
print(df.sort_values(by='SB', ascending=False).head(10)[['Name', 'Team', 'SB', 'ADP']])
print("\nTop 10 Players by Batting Average (min 100 PA):")
print(df[df['PA'] >= 100].sort_values(by='AVG', ascending=False).head(10)[['Name', 'Team', 'AVG', 'ADP']])
Top 10 Players by OPS:
                     Name Team    OPS    ADP
0             Aaron Judge  NYY  1.062    3.1
1           Shohei Ohtani  LAD  0.981    1.5
12       Ronald Acuña Jr.  ATL  0.945   37.6
4               Juan Soto  NYM  0.925    7.1
46         Yordan Alvarez  HOU  0.919   17.1
36           Bryce Harper  PHI  0.890   22.3
57           Brent Rooker  ATH  0.874   62.2
14           Corey Seager  TEX  0.869   46.4
19            Ketel Marte  ARI  0.866   30.7
125            Mike Trout  LAA  0.865  111.9

Top 10 Players by wOBA:
                     Name Team   wOBA    ADP
0             Aaron Judge  NYY  0.439    3.1
1           Shohei Ohtani  LAD  0.410    1.5
12       Ronald Acuña Jr.  ATL  0.406   37.6
4               Juan Soto  NYM  0.397    7.1
46         Yordan Alvarez  HOU  0.388   17.1
36           Bryce Harper  PHI  0.380   22.3
57           Brent Rooker  ATH  0.373   62.2
19            Ketel Marte  ARI  0.371   30.7
14           Corey Seager  TEX  0.371   46.4
31        Freddie Freeman  LAD  0.371   26.9

Top 10 Players by wRC+:
                     Name Team  wRC+    ADP
0             Aaron Judge  NYY   187    3.1
1           Shohei Ohtani  LAD   167    1.5
12       Ronald Acuña Jr.  ATL   163   37.6
4               Juan Soto  NYM   162    7.1
46         Yordan Alvarez  HOU   152   17.1
36           Bryce Harper  PHI   144   22.3
14           Corey Seager  TEX   141   46.4
31        Freddie Freeman  LAD   140   26.9
25  Vladimir Guerrero Jr.  TOR   139   13.2
19            Ketel Marte  ARI   139   30.7

Top 10 Home Run Hitters:
                     Name Team  HR    ADP
0             Aaron Judge  NYY  19    3.1
1           Shohei Ohtani  LAD  18    1.5
60         Kyle Schwarber  PHI  15   75.9
3             Cal Raleigh  SEA  14   85.6
4               Juan Soto  NYM  13    7.1
57           Brent Rooker  ATH  13   62.2
47         Eugenio Suárez  ARI  12  168.7
45            Pete Alonso  NYM  12   47.6
35             Matt Olson  ATL  11   34.0
5            José Ramírez  CLE  11    4.9

Top 10 Stolen Base Threats:
                     Name Team  SB    ADP
243      Chandler Simpson  TBR  19    NaN
10        Elly De La Cruz  CIN  18    4.3
244        José Caballero  TBR  15  327.1
1           Shohei Ohtani  LAD  13    1.5
13    Pete Crow-Armstrong  CHC  13  127.4
170       Victor Scott II  STL  13  321.9
84           Brice Turang  MIL  12  145.1
5            José Ramírez  CLE  12    4.9
2          Bobby Witt Jr.  KCR  12    1.6
63              CJ Abrams  WSN  11   49.6

Top 10 Players by Batting Average (min 100 PA):
                     Name Team    AVG    ADP
0             Aaron Judge  NYY  0.303    3.1
192           Luis Arraez  SDP  0.301  185.1
110          Jacob Wilson  ATH  0.300  327.1
12       Ronald Acuña Jr.  ATL  0.300   37.6
243      Chandler Simpson  TBR  0.296    NaN
31        Freddie Freeman  LAD  0.296   26.9
102        Xavier Edwards  MIA  0.292  139.8
25  Vladimir Guerrero Jr.  TOR  0.289   13.2
2          Bobby Witt Jr.  KCR  0.289    1.6
142            Yandy Díaz  TBR  0.287  195.7
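Each leaderboard above repeats the same sort, slice, and column selection. A small helper can capture that pattern; `top_n` below is a hypothetical refactoring, not the notebook's code. It uses `DataFrame.nlargest`, which returns the same rows as `sort_values(ascending=False).head(n)` except possibly in how ties are ordered or broken at the cutoff. The four players are invented to exercise the plate-appearance filter.

```python
import pandas as pd


def top_n(frame, stat, n=10, min_pa=None, extra=('ADP',)):
    """Return the n players with the highest `stat`, optionally requiring PA >= min_pa."""
    pool = frame if min_pa is None else frame[frame['PA'] >= min_pa]
    cols = ['Name', 'Team', stat, *extra]
    # nlargest returns the n rows with the highest values of `stat`, in descending order
    return pool.nlargest(n, stat)[cols]


# Invented players, used only to exercise the PA filter
players = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Team': ['NYY', 'LAD', 'KCR', 'CLE'],
    'PA': [245, 253, 90, 83],
    'AVG': [0.303, 0.285, 0.320, 0.172],
    'ADP': [3.1, 1.5, 200.0, 748.0],
})
print(top_n(players, 'AVG', n=2, min_pa=100))
```

On the notebook's data, `top_n(df, 'AVG', min_pa=100)` would correspond to the batting-average leaderboard, and `top_n(df, 'SB')` to the stolen-base list.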
5. Data Visualization
Distribution of Key Offensive Stats
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('Distribution of Key Offensive Statistics', fontsize=16)
sns.histplot(df['HR'], bins=10, kde=True, ax=axes[0, 0])
axes[0, 0].set_title('Home Runs (HR)')
sns.histplot(df['SB'], bins=10, kde=True, ax=axes[0, 1])
axes[0, 1].set_title('Stolen Bases (SB)')
sns.histplot(df['AVG'], bins=10, kde=True, ax=axes[0, 2])
axes[0, 2].set_title('Batting Average (AVG)')
sns.histplot(df['OPS'], bins=10, kde=True, ax=axes[1, 0])
axes[1, 0].set_title('On-base Plus Slugging (OPS)')
sns.histplot(df['wOBA'], bins=10, kde=True, ax=axes[1, 1])
axes[1, 1].set_title('Weighted On-Base Average (wOBA)')
# Filter out NaN ADP values for plotting to avoid errors
sns.histplot(df['ADP'].dropna(), bins=20, kde=True, ax=axes[1, 2])
axes[1, 2].set_title('Average Draft Position (ADP)')
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
Performance vs. ADP
# Filter out players with NaN ADP for this plot to avoid errors
df_filtered_adp = df.dropna(subset=['ADP']).copy()
plt.figure(figsize=(12, 8))
sns.scatterplot(data=df_filtered_adp, x='ADP', y='wOBA', hue='Team', size='PA', sizes=(20, 400), alpha=0.7)
plt.title('wOBA vs. Average Draft Position (ADP)', fontsize=16)
plt.xlabel('Average Draft Position (Lower is Better)', fontsize=12)
plt.ylabel('Weighted On-Base Average (wOBA)', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.6)
plt.gca().invert_xaxis() # Lower ADP is better, so invert x-axis
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
plt.show()
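The scatter plot shows the relationship between draft position and wOBA visually; a rank correlation puts a single number on it. The sketch below computes Spearman's rho as the Pearson correlation of ranks, which avoids needing SciPy. Because a lower ADP means an earlier pick, a negative value means earlier-drafted players tend to post a higher wOBA. The `spearman` helper is hypothetical, and the sample values are six of the dummy-data players.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation between two aligned Series, ignoring NaN pairs."""
    # Drop pairs where either value is missing (e.g. NaN ADP for undrafted players)
    paired = pd.concat([x, y], axis=1).dropna()
    # Average ranks handle ties; the Pearson correlation of ranks is Spearman's rho
    ranks = paired.rank()
    return ranks.iloc[:, 0].corr(ranks.iloc[:, 1])


# Ohtani, Judge, Acuña, Raleigh, Trout, Hedges from the dummy data
adp = pd.Series([1.5, 3.1, 37.6, 85.6, 111.9, 748.0])
woba = pd.Series([0.410, 0.439, 0.406, 0.358, 0.368, 0.231])
print(round(spearman(adp, woba), 3))
```

A rank correlation suits this plot because ADP is heavily right-skewed (values run from 1.5 into the hundreds), which would let a few late picks dominate a Pearson coefficient. On the full data the call would be `spearman(df_filtered_adp['ADP'], df_filtered_adp['wOBA'])`.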
Correlation Heatmap of Offensive Stats
correlation_cols = ['G', 'PA', 'AB', 'H', '2B', '3B', 'HR', 'R', 'RBI', 'BB', 'SO', 'HBP', 'SB', 'CS', 'BB%', 'K%', 'ISO', 'BABIP', 'AVG', 'OBP', 'SLG', 'OPS', 'wOBA', 'wRC+', 'ADP']
plt.figure(figsize=(16, 12))
sns.heatmap(df[correlation_cols].corr(), annot=False, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Matrix of Offensive Statistics', fontsize=16)
plt.show()
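With 25 columns and no annotations, the heatmap is easier to read alongside a ranked list. The hypothetical helper below pulls one column of the Pearson correlation matrix (the default used by `df.corr()` above), drops the self-correlation, and orders the rest by absolute strength while keeping the sign. The toy frame reuses three columns from the dummy data.

```python
import pandas as pd


def strongest_correlates(frame, target='wOBA', k=5):
    """Signed correlations with `target`, ordered by absolute strength."""
    # Keep numeric columns only, so text columns such as Name are ignored
    corr = frame.select_dtypes('number').corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.loc[order].head(k)


toy = pd.DataFrame({
    'Name': ['Aaron Judge', 'Shohei Ohtani', 'Bobby Witt Jr.', 'Cal Raleigh',
             'Juan Soto', 'Ronald Acuña Jr.', 'Mike Trout', 'Austin Hedges'],
    'OPS': [1.062, 0.981, 0.853, 0.847, 0.925, 0.945, 0.865, 0.511],
    'wOBA': [0.439, 0.410, 0.362, 0.358, 0.397, 0.406, 0.368, 0.231],
    'SB': [3, 13, 12, 3, 4, 10, 1, 1],
})
print(strongest_correlates(toy))
```

On the notebook's data the equivalent call would be `strongest_correlates(df[correlation_cols])`.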
6. Insights and Anomalies
Identifying Potential Sleepers (High Performance, Late ADP)
# Define thresholds for 'high performance' and 'late draft position'
# For demonstration: wOBA in the top quartile (at or above the 75th percentile) and ADP at or above the median,
# i.e. players drafted in the later half (a higher ADP number means a later pick)
woba_threshold = df_filtered_adp['wOBA'].quantile(0.75)
adp_threshold = df_filtered_adp['ADP'].quantile(0.50)
sleepers = df_filtered_adp[(df_filtered_adp['wOBA'] >= woba_threshold) & (df_filtered_adp['ADP'] >= adp_threshold)]
print(f"\nPotential Sleepers (wOBA >= {woba_threshold:.3f} and ADP >= {adp_threshold:.1f}):")
print(sleepers[['Name', 'Team', 'wOBA', 'ADP', 'HR', 'SB', 'AVG']].sort_values(by='wOBA', ascending=False).head(10))
print("\nThese players are performing well relative to their average draft position, indicating potential value.")
Potential Sleepers (wOBA >= 0.330 and ADP >= 417.1):
                 Name Team   wOBA    ADP  HR  SB    AVG
90       Kyle Stowers  MIA  0.348  552.8   9   1  0.256
302     Rob Refsnyder  BOS  0.343  735.4   3   1  0.263
344     Mickey Moniak  COL  0.337  590.3   7   3  0.259
384  Masataka Yoshida  BOS  0.336  487.2   3   1  0.282
211     Mike Tauchman  CHW  0.335  519.6   3   1  0.256
409     Tyler Freeman  COL  0.335  603.0   2   5  0.276
328     Romy Gonzalez  BOS  0.334  672.9   3   4  0.270
501      Keston Hiura  COL  0.334  749.6   4   1  0.250
54    Geraldo Perdomo  ARI  0.333  430.3   4   6  0.262
79      Trent Grisham  NYY  0.333  728.7   7   2  0.227

These players are performing well relative to their average draft position, indicating potential value.
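The sleeper filter uses two hard cutoffs, so a player just below either threshold drops out entirely. One continuous alternative, offered here as an assumption rather than the notebook's method, is a "value gap": the player's wOBA percentile minus their draft-cost percentile, where ADP is ranked in descending order so that earlier picks get higher percentiles. A positive gap marks a player outperforming their draft slot. `value_gap` is a hypothetical name; the five rows use wOBA and ADP values that appear in this notebook's output.

```python
import pandas as pd


def value_gap(frame, perf='wOBA'):
    """Performance percentile minus draft-cost percentile (positive = outperforming ADP)."""
    # rank(pct=True) maps values to (0, 1]; the best wOBA in the pool gets 1.0
    perf_pct = frame[perf].rank(pct=True)
    # Descending ADP ranks give the earliest pick (lowest ADP number) the highest percentile
    draft_pct = frame['ADP'].rank(pct=True, ascending=False)
    return (perf_pct - draft_pct).rename('value_gap')


pool = pd.DataFrame(
    {'wOBA': [0.439, 0.410, 0.348, 0.294, 0.231],
     'ADP': [3.1, 1.5, 552.8, 177.6, 748.0]},
    index=['Aaron Judge', 'Shohei Ohtani', 'Kyle Stowers', 'Luis Rengifo', 'Austin Hedges'])
print(value_gap(pool).sort_values(ascending=False))
```

One design caveat: a percentile gap has a ceiling effect, so the first overall pick can never score above zero even with elite production (Ohtani lands at -0.2 here simply because Judge out-hit him). On the real data the input would be `df_filtered_adp`.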
Identifying Overvalued Players (Low Performance, Early ADP)
# Define thresholds for 'low performance' and 'early draft position'
# For demonstration: wOBA in the bottom quartile (at or below the 25th percentile) and ADP in the earliest-drafted
# quartile (at or below the 25th percentile; a lower ADP number means an earlier pick)
woba_low_threshold = df_filtered_adp['wOBA'].quantile(0.25)
adp_high_threshold = df_filtered_adp['ADP'].quantile(0.25)
overvalued = df_filtered_adp[(df_filtered_adp['wOBA'] <= woba_low_threshold) & (df_filtered_adp['ADP'] <= adp_high_threshold)]
print(f"\nPotentially Overvalued Players (wOBA <= {woba_low_threshold:.3f} and ADP <= {adp_high_threshold:.1f}):")
print(overvalued[['Name', 'Team', 'wOBA', 'ADP', 'HR', 'SB', 'AVG']].sort_values(by='ADP', ascending=True).head(10))
print("\nThese players are performing below expectations relative to their average draft position, indicating they might be overvalued.")
Potentially Overvalued Players (wOBA <= 0.296 and ADP <= 200.1):
             Name Team   wOBA    ADP  HR  SB    AVG
351  Luis Rengifo  LAA  0.294  177.6   4   4  0.251

These players are performing below expectations relative to their average draft position, indicating they might be overvalued.
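Only one player passes both overvaluation cutoffs. A quick diagnostic is to count how many players each cutoff keeps on its own versus jointly. If wOBA and ADP were unrelated, two independent 25% filters would jointly keep roughly 1/16 of the pool (about 30 of the 475 players with an ADP), so a single match suggests that early-drafted players rarely land in the bottom wOBA quartile. The helper and the eight-player toy frame below are invented for illustration.

```python
import pandas as pd


def filter_overlap(frame, woba_q=0.25, adp_q=0.25):
    """Count players passing each cutoff alone and both together."""
    # Same inclusive cutoffs as the notebook: bottom wOBA quantile, earliest ADP quantile
    low_woba = frame['wOBA'] <= frame['wOBA'].quantile(woba_q)
    early_adp = frame['ADP'] <= frame['ADP'].quantile(adp_q)
    return {'low_wOBA': int(low_woba.sum()),
            'early_ADP': int(early_adp.sum()),
            'both': int((low_woba & early_adp).sum())}


# Invented pool in which performance falls steadily as ADP rises
toy = pd.DataFrame({'wOBA': [0.40, 0.38, 0.36, 0.34, 0.32, 0.30, 0.28, 0.26],
                    'ADP': [1, 2, 3, 4, 5, 6, 7, 8]})
print(filter_overlap(toy))
```

On the notebook's data, `filter_overlap(df_filtered_adp)` would show how far the joint count falls below the independence baseline.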
Anomalies: High K% / Low BB% Players
# Players with very high K% (e.g., top 10%) and low BB% (e.g., bottom 10%)
high_k_threshold = df['K%'].quantile(0.90)
low_bb_threshold = df['BB%'].quantile(0.10)
anomalies_k_bb = df[(df['K%'] >= high_k_threshold) & (df['BB%'] <= low_bb_threshold)]
print(f"\nPlayers with High K% ({high_k_threshold:.2%}) and Low BB% ({low_bb_threshold:.2%}):")
print(anomalies_k_bb[['Name', 'Team', 'K%', 'BB%', 'OPS']].sort_values(by='K%', ascending=False).head(10))
print("\nThese players might be prone to slumps due to poor plate discipline, or they are high-risk, high-reward power hitters.")
Players with High K% (30.40%) and Low BB% (5.60%):
              Name Team     K%    BB%    OPS
589  Aramis Garcia  ARI  0.373  0.031  0.519
555       Tim Elko  CHW  0.353  0.053  0.673

These players might be prone to slumps due to poor plate discipline, or they are high-risk, high-reward power hitters.
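The filter above applies separate cutoffs to K% and BB%. Collapsing them into one number, BB% minus K% (higher is better), gives a single plate-discipline ranking that also places hitters who fall just short of either threshold. This metric is an addition for illustration, not part of the original analysis; `discipline_gap` is a hypothetical name, and the two rows use Juan Soto's and Aramis Garcia's rates as printed in this notebook, already converted to fractions.

```python
import pandas as pd


def discipline_gap(frame):
    """BB% minus K% per player; assumes both are fractions (e.g. 0.197), as after cleaning."""
    return (frame['BB%'] - frame['K%']).rename('BB-K%')


rates = pd.DataFrame({'BB%': [0.197, 0.031], 'K%': [0.170, 0.373]},
                     index=['Juan Soto', 'Aramis Garcia'])
print(discipline_gap(rates).round(3))
```

On the cleaned notebook data, `discipline_gap(df).nsmallest(10)` would list the weakest plate-discipline profiles without any percentile cutoffs.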
Anomalies: BABIP Outliers (Potential Regression/Progression Candidates)
# Players with unusually high BABIP (e.g., top 5%) - potential regression candidates
high_babip_threshold = df['BABIP'].quantile(0.95)
high_babip_players = df[df['BABIP'] >= high_babip_threshold]
print(f"\nPlayers with Unusually High BABIP ({high_babip_threshold:.3f}) - Potential Regression Candidates:")
print(high_babip_players[['Name', 'Team', 'BABIP', 'AVG', 'OPS']].sort_values(by='BABIP', ascending=False).head(10))
# Players with unusually low BABIP (e.g., bottom 5%) - potential positive regression candidates
low_babip_threshold = df['BABIP'].quantile(0.05)
low_babip_players = df[df['BABIP'] <= low_babip_threshold]
print(f"\nPlayers with Unusually Low BABIP ({low_babip_threshold:.3f}) - Potential Positive Regression Candidates:")
print(low_babip_players[['Name', 'Team', 'BABIP', 'AVG', 'OPS']].sort_values(by='BABIP', ascending=True).head(10))
print("\nBABIP can indicate luck. High BABIP might mean a player is due for a batting average dip, while low BABIP might mean they are due for a rise.")
Players with Unusually High BABIP (0.333) - Potential Regression Candidates:
                Name Team  BABIP    AVG    OPS
408       Greg Jones  HOU  0.374  0.218  0.632
96   Jonathan Aranda  TBR  0.359  0.282  0.807
0        Aaron Judge  NYY  0.355  0.303  1.062
470  Carson McCusker  MIN  0.348  0.234  0.698
10   Elly De La Cruz  CIN  0.346  0.270  0.826
462     Zach Dezenzo  HOU  0.346  0.252  0.719
328    Romy Gonzalez  BOS  0.344  0.270  0.783
12  Ronald Acuña Jr.  ATL  0.343  0.300  0.945
33      Riley Greene  DET  0.343  0.271  0.823
555         Tim Elko  CHW  0.342  0.238  0.673

Players with Unusually Low BABIP (0.258) - Potential Positive Regression Candidates:
                Name Team  BABIP    AVG    OPS
600   Jac Caglianone  KCR  0.217  0.193  0.609
435    Austin Hedges  CLE  0.226  0.172  0.511
601  Anthony Seigler  MIL  0.238  0.194  0.606
568    Jason Heyward  NaN  0.238  0.206  0.614
130     Danny Jansen  TBR  0.242  0.221  0.737
131        Bo Naylor  CLE  0.242  0.204  0.681
286   Carlos Santana  CLE  0.243  0.221  0.682
466       Luis Matos  SFG  0.245  0.236  0.668
590   Andrew Knizner  SFG  0.245  0.205  0.568
314        JJ Bleday  ATH  0.249  0.220  0.723

BABIP can indicate luck. High BABIP might mean a player is due for a batting average dip, while low BABIP might mean they are due for a rise.
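BABIP measures how often balls put in play (excluding home runs) fall for hits: BABIP = (H - HR) / (AB - SO - HR + SF). The CSV columns shown above do not include sacrifice flies (SF), so the sketch below defaults SF to 0, which slightly overstates BABIP for hitters with sac flies. Aaron Judge's counting stats from the data give about 0.362 versus the reported 0.355, a gap likely due at least in part to the missing SF term. For context, the mean BABIP across the 606 players in the description above is about 0.295. The `babip` helper is hypothetical.

```python
def babip(h, hr, ab, so, sf=0):
    """Batting average on balls in play; sf defaults to 0 because the CSV lacks it."""
    # Balls in play exclude strikeouts and home runs; sacrifice flies count as in-play outs
    balls_in_play = ab - so - hr + sf
    return (h - hr) / balls_in_play


# Aaron Judge's counting stats from the data above (SF unknown, so 0)
print(round(babip(61, 19, 200, 65), 3))
```

Adding even a couple of sac flies to the denominator (`babip(61, 19, 200, 65, sf=2)`) moves the estimate toward the reported figure.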
7. Simple Prediction/Outlook
Identifying Players with High Power and Speed Potential
# Define thresholds for power (HR) and speed (SB) (e.g., top 20% for both)
hr_threshold = df['HR'].quantile(0.80)
sb_threshold = df['SB'].quantile(0.80)
power_speed_threats = df[(df['HR'] >= hr_threshold) & (df['SB'] >= sb_threshold)]
print(f"\nPlayers with High Power (HR >= {hr_threshold:.0f}) and High Speed (SB >= {sb_threshold:.0f}):")
print(power_speed_threats[['Name', 'Team', 'HR', 'SB', 'OPS', 'ADP']].sort_values(by=['HR', 'SB'], ascending=False).head(10))
print("\nThese players offer a valuable combination of power and speed, which is highly sought after in fantasy baseball.")
Players with High Power (HR >= 7) and High Speed (SB >= 5):
                    Name Team  HR  SB    OPS    ADP
1          Shohei Ohtani  LAD  18  13  0.981    1.5
5           José Ramírez  CLE  11  12  0.853    4.9
10       Elly De La Cruz  CIN  10  18  0.826    4.3
2         Bobby Witt Jr.  KCR  10  12  0.853    1.6
7        Julio Rodríguez  SEA  10  10  0.783   13.7
6       Francisco Lindor  NYM  10   8  0.778   14.1
8     Fernando Tatis Jr.  SDP  10   8  0.843   11.6
29            James Wood  WSN  10   7  0.840   53.3
146        Adolis García  TEX  10   5  0.726  151.5
13   Pete Crow-Armstrong  CHC   9  13  0.777  127.4

These players offer a valuable combination of power and speed, which is highly sought after in fantasy baseball.
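The filter above requires both HR and SB to clear a cutoff. A continuous alternative is Bill James's power-speed number, 2 * HR * SB / (HR + SB), which is the harmonic mean of the two counts and stays low unless both are high. It is not used in the original notebook; the helper below is a sketch, and the three players' HR and SB totals come from the tables above.

```python
def power_speed_number(hr, sb):
    """Harmonic mean of HR and SB; defined here as 0 when a player has neither."""
    if hr + sb == 0:
        return 0.0
    return 2 * hr * sb / (hr + sb)


for name, hr, sb in [('Shohei Ohtani', 18, 13), ('Elly De La Cruz', 10, 18), ('Aaron Judge', 19, 3)]:
    print(name, round(power_speed_number(hr, sb), 1))
```

Judge's 19 HR with only 3 SB score far below De La Cruz's more balanced 10/18 line, which is exactly why a harmonic mean is used. For the whole DataFrame the same formula can be applied column-wise as `2 * df['HR'] * df['SB'] / (df['HR'] + df['SB'])`, which yields NaN (0/0) for players with neither a home run nor a steal.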
Overall Fantasy Value Score (Example)
# Create a simple fantasy value score based on weighted sum of common fantasy categories
# Weights can be adjusted based on fantasy league scoring and preferences
df['Fantasy_Score'] = (
df['HR'] * 1.5 +
df['RBI'] * 1.0 +
df['R'] * 1.0 +
df['SB'] * 2.0 +
df['AVG'] * 100 # Scale AVG for impact
)
print("\nTop 10 Players by Custom Fantasy Score:")
print(df.sort_values(by='Fantasy_Score', ascending=False).head(10)[['Name', 'Team', 'Fantasy_Score', 'HR', 'SB', 'AVG', 'RBI', 'R', 'ADP']])
print("\nThis custom score provides a consolidated view of player value based on common fantasy categories.")
Top 10 Players by Custom Fantasy Score:
                   Name Team  Fantasy_Score  HR  SB    AVG  RBI   R    ADP
1         Shohei Ohtani  LAD          177.5  18  13  0.285   47  49    1.5
0           Aaron Judge  NYY          154.8  19   3  0.303   47  43    3.1
10      Elly De La Cruz  CIN          152.0  10  18  0.270   34  40    4.3
2        Bobby Witt Jr.  KCR          140.9  10  12  0.289   36  37    1.6
5          José Ramírez  CLE          140.2  11  12  0.277   36  36    4.9
18       Corbin Carroll  ARI          131.2   9  11  0.257   31  39    7.9
13  Pete Crow-Armstrong  CHC          130.4   9  13  0.259   32  33  127.4
4             Juan Soto  NYM          130.3  13   4  0.268   36  40    7.1
7       Julio Rodríguez  SEA          130.1  10  10  0.271   33  35   13.7
38      Jackson Chourio  MIL          128.2   8  11  0.272   34  33   16.4

This custom score provides a consolidated view of player value based on common fantasy categories.
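The custom score mixes categories on very different scales, which is why AVG has to be multiplied by 100 and why the weights must be hand-tuned. A common scale-free alternative is to convert each category to a z-score within the player pool and sum them. The sketch below assumes equal weights and treats AVG like any other category (valuation systems often weight AVG by at-bats instead, which is omitted here); `zscore_value` is a hypothetical name, and the three rows reuse stats from the dummy data.

```python
import pandas as pd


def zscore_value(frame, cats=('HR', 'RBI', 'R', 'SB', 'AVG')):
    """Sum of per-category z-scores within the given player pool (equal weights)."""
    cols = list(cats)
    # Standardize each category within this pool (population std, ddof=0)
    z = (frame[cols] - frame[cols].mean()) / frame[cols].std(ddof=0)
    # Equal weights: each category's z-score counts the same toward the total
    return z.sum(axis=1).rename('z_value')


pool = pd.DataFrame(
    {'HR': [18, 19, 2], 'RBI': [47, 47, 6], 'R': [49, 43, 7],
     'SB': [13, 3, 1], 'AVG': [0.285, 0.303, 0.172]},
    index=['Shohei Ohtani', 'Aaron Judge', 'Austin Hedges'])
print(zscore_value(pool).round(2))
```

Because every category is centered on the pool mean, the scores always sum to zero across the pool, so they read as "above or below an average player in this pool". Re-weighting is then a matter of multiplying individual z-score columns before summing. On the notebook's data the call would be `zscore_value(df)`.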
Conclusion
This analysis provides an exploratory overview of the fantasy baseball player data. We identified top performers across several offensive categories, flagged potential sleepers and overvalued players by comparing their current performance with their ADP, and pinpointed anomalies (plate-discipline extremes and BABIP outliers) that may signal future regression or improvement. The visualizations summarize the distributions of key statistics and the relationships among them. Further analysis could involve more sophisticated predictive modeling, time-series analysis of player trends, or clustering to group similar player profiles.