NYPD Arrest Data Analysis 2024
This Jupyter Notebook performs an exploratory data analysis of the NYPD Arrest Data for the year 2024. The analysis aims to provide insights into arrest patterns, including the types of offenses, demographic distributions of individuals arrested, and geographical aspects of arrests. The notebook will load the data, visualize key trends, highlight potential anomalies within the dataset, and discuss the feasibility of making predictions based on this data. Due to the limitations of year-to-date data, in-depth predictive modeling is not within the scope of this analysis, but descriptive insights and potential areas for further investigation will be explored.
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Load Data
# Load the NYPD arrest data CSV file into a pandas DataFrame
try:
    arrest_data = pd.read_csv('nypd_arrest_data__year_to_date_.csv')
    print("Data loaded successfully!")
except FileNotFoundError:
    print("Error: CSV file not found. Please ensure 'nypd_arrest_data__year_to_date_.csv' is in the correct directory.")
    # To proceed without the actual file for demonstration purposes, create a dummy DataFrame, e.g.:
    # arrest_data = pd.DataFrame({'OFNS_DESC': [], 'AGE_GROUP': [], 'PERP_SEX': [], 'PERP_RACE': [], 'ARREST_BORO': []})
    # Alternatively, load a sample if available or use io.StringIO for minimal example data
    arrest_data = None  # None signals that data loading failed; later cells check for it

if arrest_data is not None:
    print(arrest_data.head())
Data loaded successfully!
   ARREST_KEY ARREST_DATE  PD_CD            PD_DESC  KY_CD       OFNS_DESC  \
0   281369711  01/30/2024  177.0       SEXUAL ABUSE  116.0      SEX CRIMES
1   284561406  03/30/2024  105.0  STRANGULATION 1ST  106.0  FELONY ASSAULT
2   284896016  04/06/2024  105.0  STRANGULATION 1ST  106.0  FELONY ASSAULT
3   285569016  04/18/2024  105.0  STRANGULATION 1ST  106.0  FELONY ASSAULT
4   287308954  05/22/2024  464.0           JOSTLING  230.0        JOSTLING

     LAW_CODE LAW_CAT_CD ARREST_BORO  ARREST_PRECINCT  JURISDICTION_CODE  \
0  PL 1306501          F           M               25                  0
1  PL 1211200          F           B               44                  0
2  PL 1211200          F           M               19                  0
3  PL 1211200          F           K               69                  0
4  PL 1652501          M           M               18                  0

  AGE_GROUP PERP_SEX PERP_RACE  X_COORD_CD  Y_COORD_CD   Latitude  Longitude  \
0     25-44        M     BLACK     1000558      231080  40.800930 -73.941098
1     25-44        M     BLACK     1004297      242846  40.833209 -73.927554
2     25-44        M     BLACK      997304      222853  40.778348 -73.952863
3     25-44        M     BLACK     1010576      175628  40.648698 -73.905128
4     18-24        M     WHITE      991530      217373  40.763313 -73.973717

                     New Georeferenced Column
0  POINT (-73.9410982410066 40.8009303727402)
1                POINT (-73.927554 40.833209)
2                POINT (-73.952863 40.778348)
3                POINT (-73.905128 40.648698)
4                POINT (-73.973717 40.763313)
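Because the file is year-to-date, it may help to parse ARREST_DATE into real dates and check which part of the year it actually covers. A minimal sketch, assuming ARREST_DATE always uses the MM/DD/YYYY format seen in the rows above; `sample` is a hypothetical stand-in built from those five rows, and on the full data the same call would apply to `arrest_data['ARREST_DATE']`.

```python
import pandas as pd

# Hypothetical stand-in: five rows copied from the head() output above.
sample = pd.DataFrame({
    "ARREST_KEY": [281369711, 284561406, 284896016, 285569016, 287308954],
    "ARREST_DATE": ["01/30/2024", "03/30/2024", "04/06/2024", "04/18/2024", "05/22/2024"],
})

# ARREST_DATE is stored as an MM/DD/YYYY string (assumed from the rows above).
# An explicit format avoids day/month guessing and raises on malformed values.
sample["ARREST_DATE"] = pd.to_datetime(sample["ARREST_DATE"], format="%m/%d/%Y")

# The date range shows how much of the year this extract covers.
print(sample["ARREST_DATE"].min().date(), "to", sample["ARREST_DATE"].max().date())
# → 2024-01-30 to 2024-05-22 (for these five rows only)
```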
Most Frequent Offense Descriptions
if arrest_data is not None:
    offense_counts = arrest_data['OFNS_DESC'].value_counts().nlargest(10)
    plt.figure(figsize=(10, 6))
    # Assigning x to hue (with legend=False) applies the palette without seaborn's palette-without-hue FutureWarning
    sns.barplot(x=offense_counts.index, y=offense_counts.values, hue=offense_counts.index, palette='viridis', legend=False)
    plt.title('Top 10 Most Frequent Offense Descriptions')
    plt.xlabel('Offense Description')
    plt.ylabel('Number of Arrests')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
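The bar chart shows raw counts; a small table of each top offense's share of all arrests makes it easier to judge how much of the total the ten plotted categories cover. A minimal sketch using the OFNS_DESC values from the five rows shown earlier as a hypothetical stand-in; on the full data, replace `offenses` with `arrest_data['OFNS_DESC']`.

```python
import pandas as pd

# Hypothetical stand-in: OFNS_DESC values from the five rows shown in head().
offenses = pd.Series(["SEX CRIMES", "FELONY ASSAULT", "FELONY ASSAULT",
                      "FELONY ASSAULT", "JOSTLING"], name="OFNS_DESC")

top_n = 10
counts = offenses.value_counts().nlargest(top_n)

# Percentages are relative to *all* arrests, not just the top N, so they show
# how much of the dataset the plotted categories account for.
summary = pd.DataFrame({
    "arrests": counts,
    "pct_of_total": (counts / len(offenses) * 100).round(1),
})
print(summary)
print(f"Top {top_n} categories cover {summary['pct_of_total'].sum():.1f}% of arrests")
```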
Arrests by Age Group
if arrest_data is not None:
    age_group_counts = arrest_data['AGE_GROUP'].value_counts()
    plt.figure(figsize=(8, 5))
    # Assigning x to hue (with legend=False) applies the palette without seaborn's palette-without-hue FutureWarning
    sns.barplot(x=age_group_counts.index, y=age_group_counts.values, hue=age_group_counts.index, palette='magma', legend=False)
    plt.title('Distribution of Arrests by Age Group')
    plt.xlabel('Age Group')
    plt.ylabel('Number of Arrests')
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
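`value_counts()` orders the age groups by frequency, so the bars do not appear in age order. One way to fix this is to reindex the counts against an explicit category list. A minimal sketch, assuming the dataset's AGE_GROUP values are exactly `<18`, `18-24`, `25-44`, `45-64` and `65+` (the list `AGE_ORDER` is an assumption to verify against `arrest_data['AGE_GROUP'].unique()`); the input is the five rows shown earlier.

```python
import pandas as pd

# Assumed AGE_GROUP categories, in natural order; verify against the data first.
AGE_ORDER = ["<18", "18-24", "25-44", "45-64", "65+"]

# Hypothetical stand-in: AGE_GROUP values from the five rows shown in head().
ages = pd.Series(["25-44", "25-44", "25-44", "25-44", "18-24"], name="AGE_GROUP")

# reindex() would silently drop any value missing from AGE_ORDER, so report those first.
unexpected = set(ages.dropna().unique()) - set(AGE_ORDER)
if unexpected:
    print("Unexpected AGE_GROUP values:", sorted(unexpected))

# Put the counts in age order and show empty groups as 0 instead of omitting them.
age_counts = ages.value_counts().reindex(AGE_ORDER, fill_value=0)
print(age_counts)
```

Passing the reordered `age_counts` to the plotting cell keeps the bars in age order; seaborn's `barplot` also accepts an `order` argument for the same purpose.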
Arrests by Perpetrator Sex
if arrest_data is not None:
    perp_sex_counts = arrest_data['PERP_SEX'].value_counts()
    plt.figure(figsize=(6, 6))
    plt.pie(perp_sex_counts, labels=perp_sex_counts.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
    plt.title('Distribution of Arrests by Perpetrator Sex')
    plt.ylabel('PERP_SEX')
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
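The pie chart's `autopct` labels are each category's percentage of the total, but `value_counts()` drops missing values by default, so blank PERP_SEX entries would vanish from the chart without notice. A minimal sketch of a table that keeps them; `share_table` is a hypothetical helper name, and the usage line uses the PERP_SEX values from the five rows shown earlier.

```python
import pandas as pd

def share_table(values: pd.Series) -> pd.DataFrame:
    """Count and percentage share of each category, including missing values."""
    counts = values.value_counts(dropna=False)  # keep NaN as its own category
    pct = (counts / counts.sum() * 100).round(1)
    # Build from plain arrays so a NaN index label cannot interfere with alignment.
    return pd.DataFrame({"arrests": counts.to_numpy(), "pct": pct.to_numpy()},
                        index=counts.index)

# Usage with the PERP_SEX values from the five rows shown in head().
print(share_table(pd.Series(["M", "M", "M", "M", "M"], name="PERP_SEX")))
```

On the full data, `share_table(arrest_data['PERP_SEX'])` gives the numbers behind the pie slices plus a row for any missing values.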
Arrests by Perpetrator Race
if arrest_data is not None:
    perp_race_counts = arrest_data['PERP_RACE'].value_counts().nlargest(10)
    plt.figure(figsize=(10, 6))
    # Assigning x to hue (with legend=False) applies the palette without seaborn's palette-without-hue FutureWarning
    sns.barplot(x=perp_race_counts.index, y=perp_race_counts.values, hue=perp_race_counts.index, palette='muted', legend=False)
    plt.title('Arrests by Perpetrator Race (Top 10 Categories)')
    plt.xlabel('Perpetrator Race')
    plt.ylabel('Number of Arrests')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
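The race chart aggregates arrests citywide. One possible extension, not part of the original analysis, is to combine the demographic and geographic views by cross-tabulating race against borough and normalizing each borough's row to shares, so boroughs with very different arrest volumes can be compared. A minimal sketch; `sample` is a hypothetical stand-in built from the five rows shown earlier.

```python
import pandas as pd

# Hypothetical stand-in: ARREST_BORO and PERP_RACE from the five rows shown in head().
sample = pd.DataFrame({
    "ARREST_BORO": ["M", "B", "M", "K", "M"],
    "PERP_RACE": ["BLACK", "BLACK", "BLACK", "BLACK", "WHITE"],
})

# normalize="index" divides each row by its total, so every borough's shares sum to 1.
race_by_boro = pd.crosstab(sample["ARREST_BORO"], sample["PERP_RACE"], normalize="index")
print(race_by_boro.round(3))
```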
Arrests by Borough
if arrest_data is not None:
    arrest_boro_counts = arrest_data['ARREST_BORO'].value_counts()
    plt.figure(figsize=(8, 5))
    # Assigning x to hue (with legend=False) applies the palette without seaborn's palette-without-hue FutureWarning
    sns.barplot(x=arrest_boro_counts.index, y=arrest_boro_counts.values, hue=arrest_boro_counts.index, palette='coolwarm', legend=False)
    plt.title('Distribution of Arrests by Borough')
    plt.xlabel('Arrest Borough')
    plt.ylabel('Number of Arrests')
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
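ARREST_BORO holds single-letter codes, which makes the borough chart's axis hard to read. A minimal sketch of mapping them to names before counting, assuming the code meanings listed in the NYPD arrest data dictionary (B = Bronx, K = Brooklyn, M = Manhattan, Q = Queens, S = Staten Island; confirm against the dataset's documentation). `BORO_NAMES` is a hypothetical name, and the input is the five rows shown earlier.

```python
import pandas as pd

# Assumed code-to-name mapping from the NYPD data dictionary; verify before use.
BORO_NAMES = {"B": "Bronx", "K": "Brooklyn", "M": "Manhattan",
              "Q": "Queens", "S": "Staten Island"}

# Hypothetical stand-in: ARREST_BORO values from the five rows shown in head().
boro_codes = pd.Series(["M", "B", "M", "K", "M"], name="ARREST_BORO")

# map() yields NaN for codes not in the dictionary; label them instead of dropping them.
boro_counts = boro_codes.map(BORO_NAMES).fillna("Unknown").value_counts()
print(boro_counts)
```

In the plotting cell, `arrest_data['ARREST_BORO'].map(BORO_NAMES)` could be used in place of the raw column before calling `value_counts()`.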
Anomalies in Data
if arrest_data is not None:
    # Rows where either description field is missing
    print("\nExamples of rows with missing descriptions (PD_DESC or OFNS_DESC):")
    null_desc_rows = arrest_data[arrest_data['PD_DESC'].isnull() | arrest_data['OFNS_DESC'].isnull()]
    if not null_desc_rows.empty:
        print(null_desc_rows[['PD_CD', 'PD_DESC', 'KY_CD', 'OFNS_DESC']].head())
    else:
        print("No rows with missing descriptions found.")
    # Rows whose projected X or Y coordinate is 0, i.e. no usable location
    print("\nExamples of rows with zero coordinates:")
    zero_coord_rows = arrest_data[(arrest_data['X_COORD_CD'] == 0) | (arrest_data['Y_COORD_CD'] == 0)]
    if not zero_coord_rows.empty:
        print(zero_coord_rows[['ARREST_PRECINCT', 'X_COORD_CD', 'Y_COORD_CD', 'Latitude', 'Longitude']].head())
    else:
        print("No rows with zero coordinates found.")
else:
    print("Data not loaded, skipping anomaly check.")

Examples of rows with missing descriptions (PD_DESC or OFNS_DESC):
No rows with missing descriptions found.

Examples of rows with zero coordinates:
        ARREST_PRECINCT  X_COORD_CD  Y_COORD_CD  Latitude  Longitude
21                   25           0           0       0.0        0.0
37                  113           0           0       0.0        0.0
92561                 1           0           0       0.0        0.0
163491               44           0           0       0.0        0.0
166939              113           0           0       0.0        0.0
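The zero-coordinate rows place arrests at latitude/longitude (0, 0), far outside New York, so any map or spatial aggregation would be distorted by them. A minimal sketch of treating 0 as missing and then flagging rows without a plausible NYC location; the bounding box is a rough assumption for illustration, not an official city boundary, and `coords` is a hypothetical stand-in holding one valid row from head() and one zero-coordinate row like those listed above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: a valid row from head() and a zero-coordinate row (precinct 25).
coords = pd.DataFrame({
    "ARREST_PRECINCT": [25, 25],
    "X_COORD_CD": [1000558, 0],
    "Y_COORD_CD": [231080, 0],
    "Latitude": [40.800930, 0.0],
    "Longitude": [-73.941098, 0.0],
})

# Treat 0 as "location unknown": NaN is skipped by most pandas aggregations and plots.
coord_cols = ["X_COORD_CD", "Y_COORD_CD", "Latitude", "Longitude"]
coords[coord_cols] = coords[coord_cols].replace(0, np.nan)

# Rough NYC bounding box (assumed, approximate). between() is False for NaN.
in_nyc = (coords["Latitude"].between(40.4, 41.0)
          & coords["Longitude"].between(-74.3, -73.6))
print(f"{(~in_nyc).sum()} of {len(coords)} rows lack a plausible NYC location")
```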
Predictions and Predictive Analysis Feasibility
print("Predictive analysis on this year-to-date data alone is limited.")
print("More historical data and features would be needed for robust predictions.")
# No predictive models are implemented in this notebook due to data limitations.
Predictive analysis on this year-to-date data alone is limited.
More historical data and features would be needed for robust predictions.
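One way to make the limitation concrete is to count how many time steps a forecasting model would actually have. A minimal sketch that aggregates arrests to monthly counts, using the ARREST_DATE values from the five rows shown earlier as a hypothetical stand-in; on the full year-to-date file the number of monthly observations is bounded by the months elapsed so far, which is too few to estimate seasonal patterns.

```python
import pandas as pd

# Hypothetical stand-in: ARREST_DATE values from the five rows shown in head().
dates = pd.to_datetime(
    pd.Series(["01/30/2024", "03/30/2024", "04/06/2024", "04/18/2024", "05/22/2024"]),
    format="%m/%d/%Y",
)

# One row per arrest, indexed by date; resample("MS") bins by calendar month
# (labelled at month start) and sum() reports 0 for months with no arrests.
monthly = pd.Series(1, index=pd.DatetimeIndex(dates)).resample("MS").sum()
print(monthly)
print(f"{len(monthly)} monthly observations available for a time-series model")
```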
Summary
This analysis provided a preliminary overview of the NYPD year-to-date arrest data for 2024. Key insights include the prevalence of assault-related offenses, larceny, drug offenses, and traffic violations. The demographic analysis indicates that arrests are concentrated in the 25-44 age group and among males, with Black and White Hispanic individuals significantly represented. The anomaly check found no rows with missing offense descriptions, but it did find records whose coordinates (X/Y and latitude/longitude) are all zero, which should be treated as missing location data. Predictive analysis is deemed infeasible with the current dataset, highlighting the need for richer historical data for future predictive modeling.