NYPD Arrest Data Analysis 2024

This Jupyter Notebook performs an exploratory data analysis of the year-to-date NYPD Arrest Data for 2024. It aims to provide insights into arrest patterns: the types of offenses, the demographic distribution of individuals arrested, and the geography of arrests. The notebook loads the data, visualizes key trends, highlights anomalies in the dataset, and discusses the feasibility of making predictions from it. Because year-to-date data covers at most a single year, in-depth predictive modeling is out of scope; the focus is on descriptive insights and areas for further investigation.

In [1]:

# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

Load Data

In [2]:

# Load the NYPD arrest data CSV file into a pandas DataFrame
try:
    arrest_data = pd.read_csv('nypd_arrest_data__year_to_date_.csv')
    print("Data loaded successfully!")
except FileNotFoundError:
    print("Error: CSV file not found. Please ensure 'nypd_arrest_data__year_to_date_.csv' is in the correct directory.")
    # To proceed without the actual file for demonstration purposes, create a dummy DataFrame, e.g.:
    # arrest_data = pd.DataFrame({'OFNS_DESC': [], 'AGE_GROUP': [], 'PERP_SEX': [], 'PERP_RACE': [], 'ARREST_BORO': []})
    # Alternatively, load a sample file if one is available, or build minimal example data with StringIO
    arrest_data = None  # None signals that loading failed; later cells check for this

if arrest_data is not None:
    print(arrest_data.head())

Data loaded successfully!
   ARREST_KEY ARREST_DATE  PD_CD            PD_DESC  KY_CD       OFNS_DESC  \
0   281369711  01/30/2024  177.0       SEXUAL ABUSE  116.0      SEX CRIMES   
1   284561406  03/30/2024  105.0  STRANGULATION 1ST  106.0  FELONY ASSAULT   
2   284896016  04/06/2024  105.0  STRANGULATION 1ST  106.0  FELONY ASSAULT   
3   285569016  04/18/2024  105.0  STRANGULATION 1ST  106.0  FELONY ASSAULT   
4   287308954  05/22/2024  464.0           JOSTLING  230.0        JOSTLING   

     LAW_CODE LAW_CAT_CD ARREST_BORO  ARREST_PRECINCT  JURISDICTION_CODE  \
0  PL 1306501          F           M               25                  0   
1  PL 1211200          F           B               44                  0   
2  PL 1211200          F           M               19                  0   
3  PL 1211200          F           K               69                  0   
4  PL 1652501          M           M               18                  0   

  AGE_GROUP PERP_SEX PERP_RACE  X_COORD_CD  Y_COORD_CD   Latitude  Longitude  \
0     25-44        M     BLACK     1000558      231080  40.800930 -73.941098   
1     25-44        M     BLACK     1004297      242846  40.833209 -73.927554   
2     25-44        M     BLACK      997304      222853  40.778348 -73.952863   
3     25-44        M     BLACK     1010576      175628  40.648698 -73.905128   
4     18-24        M     WHITE      991530      217373  40.763313 -73.973717   

                     New Georeferenced Column  
0  POINT (-73.9410982410066 40.8009303727402)  
1                POINT (-73.927554 40.833209)  
2                POINT (-73.952863 40.778348)  
3                POINT (-73.905128 40.648698)  
4                POINT (-73.973717 40.763313)
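The preview shows `ARREST_DATE` as `MM/DD/YYYY` text, so any time-based view needs it parsed first. The following is a minimal sketch, not part of the original notebook: `sample` is a hypothetical stand-in for `arrest_data` built from the five preview rows, and the date format is assumed from that preview.

```python
import pandas as pd

# Hypothetical stand-in for arrest_data, using keys and dates copied from the preview above.
sample = pd.DataFrame({
    "ARREST_KEY": [281369711, 284561406, 284896016, 285569016, 287308954],
    "ARREST_DATE": ["01/30/2024", "03/30/2024", "04/06/2024", "04/18/2024", "05/22/2024"],
})

# Parse MM/DD/YYYY strings (format assumed from the preview).
# errors="coerce" turns unparseable values into NaT instead of raising.
sample["ARREST_DATE"] = pd.to_datetime(
    sample["ARREST_DATE"], format="%m/%d/%Y", errors="coerce"
)

# Count arrests per calendar month, in chronological order.
monthly_counts = sample["ARREST_DATE"].dt.to_period("M").value_counts().sort_index()
print(monthly_counts)
```

Applied to the full DataFrame, the same two steps would give a month-by-month arrest count that could be plotted alongside the charts in the following cells.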

Most Frequent Offense Descriptions

In [3]:

if arrest_data is not None:
    offense_counts = arrest_data['OFNS_DESC'].value_counts().nlargest(10)
    plt.figure(figsize=(10, 6))
    # Assign the x variable to hue (with legend=False) so the palette applies without seaborn's deprecation warning
    sns.barplot(x=offense_counts.index, y=offense_counts.values, hue=offense_counts.index, palette='viridis', legend=False)
    plt.title('Top 10 Most Frequent Offense Descriptions')
    plt.xlabel('Offense Description')
    plt.ylabel('Number of Arrests')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
else:
    print("Data not loaded, skipping visualization.")

[Figure: bar chart of the top 10 most frequent offense descriptions]

Arrests by Age Group

In [4]:

if arrest_data is not None:
    age_group_counts = arrest_data['AGE_GROUP'].value_counts()
    plt.figure(figsize=(8, 5))
    # Assign the x variable to hue (with legend=False) so the palette applies without seaborn's deprecation warning
    sns.barplot(x=age_group_counts.index, y=age_group_counts.values, hue=age_group_counts.index, palette='magma', legend=False)
    plt.title('Distribution of Arrests by Age Group')
    plt.xlabel('Age Group')
    plt.ylabel('Number of Arrests')
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
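Because `value_counts()` sorts by frequency, the bars in this chart appear from most to fewest arrests rather than from youngest to oldest. The sketch below shows one way to put the counts in age order before plotting. The `sample` data is hypothetical, and the label list `AGE_ORDER` is an assumption about the values in `AGE_GROUP` (only `25-44` and `18-24` appear in the preview), so it should be checked against `arrest_data['AGE_GROUP'].unique()`.

```python
import pandas as pd

# Hypothetical sample; real values come from arrest_data['AGE_GROUP'].
sample = pd.DataFrame(
    {"AGE_GROUP": ["25-44", "25-44", "18-24", "45-64", "<18", "25-44", "65+"]}
)

# Assumed set of labels, listed youngest to oldest.
AGE_ORDER = ["<18", "18-24", "25-44", "45-64", "65+"]

# reindex puts the counts in age order; labels with no rows get a count of 0.
age_group_counts = sample["AGE_GROUP"].value_counts().reindex(AGE_ORDER, fill_value=0)
print(age_group_counts)
```

Plotting the reindexed series with the same `sns.barplot` call would then draw the bars in age order, since seaborn keeps string categories in the order it receives them.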

Arrests by Perpetrator Sex

In [5]:

if arrest_data is not None:
    perp_sex_counts = arrest_data['PERP_SEX'].value_counts()
    plt.figure(figsize=(6, 6))
    # Pie chart of arrest share by sex; a pie chart has no y-axis, so no ylabel is set
    plt.pie(perp_sex_counts, labels=perp_sex_counts.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
    plt.title('Distribution of Arrests by Perpetrator Sex')
    plt.show()
else:
    print("Data not loaded, skipping visualization.")

Arrests by Perpetrator Race

In [6]:

if arrest_data is not None:
    perp_race_counts = arrest_data['PERP_RACE'].value_counts().nlargest(10)
    plt.figure(figsize=(10, 6))
    # Assign the x variable to hue (with legend=False) so the palette applies without seaborn's deprecation warning
    sns.barplot(x=perp_race_counts.index, y=perp_race_counts.values, hue=perp_race_counts.index, palette='muted', legend=False)
    plt.title('Arrests by Perpetrator Race (Top 10 Categories)')
    plt.xlabel('Perpetrator Race')
    plt.ylabel('Number of Arrests')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
else:
    print("Data not loaded, skipping visualization.")

Arrests by Borough

In [7]:

if arrest_data is not None:
    arrest_boro_counts = arrest_data['ARREST_BORO'].value_counts()
    plt.figure(figsize=(8, 5))
    # Assign the x variable to hue (with legend=False) so the palette applies without seaborn's deprecation warning
    sns.barplot(x=arrest_boro_counts.index, y=arrest_boro_counts.values, hue=arrest_boro_counts.index, palette='coolwarm', legend=False)
    plt.title('Distribution of Arrests by Borough')
    plt.xlabel('Arrest Borough')
    plt.ylabel('Number of Arrests')
    plt.show()
else:
    print("Data not loaded, skipping visualization.")
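`ARREST_BORO` holds single-letter codes (`M`, `B`, and `K` appear in the preview), which make the chart's x-axis hard to read. The sketch below maps the codes to borough names before counting. The `BORO_NAMES` mapping is an assumption based on the column description published with the dataset and should be verified against it; `sample` is a hypothetical stand-in built from the preview rows.

```python
import pandas as pd

# Assumed code-to-name mapping; verify against the dataset's column description.
BORO_NAMES = {
    "B": "Bronx",
    "K": "Brooklyn",
    "M": "Manhattan",
    "Q": "Queens",
    "S": "Staten Island",
}

# Hypothetical stand-in for arrest_data, using the borough codes from the preview.
sample = pd.DataFrame({"ARREST_BORO": ["M", "B", "M", "K", "M"]})

# Map codes to names; any code missing from the mapping is kept unchanged.
boro_labels = sample["ARREST_BORO"].map(BORO_NAMES).fillna(sample["ARREST_BORO"])

arrest_boro_counts = boro_labels.value_counts()
print(arrest_boro_counts)
```

Keeping unmapped codes as they are, rather than dropping them, means an unexpected value in the column would still show up in the chart instead of disappearing silently.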

Anomalies in Data

In [8]:

if arrest_data is not None:
    # The filter uses OR: a row is flagged if either description is missing
    print("\nExamples of rows with a missing description (PD_DESC or OFNS_DESC):")
    null_desc_rows = arrest_data[arrest_data['PD_DESC'].isnull() | arrest_data['OFNS_DESC'].isnull()]
    if not null_desc_rows.empty:
        print(null_desc_rows[['PD_CD', 'PD_DESC', 'KY_CD', 'OFNS_DESC']].head())
    else:
        # The check runs on the whole DataFrame, not only on the rows shown by head()
        print("No rows with missing descriptions found in the dataset.\n")

    print("\nExamples of rows with zero coordinates:")
    zero_coord_rows = arrest_data[(arrest_data['X_COORD_CD'] == 0) | (arrest_data['Y_COORD_CD'] == 0)]
    if not zero_coord_rows.empty:
        print(zero_coord_rows[['ARREST_PRECINCT', 'X_COORD_CD', 'Y_COORD_CD', 'Latitude', 'Longitude']].head())
    else:
        print("No rows with zero coordinates found in the dataset.\n")
else:
    print("Data not loaded, skipping anomaly check.")

Examples of rows with a missing description (PD_DESC or OFNS_DESC):
No rows with missing descriptions found in the dataset.


Examples of rows with zero coordinates:
        ARREST_PRECINCT  X_COORD_CD  Y_COORD_CD  Latitude  Longitude
21                   25           0           0       0.0        0.0
37                  113           0           0       0.0        0.0
92561                 1           0           0       0.0        0.0
163491               44           0           0       0.0        0.0
166939              113           0           0       0.0        0.0
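The rows above carry zeros in all four location columns, which would place those arrests far outside New York City on any map and distort coordinate statistics. One possible treatment, sketched here on a hypothetical `sample` frame (two valid rows from the preview plus two zero-coordinate rows), is to measure how many rows are affected and then replace the zeros with `NaN` so they are treated as missing. Whether to drop, flag, or impute these rows is an open choice that the notebook does not settle.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for arrest_data: two valid rows and two zero-coordinate rows.
sample = pd.DataFrame({
    "ARREST_PRECINCT": [25, 44, 25, 113],
    "X_COORD_CD": [1000558, 1004297, 0, 0],
    "Y_COORD_CD": [231080, 242846, 0, 0],
    "Latitude": [40.800930, 40.833209, 0.0, 0.0],
    "Longitude": [-73.941098, -73.927554, 0.0, 0.0],
})

coord_cols = ["X_COORD_CD", "Y_COORD_CD", "Latitude", "Longitude"]

# Same condition as the anomaly check above: either projected coordinate is zero.
zero_mask = (sample["X_COORD_CD"] == 0) | (sample["Y_COORD_CD"] == 0)
print(f"{zero_mask.sum()} of {len(sample)} rows ({zero_mask.mean():.1%}) have zero coordinates")

# Work on a copy; cast to float first so the integer columns can hold NaN.
cleaned = sample.copy()
cleaned[coord_cols] = cleaned[coord_cols].astype(float)
cleaned.loc[zero_mask, coord_cols] = np.nan
print(cleaned)
```

Marking the values as missing keeps the arrest records available for the offense and demographic counts while excluding them from anything that depends on location.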

Predictions and Predictive Analysis Feasibility

In [9]:

print("Predictive analysis on this year-to-date data alone is limited.")
print("More historical data and features would be needed for robust predictions.")
# No predictive models are implemented in this notebook due to data limitations.

Predictive analysis on this year-to-date data alone is limited.
More historical data and features would be needed for robust predictions.
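The limitation stated above can be made concrete by measuring how much time the data actually covers. The sketch below is an illustration rather than part of the original analysis: `dates` is a hypothetical stand-in for `arrest_data['ARREST_DATE']`, built from the five dates in the preview, with the `MM/DD/YYYY` format assumed from that preview.

```python
import pandas as pd

# Hypothetical stand-in for arrest_data['ARREST_DATE'], using the preview's dates.
dates = pd.to_datetime(
    pd.Series(["01/30/2024", "03/30/2024", "04/06/2024", "04/18/2024", "05/22/2024"]),
    format="%m/%d/%Y",
)

# Length of the observed window and number of distinct calendar months with data.
span_days = (dates.max() - dates.min()).days
n_months = dates.dt.to_period("M").nunique()

print(f"Date range: {dates.min().date()} to {dates.max().date()} ({span_days} days)")
print(f"Distinct months with arrests: {n_months}")
```

A window shorter than a year cannot show a full seasonal cycle, let alone year-over-year change, which is one reason multi-year history would be needed before fitting a forecasting model.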

Summary

This analysis provided a preliminary overview of the year-to-date NYPD Arrest Data for 2024. Key insights include the prevalence of assault-related offenses, larceny, drug offenses, and traffic violations. The demographic analysis indicates that arrests are concentrated among the '25-44' age group and males, with Black and White Hispanic individuals significantly represented. The anomaly check found no rows with missing offense descriptions, but it did find rows whose location fields (X_COORD_CD, Y_COORD_CD, Latitude, Longitude) are all zero, which indicates missing location data. Predictive analysis is limited with the current dataset alone; robust predictions would need more historical data and additional features.