Bus Stop Data Analysis - Královéhradecký kraj¶

This notebook performs an exploratory data analysis (EDA) of a dataset describing bus stops in the Královéhradecký kraj region of the Czech Republic. The analysis covers data loading, cleaning, visualization, and the identification of key insights and potential anomalies. Because the dataset has no temporal component, no predictive analysis is attempted.

In [1]:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import skew
import warnings
warnings.filterwarnings('ignore')

sns.set_style('darkgrid')

In [2]:

# Load the CSV data into a Pandas DataFrame
df = pd.read_csv('autobusové_zastávky_iredo_-1065211480630538261.csv', encoding='utf-8')

Initial Data Inspection¶

In [3]:

df.head()

Out[3]:

	Název	Označení	Název vyššího územního samosprávného celku	Kód vyššího územního samosprávného celku dle číselníku ČSÚ	Název správního obvodu obce s rozšířenou působností	Kód správního obvodu obce s rozšířenou působností dle číselníku ČSÚ	Název okresu	Kód okresu dle číselníku ČSÚ	Název obce	Kód obce dle číselníku ČSÚ	Zápis vektorové geometrie	Zeměpisná délka v souřadnicovém systému WGS84	Zeměpisná šířka v souřadnicovém systému WGS84	Jedinečný identifikátor v katalogu otevřených dat Data KHK	ID	x2	y2
0	Adršpach,Dolní Adršpach,zámek	38	Královéhradecký kraj	CZ052	Broumov	5201	Náchod	CZ0523	Adršpach	547786	POINT(50.618616 16.108505)	16.108505	50.618616	AZI1	1	1.793191e+06	6.554107e+06
1	Adršpach,Dolní Adršpach,zámek	38	Královéhradecký kraj	CZ052	Broumov	5201	Náchod	CZ0523	Adršpach	547786	POINT(50.618711 16.108444)	16.108444	50.618711	AZI2	2	1.793184e+06	6.554123e+06
2	Adršpach,Dolní Adršpach,odb.Zdoňov	39	Královéhradecký kraj	CZ052	Broumov	5201	Náchod	CZ0523	Adršpach	547786	POINT(50.616431 16.132747)	16.132747	50.616431	AZI3	3	1.795889e+06	6.553723e+06
3	Adršpach,Dolní Adršpach,odb.Zdoňov	39	Královéhradecký kraj	CZ052	Broumov	5201	Náchod	CZ0523	Adršpach	547786	POINT(50.616562 16.132786)	16.132786	50.616562	AZI4	4	1.795894e+06	6.553746e+06
4	Adršpach,Horní Adršpach,žel.zast.	40	Královéhradecký kraj	CZ052	Broumov	5201	Náchod	CZ0523	Adršpach	547786	POINT(50.624189 16.083921)	16.083921	50.624189	AZI5	5	1.790454e+06	6.555084e+06

Data Cleaning - Drop Duplicates¶

In [4]:

# Remove rows that are identical in every column, then rebuild the index
df = df.drop_duplicates().reset_index(drop=True)
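
With default arguments, `drop_duplicates()` only removes rows that match in every column, so rows that share a stop name but carry their own ID and coordinates are kept. The following is a minimal sketch of how such repeated names could be flagged rather than dropped. It uses a toy frame copied from the first three rows of `df.head()`, and the short column names `lon` and `lat` are stand-ins (an assumption for readability) for the long WGS84 headers.

```python
import pandas as pd

# Toy frame built from the first three rows of df.head(); 'lon' and 'lat' are
# shortened stand-ins for the long WGS84 column names.
sample = pd.DataFrame({
    'Název': ['Adršpach,Dolní Adršpach,zámek',
              'Adršpach,Dolní Adršpach,zámek',
              'Adršpach,Dolní Adršpach,odb.Zdoňov'],
    'Označení': [38, 38, 39],
    'lon': [16.108505, 16.108444, 16.132747],
    'lat': [50.618616, 50.618711, 50.616431],
    'ID': [1, 2, 3],
})

# Full-row comparison: nothing matches because ID and coordinates differ.
exact_dupes = int(sample.duplicated().sum())

# Comparison on name and designation only; keep=False marks every member
# of a repeated group instead of only the later occurrences.
same_stop = sample.duplicated(subset=['Název', 'Označení'], keep=False)

print(exact_dupes)
print(sample.loc[same_stop, ['Název', 'ID']])
```

Whether rows flagged this way should be merged or kept depends on what they represent; if they are separate platforms for opposite directions, keeping them is probably the right choice.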

Data Cleaning - Split the stop name into town, part, and specific location¶

In [5]:

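# Split the 'Název' column into town, part, and specific location.
# n=2 caps the split at three pieces, so the assignment never receives more than three columns.
df[['town', 'part', 'specific_location']] = df['Název'].str.split(',', n=2, expand=True)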

# Fill missing 'part' and 'specific_location' values with empty strings
df['part'] = df['part'].fillna('')
df['specific_location'] = df['specific_location'].fillna('')
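
To illustrate how the split behaves on irregular names, here is a small sketch on three sample strings. The first two appear in the dataset; the third, a name with no commas, is a hypothetical case added only to show the padding behaviour.

```python
import pandas as pd

names = pd.Series([
    'Adršpach,Dolní Adršpach,zámek',  # town, part, specific location
    'Nová Ves,,Kvartýr',              # empty middle component
    'Adršpach',                       # hypothetical name with no commas
])

# n=2 limits the split to two cuts, i.e. at most three pieces per name.
parts = names.str.split(',', n=2, expand=True)
parts.columns = ['town', 'part', 'specific_location']

# Pieces that do not exist come back as missing values; normalise them to ''.
parts = parts.fillna('')
print(parts)
```

An empty component between two commas is already returned as an empty string, so `fillna('')` only matters for names with fewer than two commas.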

Visualize Bus Stop Distribution by Municipality¶

In [6]:

# Create a bar plot of the top municipalities with the most bus stops
plt.figure(figsize=(14, 6))
df['Název obce'].value_counts().nlargest(20).plot(kind='bar', color='skyblue')
plt.title('Top 20 Municipalities by Number of Bus Stops')
plt.xlabel('Municipality Name')
plt.ylabel('Number of Bus Stops')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()


Bus Stop Density Heatmap¶

In [7]:

# Plot a filled 2D kernel density estimate (KDE) of bus stop locations
plt.figure(figsize=(10, 8))
sns.kdeplot(x=df['Zeměpisná délka v souřadnicovém systému WGS84'], y=df['Zeměpisná šířka v souřadnicovém systému WGS84'], fill=True, cmap='viridis')
plt.title('Bus Stop Density Heatmap')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

Data Insight: Common Location Descriptors¶

In [8]:

# Display the 5 most common values from 'specific_location'
# (the empty string counts stops whose name has no third component)
print(df['specific_location'].value_counts().nlargest(5).to_frame())

                   count
specific_location       
                     768
ObÚ                  156
škola                125
aut.st.              119
odb.                  99

Summary of Findings¶

The analysis of the dataset produced the following findings:

Insights:

The dataset contains bus stop information primarily from the Královéhradecký kraj region of the Czech Republic.
Many bus stops are represented by two entries in the dataset, likely representing opposite directions of travel.
Several locations have multiple bus stops with slightly varying names and locations, indicating possible route variations or multiple platforms.
The x2 and y2 columns appear to be a projected representation of the longitude and latitude; the values in the first row are consistent with Web Mercator (EPSG:3857).
The 'Název' field often contains location descriptors such as 'ObÚ' (obecní úřad, municipal office), 'žel.st.' (železniční stanice, railway station), and 'nám.' (náměstí, square), suggesting important landmarks near the bus stops.
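
The observation about x2 and y2 can be checked numerically. The sketch below assumes, as a hypothesis the dataset itself does not document, that the columns hold spherical Web Mercator (EPSG:3857) coordinates. It projects the longitude and latitude of the first row of `df.head()` and compares the result with the listed x2 and y2. The helper name `to_web_mercator` is introduced here for illustration.

```python
import numpy as np

R = 6378137.0  # WGS84 semi-major axis in metres, used by spherical Web Mercator

def to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 degrees to spherical Web Mercator (EPSG:3857) metres."""
    lon = np.radians(lon_deg)
    lat = np.radians(lat_deg)
    x = R * lon
    y = R * np.log(np.tan(np.pi / 4 + lat / 2))
    return x, y

# First row of df.head(): longitude and latitude in WGS84.
x, y = to_web_mercator(16.108505, 50.618616)

# Differences from the x2 and y2 values printed for that row (in metres).
dx = abs(x - 1.793191e6)
dy = abs(y - 6.554107e6)
print(x, y)
print(dx, dy)
```

Differences of about a metre or less are expected here, because `df.head()` shows x2 and y2 rounded to seven significant digits. Applying the same function to the full columns would show whether the match holds for every row.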

Anomalies:

Inconsistency in naming conventions within the 'Název' field (e.g., "Adršpach,Dolní Adršpach,zámek" vs "Nová Ves,,Kvartýr").
Some 'Název' values are missing a key descriptor (e.g., "Bašnice,559,..." where the second field is empty).
The dataset contains repeated entries for the same stop name and designation (e.g. the first two rows); they differ in coordinates and ID, so `drop_duplicates()` does not remove them.
The identifiers are not strictly sequential: an AZI13 entry appears after AZI429.
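
The out-of-sequence identifier can be found programmatically. The following sketch uses a short hypothetical list of identifiers that mimics the reported anomaly; on the real data, the same steps would be applied to the 'Jedinečný identifikátor v katalogu otevřených dat Data KHK' column.

```python
import pandas as pd

# Hypothetical identifier sequence illustrating the reported anomaly.
ids = pd.Series(['AZI427', 'AZI428', 'AZI429', 'AZI13', 'AZI430'])

# Extract the numeric suffix and compare each value with its predecessor.
num = ids.str.extract(r'(\d+)$', expand=False).astype(int)
out_of_order = ids[num.diff() < 0]

print(out_of_order.tolist())  # ['AZI13']
```

A negative difference marks an identifier whose number is lower than that of the row before it, which is enough to locate entries that break an otherwise ascending order.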

Predictions: The dataset primarily describes bus stop locations and characteristics. There are no time-series data or other elements that lend themselves to time-based predictions. Simple predictive location analysis based on density isn't meaningful without external data (e.g. population, time of day).

Summary: This CSV data represents a catalog of bus stops within the Královéhradecký kraj, detailing their names, locations (latitude, longitude, and transformed x2/y2 coordinates), administrative region, and unique identifiers. Data quality issues exist in naming conventions. Due to a lack of temporal or related data, predictive analysis is not applicable. Further analysis could focus on spatial distribution, clustering of stops, or relationships between bus stop locations and population density with external data.
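
As a starting point for the spatial follow-up mentioned above, the distance between two entries that share a name can be estimated with the haversine formula. This is a minimal sketch, not part of the original analysis. The function name `haversine_m` and the mean Earth radius of 6,371 km are choices made here, and the coordinates are those of the first two rows of `df.head()`.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in metres between two WGS84 points given in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return 2 * radius * np.arcsin(np.sqrt(a))

# The two 'Adršpach,Dolní Adršpach,zámek' entries (rows 0 and 1 of df.head()).
d = haversine_m(50.618616, 16.108505, 50.618711, 16.108444)
print(round(float(d), 1), 'm')
```

A separation of a few metres to a few tens of metres would be consistent with two platforms on opposite sides of a road, while much larger values would suggest distinct stops that happen to share a name.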