Czech Demographic Data Analysis
This notebook analyzes Czech demographic data from the file oby01crgen.csv. The analysis aims to visualize key demographic trends, identify anomalies, and gain insights into the population dynamics of Czechia. The primary focus is on fertility, life expectancy, marriage patterns, and divorce.
In [1]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import linregress
%matplotlib inline
In [2]:
# Load Data
df = pd.read_csv('oby01crgen.csv')
df.head()
Out[2]:
| | Ukazatel | IndicatorType | Roky | CasR2A | Území | Uz0 | Hodnota |
|---|---|---|---|---|---|---|---|
| 0 | Úhrnná plodnost | 5405W | 2023 | 2023 | Česko | CZ | 1.452572 |
| 1 | Úhrnná plodnost | 5405W | 2022 | 2022 | Česko | CZ | 1.617747 |
| 2 | Úhrnná plodnost | 5405W | 2021 | 2021 | Česko | CZ | 1.826536 |
| 3 | Úhrnná plodnost | 5405W | 2020 | 2020 | Česko | CZ | 1.707373 |
| 4 | Úhrnná plodnost | 5405W | 2019 | 2019 | Česko | CZ | 1.708963 |
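The preview suggests the file is in long format: one row per indicator (Ukazatel) and year (Roky), with the value in Hodnota. As a sketch, assuming each indicator/year pair appears only once (plausible when the only territory is Česko, as in the preview), the table can be pivoted into a wide layout with one column per indicator. The helper name `to_wide` is hypothetical, and the second indicator's value in the sample is a placeholder, not data from the file.

```python
import pandas as pd

def to_wide(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot the long table into years (rows) x indicators (columns).

    Hypothetical helper; DataFrame.pivot raises ValueError if any
    (Roky, Ukazatel) pair occurs more than once.
    """
    return (
        long_df
        .assign(Roky=pd.to_numeric(long_df["Roky"]))
        .pivot(index="Roky", columns="Ukazatel", values="Hodnota")
        .sort_index()
    )

# Sample with the file's column names; the fertility values are copied
# from the preview above, the 'Čistá míra reprodukce' value is a placeholder.
sample = pd.DataFrame({
    "Ukazatel": ["Úhrnná plodnost", "Úhrnná plodnost", "Čistá míra reprodukce"],
    "Roky": [2023, 2022, 2023],
    "Hodnota": [1.452572, 1.617747, 0.5],
})
wide = to_wide(sample)
print(wide)
```

Applied to the full file, `to_wide(df)` would put every indicator side by side, so comparing two indicators for the same year becomes a column selection instead of two separate filters.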
Total Fertility Rate Analysis
In [3]:
# Filter for Total Fertility Rate; .copy() makes an independent DataFrame,
# so the column assignment below does not raise SettingWithCopyWarning
fertility_rate = df[df['Ukazatel'] == 'Úhrnná plodnost'].copy()
# Convert 'Roky' to numeric
fertility_rate['Roky'] = pd.to_numeric(fertility_rate['Roky'])
# Sort by year
fertility_rate = fertility_rate.sort_values('Roky')
fertility_rate.head()
Out[3]:
| | Ukazatel | IndicatorType | Roky | CasR2A | Území | Uz0 | Hodnota |
|---|---|---|---|---|---|---|---|
| 103 | Úhrnná plodnost | 5405W | 1920 | 1920 | Česko | CZ | 2.964 |
| 102 | Úhrnná plodnost | 5405W | 1921 | 1921 | Česko | CZ | 3.035 |
| 101 | Úhrnná plodnost | 5405W | 1922 | 1922 | Česko | CZ | 2.882 |
| 100 | Úhrnná plodnost | 5405W | 1923 | 1923 | Česko | CZ | 2.768 |
| 99 | Úhrnná plodnost | 5405W | 1924 | 1924 | Česko | CZ | 2.590 |
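Every section below repeats the same three steps: filter one Ukazatel value, convert Roky to numbers, sort by year. A minimal sketch of a reusable helper (the name `select_indicator` is hypothetical) that does all three and takes a copy first, which is the usual way to avoid pandas' SettingWithCopyWarning when assigning to a filtered subset:

```python
import pandas as pd

def select_indicator(data: pd.DataFrame, indicator: str) -> pd.DataFrame:
    """Return one indicator's rows as an independent, year-sorted DataFrame.

    .copy() detaches the subset from `data`, so assigning to a column of
    the result does not trigger SettingWithCopyWarning.
    """
    subset = data.loc[data["Ukazatel"] == indicator].copy()
    subset["Roky"] = pd.to_numeric(subset["Roky"])
    return subset.sort_values("Roky")

# Synthetic example: years stored as strings, rows out of order
demo = pd.DataFrame({
    "Ukazatel": ["A", "B", "A"],
    "Roky": ["2021", "2020", "2019"],
    "Hodnota": [1.0, 2.0, 3.0],
})
selected = select_indicator(demo, "A")
```

With the notebook's data this would be called as, for example, `fertility_rate = select_indicator(df, 'Úhrnná plodnost')` or `life_expectancy_men = select_indicator(df, 'Naděje dožití mužů')`.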
In [4]:
# Plot Total Fertility Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=fertility_rate)
plt.title('Total Fertility Rate in Czechia (Úhrnná plodnost)')
plt.xlabel('Year')
plt.ylabel('Total Fertility Rate (children per woman)')
plt.grid(True)
plt.show()
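A common reference line for the total fertility rate is replacement level, conventionally taken as about 2.1 children per woman in low-mortality populations. The sketch below treats 2.1 as an assumption (it is not a value from the file) and reports which years fall below it, plus the first year from which the series never returns to it. Both function names are hypothetical.

```python
import pandas as pd

# Conventional approximation of replacement-level fertility in
# low-mortality populations (an assumption, not read from the file).
REPLACEMENT_TFR = 2.1

def below_replacement_years(tfr: pd.DataFrame, threshold: float = REPLACEMENT_TFR) -> list:
    """Years in which the total fertility rate is below `threshold`."""
    return tfr.loc[tfr["Hodnota"] < threshold, "Roky"].astype(int).tolist()

def sustained_below_since(tfr: pd.DataFrame, threshold: float = REPLACEMENT_TFR):
    """First year after the last year at or above `threshold`.

    Returns the first year if the series never reaches the threshold,
    and None if the latest year is itself at or above it.
    """
    ordered = tfr.sort_values("Roky")
    at_or_above = ordered.loc[ordered["Hodnota"] >= threshold, "Roky"]
    if at_or_above.empty:
        return int(ordered["Roky"].iloc[0])
    later = ordered.loc[ordered["Roky"] > at_or_above.max(), "Roky"]
    return int(later.iloc[0]) if not later.empty else None

# Synthetic series for illustration only
demo = pd.DataFrame({"Roky": [1970, 1971, 1972, 1973, 1974],
                     "Hodnota": [2.2, 2.0, 2.15, 1.9, 1.8]})
below = below_replacement_years(demo)
since = sustained_below_since(demo)
```

On the real series this would be `sustained_below_since(fertility_rate)`. To show the threshold on the plot, `plt.axhline(REPLACEMENT_TFR, linestyle='--', color='grey')` can be added before `plt.show()`.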
Average Age of Mothers at Childbirth Analysis
In [5]:
# Filter for Average Age of Mothers (copy to avoid SettingWithCopyWarning)
avg_age_mothers = df[df['Ukazatel'] == 'Průměrný věk matek při narození dítěte'].copy()
# Convert 'Roky' to numeric
avg_age_mothers['Roky'] = pd.to_numeric(avg_age_mothers['Roky'])
# Sort by year
avg_age_mothers = avg_age_mothers.sort_values('Roky')
In [6]:
# Plot Average Age of Mothers
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=avg_age_mothers)
plt.title('Average Age of Mothers at Childbirth in Czechia')
plt.xlabel('Year')
plt.ylabel('Average Age')
plt.grid(True)
plt.show()
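The notebook imports `scipy.stats.linregress` but never uses it. One way to put a number on a rising curve such as the mothers' mean age is an ordinary least-squares slope, scaled to change per decade. This is a sketch, not part of the original analysis; the function name is hypothetical, and a single straight line over a century of data would be misleading, so it is best applied to a sub-period where the plot looks roughly linear.

```python
import numpy as np
from scipy.stats import linregress

def trend_per_decade(years, values):
    """OLS slope of `values` on `years`, expressed as change per 10 years.

    Returns (slope_per_decade, r_squared).
    """
    fit = linregress(np.asarray(years, dtype=float), np.asarray(values, dtype=float))
    return 10 * fit.slope, fit.rvalue ** 2

# Synthetic, perfectly linear series: +0.1 years of age per calendar year
demo_years = np.arange(2000, 2011)
demo_ages = 28.0 + 0.1 * (demo_years - 2000)
slope, r2 = trend_per_decade(demo_years, demo_ages)
```

For the notebook's data, something like `recent = avg_age_mothers[avg_age_mothers['Roky'] >= 1990]` followed by `trend_per_decade(recent['Roky'], recent['Hodnota'])` would work; the 1990 cutoff is only an example and should be chosen by inspecting the plot.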
Life Expectancy Analysis
In [7]:
# Filter for Life Expectancy (copies avoid SettingWithCopyWarning)
life_expectancy_men = df[df['Ukazatel'] == 'Naděje dožití mužů'].copy()
life_expectancy_women = df[df['Ukazatel'] == 'Naděje dožití žen'].copy()
# Convert 'Roky' to numeric
life_expectancy_men['Roky'] = pd.to_numeric(life_expectancy_men['Roky'])
life_expectancy_women['Roky'] = pd.to_numeric(life_expectancy_women['Roky'])
# Sort by year
life_expectancy_men = life_expectancy_men.sort_values('Roky')
life_expectancy_women = life_expectancy_women.sort_values('Roky')
In [8]:
# Plot Life Expectancy
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=life_expectancy_men, label='Men')
sns.lineplot(x='Roky', y='Hodnota', data=life_expectancy_women, label='Women')
plt.title('Life Expectancy in Czechia')
plt.xlabel('Year')
plt.ylabel('Life Expectancy (Years)')
plt.grid(True)
plt.legend()
plt.show()
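The two curves are easier to compare as a single series of differences. A minimal sketch (the helper name `gap_by_year` is hypothetical) that aligns the women's and men's series on Roky with an inner merge, so only years present in both are kept, and computes women minus men:

```python
import pandas as pd

def gap_by_year(women: pd.DataFrame, men: pd.DataFrame) -> pd.DataFrame:
    """Women-minus-men difference in 'Hodnota' for years present in both series."""
    merged = women[["Roky", "Hodnota"]].merge(
        men[["Roky", "Hodnota"]], on="Roky", suffixes=("_women", "_men")
    )  # default how="inner": years missing from either series are dropped
    merged["gap"] = merged["Hodnota_women"] - merged["Hodnota_men"]
    return merged.sort_values("Roky").reset_index(drop=True)

# Synthetic values for illustration only
women_demo = pd.DataFrame({"Roky": [2020, 2021, 2022], "Hodnota": [80.0, 81.0, 82.0]})
men_demo = pd.DataFrame({"Roky": [2021, 2022], "Hodnota": [75.0, 76.5]})
gap = gap_by_year(women_demo, men_demo)
```

The same helper applies to the paired series later in the notebook, for example `gap_by_year(age_first_marriage_women, age_first_marriage_men)`; there a negative gap simply means the men's value is higher.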
Divorce Rate Analysis
In [9]:
# Filter for Divorce Rate (copy to avoid SettingWithCopyWarning)
divorce_rate = df[df['Ukazatel'] == 'Úhrnná rozvodovost (%)'].copy()
# Convert 'Roky' to numeric
divorce_rate['Roky'] = pd.to_numeric(divorce_rate['Roky'])
# Sort by year
divorce_rate = divorce_rate.sort_values('Roky')
In [10]:
# Plot Divorce Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=divorce_rate)
plt.title('Total Divorce Rate in Czechia (Úhrnná rozvodovost)')
plt.xlabel('Year')
plt.ylabel('Total Divorce Rate (%)')
plt.grid(True)
plt.show()
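The introduction mentions identifying anomalies, but no cell does so explicitly. One simple screen, sketched here as an assumption rather than the notebook's method, is to flag years whose year-over-year change lies far from the typical change, measured as a z-score. A flagged year is only a candidate for closer inspection: a methodological break in the series can produce a jump just as easily as a real event. The function name and the default threshold of 3 are hypothetical choices.

```python
import pandas as pd

def flag_jumps(series_df: pd.DataFrame, z: float = 3.0) -> pd.DataFrame:
    """Flag years whose year-over-year change in 'Hodnota' is unusually large.

    A change is flagged when it lies more than `z` sample standard
    deviations from the mean change across the whole series.
    """
    ordered = series_df.sort_values("Roky").reset_index(drop=True)
    change = ordered["Hodnota"].diff()            # first year has no change (NaN)
    score = (change - change.mean()) / change.std()
    out = ordered.assign(change=change, zscore=score)
    return out.loc[score.abs() > z, ["Roky", "Hodnota", "change", "zscore"]]

# Synthetic series: steady +0.1 per year with one level shift of +5 in 2010
years = list(range(2000, 2020))
values = [10 + 0.1 * i + (5.0 if y >= 2010 else 0.0) for i, y in enumerate(years)]
flagged = flag_jumps(pd.DataFrame({"Roky": years, "Hodnota": values}))
```

Applied here as `flag_jumps(divorce_rate)`, or to any other series in the notebook. Because a single large jump also inflates the standard deviation, short series may need a lower `z` to flag anything at all.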
Average Duration of Marriage at Divorce
In [11]:
# Filter for marriage duration (copy to avoid SettingWithCopyWarning)
marriage_duration = df[df['Ukazatel'] == 'Průměrná délka trvání manželství při rozvodu'].copy()
# Convert 'Roky' to numeric
marriage_duration['Roky'] = pd.to_numeric(marriage_duration['Roky'])
# Sort by year
marriage_duration = marriage_duration.sort_values('Roky')
In [12]:
# Plot marriage duration
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=marriage_duration)
plt.title('Average Duration of Marriage at Divorce in Czechia')
plt.xlabel('Year')
plt.ylabel('Duration (years)')
plt.grid(True)
plt.show()
First Marriage Rate Analysis
In [13]:
# Filter for First Marriage Rate (copies avoid SettingWithCopyWarning)
first_marriage_men = df[df['Ukazatel'] == 'Tabulková prvosňatečnost mužů (%)'].copy()
first_marriage_women = df[df['Ukazatel'] == 'Tabulková prvosňatečnost žen (%)'].copy()
# Convert 'Roky' to numeric
first_marriage_men['Roky'] = pd.to_numeric(first_marriage_men['Roky'])
first_marriage_women['Roky'] = pd.to_numeric(first_marriage_women['Roky'])
# Sort by year
first_marriage_men = first_marriage_men.sort_values('Roky')
first_marriage_women = first_marriage_women.sort_values('Roky')
In [14]:
# Plot First Marriage Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=first_marriage_men, label='Men')
sns.lineplot(x='Roky', y='Hodnota', data=first_marriage_women, label='Women')
plt.title('First Marriage Rate in Czechia')
plt.xlabel('Year')
plt.ylabel('First Marriage Rate (%)')
plt.grid(True)
plt.legend()
plt.show()
Average Age at First Marriage Analysis
In [15]:
# Filter for Average Age at First Marriage (copies avoid SettingWithCopyWarning)
age_first_marriage_men = df[df['Ukazatel'] == 'Průměrný věk mužů při prvním sňatku'].copy()
age_first_marriage_women = df[df['Ukazatel'] == 'Průměrný věk žen při prvním sňatku'].copy()
# Convert 'Roky' to numeric
age_first_marriage_men['Roky'] = pd.to_numeric(age_first_marriage_men['Roky'])
age_first_marriage_women['Roky'] = pd.to_numeric(age_first_marriage_women['Roky'])
# Sort by year
age_first_marriage_men = age_first_marriage_men.sort_values('Roky')
age_first_marriage_women = age_first_marriage_women.sort_values('Roky')
In [16]:
# Plot Average Age at First Marriage
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=age_first_marriage_men, label='Men')
sns.lineplot(x='Roky', y='Hodnota', data=age_first_marriage_women, label='Women')
plt.title('Average Age at First Marriage in Czechia')
plt.xlabel('Year')
plt.ylabel('Average Age')
plt.grid(True)
plt.legend()
plt.show()
Net Reproduction Rate Analysis
In [17]:
# Filter for Net Reproduction Rate (copy to avoid SettingWithCopyWarning)
net_reproduction_rate = df[df['Ukazatel'] == 'Čistá míra reprodukce'].copy()
# Convert 'Roky' to numeric
net_reproduction_rate['Roky'] = pd.to_numeric(net_reproduction_rate['Roky'])
# Sort by year
net_reproduction_rate = net_reproduction_rate.sort_values('Roky')
In [18]:
# Plot Net Reproduction Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=net_reproduction_rate)
plt.title('Net Reproduction Rate in Czechia')
plt.xlabel('Year')
plt.ylabel('Net Reproduction Rate')
plt.grid(True)
plt.show()
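For reading this plot it helps to recall the standard demographic definition of the net reproduction rate. The file reports only the resulting values, so the formula below is the textbook definition, not necessarily the exact procedure used to produce the series. Here $f_x^{F}$ is the rate of daughters born to women aged $x$, $L_x / l_0$ is the expected number of years a newborn girl lives in the age interval $[x, x+1)$ under current female mortality, $s$ is the share of births that are girls (about 0.488, corresponding to roughly 105 boys per 100 girls), and $p(A_M)$ is the probability of surviving from birth to the mean age at childbearing:

```latex
\mathrm{NRR} = \sum_{x} f_x^{F}\,\frac{L_x}{l_0},
\qquad
\mathrm{NRR} \approx \mathrm{TFR}\cdot s \cdot p(A_M)
```

A value of 1 means each generation of women exactly replaces itself. As a rough illustration only, taking the 2023 total fertility rate from the preview and assuming $s = 0.488$ and $p(A_M) \approx 0.99$ gives $\mathrm{NRR} \approx 1.4526 \times 0.488 \times 0.99 \approx 0.70$; this is a back-of-envelope estimate, not a value read from the file.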
Summary of the Findings
The analysis reveals several important demographic trends in Czechia: a long-term decline in the total fertility rate (from about 2.96 children per woman in 1920 to about 1.45 in 2023), a rising mean age at childbirth and at first marriage, rising life expectancy for both sexes (with temporary dips that coincide with major historical events), increasing divorce rates, and declining first-marriage rates. Together, low fertility and longer lives point to an aging population, while the marriage and divorce patterns indicate shifting social norms in the country.