Czech Demographic Data Analysis

This notebook analyzes Czech demographic data from the file oby01crgen.csv. It plots each indicator over time so that long-term trends and unusual years in the population dynamics of Czechia can be read off the charts. The indicators covered are the total fertility rate, the net reproduction rate, the average age of mothers at childbirth, life expectancy, first marriage (rate and average age), and divorce (rate and average marriage duration at divorce).

In [1]:

# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from scipy.stats import linregress

%matplotlib inline

In [2]:

# Load Data
df = pd.read_csv('oby01crgen.csv')
df.head()

Out[2]:

	Ukazatel	IndicatorType	Roky	CasR2A	Území	Uz0	Hodnota
0	Úhrnná plodnost	5405W	2023	2023	Česko	CZ	1.452572
1	Úhrnná plodnost	5405W	2022	2022	Česko	CZ	1.617747
2	Úhrnná plodnost	5405W	2021	2021	Česko	CZ	1.826536
3	Úhrnná plodnost	5405W	2020	2020	Česko	CZ	1.707373
4	Úhrnná plodnost	5405W	2019	2019	Česko	CZ	1.708963
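The sections that follow apply the same three steps to each indicator: filter on `Ukazatel`, convert `Roky` to numeric, and sort by year. A minimal sketch of a helper that bundles these steps is given here, assuming the column names shown in the table above. The name `select_indicator` is hypothetical and not part of the original notebook, and the small frame only reuses rows displayed in this notebook so the example runs without the CSV.

```python
import pandas as pd

def select_indicator(df, name):
    """Return the rows for one indicator, with numeric years, sorted by year."""
    # .copy() makes an independent frame, so the column assignment below is safe.
    out = df[df['Ukazatel'] == name].copy()
    out['Roky'] = pd.to_numeric(out['Roky'])
    return out.sort_values('Roky')

# Sample rows copied from the outputs shown in this notebook (not the full file).
# Years are given as strings here only to exercise the numeric conversion.
sample = pd.DataFrame({
    'Ukazatel': ['Úhrnná plodnost'] * 4,
    'Roky': ['2023', '2022', '1921', '1920'],
    'Hodnota': [1.452572, 1.617747, 3.035, 2.964],
})

fertility_sample = select_indicator(sample, 'Úhrnná plodnost')
print(fertility_sample[['Roky', 'Hodnota']])
```

With the full file loaded, the same call would be `select_indicator(df, 'Úhrnná plodnost')`.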

Total Fertility Rate Analysis

In [3]:

# Filter for Total Fertility Rate; .copy() avoids SettingWithCopyWarning
fertility_rate = df[df['Ukazatel'] == 'Úhrnná plodnost'].copy()

# Convert 'Roky' to numeric
fertility_rate['Roky'] = pd.to_numeric(fertility_rate['Roky'])

# Sort by year
fertility_rate = fertility_rate.sort_values('Roky')

fertility_rate.head()

Out[3]:

	Ukazatel	IndicatorType	Roky	CasR2A	Území	Uz0	Hodnota
103	Úhrnná plodnost	5405W	1920	1920	Česko	CZ	2.964
102	Úhrnná plodnost	5405W	1921	1921	Česko	CZ	3.035
101	Úhrnná plodnost	5405W	1922	1922	Česko	CZ	2.882
100	Úhrnná plodnost	5405W	1923	1923	Česko	CZ	2.768
99	Úhrnná plodnost	5405W	1924	1924	Česko	CZ	2.590

In [4]:

# Plot Total Fertility Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=fertility_rate)
plt.title('Total Fertility Rate in Czechia (Úhrnná plodnost)')
plt.xlabel('Year')
plt.ylabel('Fertility Rate')
plt.grid(True)
plt.show()
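One simple way to look for unusual years in a series like this is to compute the year-over-year change and flag changes above a chosen size. The sketch below is an illustration under stated assumptions, not a step in the original notebook: `flag_large_changes` is a hypothetical helper, the threshold of 0.15 is an arbitrary choice, and the data are only the five most recent rows shown in Out[2].

```python
import pandas as pd

def flag_large_changes(series_df, threshold):
    """Return rows whose year-over-year change in 'Hodnota' exceeds threshold.

    Assumes one row per year with no gaps, so diff() compares consecutive years.
    """
    out = series_df.sort_values('Roky').copy()
    out['change'] = out['Hodnota'].diff()
    # The first row has no previous year; its NaN change is never flagged.
    return out[out['change'].abs() > threshold]

# Rows copied from Out[2] (total fertility rate, 2019 to 2023).
recent = pd.DataFrame({
    'Roky': [2023, 2022, 2021, 2020, 2019],
    'Hodnota': [1.452572, 1.617747, 1.826536, 1.707373, 1.708963],
})

flagged = flag_large_changes(recent, threshold=0.15)
print(flagged[['Roky', 'change']])
```

On these five rows the check flags 2022 and 2023, the two years with the largest drops. A threshold suited to the full 1920 to 2023 series would need to be chosen separately.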


Average Age of Mothers at Childbirth Analysis

In [5]:

# Filter for Average Age of Mothers; .copy() avoids SettingWithCopyWarning
avg_age_mothers = df[df['Ukazatel'] == 'Průměrný věk matek při narození dítěte'].copy()

# Convert 'Roky' to numeric
avg_age_mothers['Roky'] = pd.to_numeric(avg_age_mothers['Roky'])

# Sort by year
avg_age_mothers = avg_age_mothers.sort_values('Roky')

In [6]:

# Plot Average Age of Mothers
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=avg_age_mothers)
plt.title('Average Age of Mothers at Childbirth in Czechia')
plt.xlabel('Year')
plt.ylabel('Average Age')
plt.grid(True)
plt.show()
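Each filter in this notebook matches the `Ukazatel` label exactly, so a misspelled or differently formatted label yields an empty frame and an empty plot rather than an error. A small check along the following lines can catch that early. `missing_indicators` is a hypothetical helper, and the sample frame holds only fertility rows copied from Out[2], so the second label is reported as missing here by construction.

```python
import pandas as pd

def missing_indicators(df, names):
    """Return the requested indicator labels that do not occur in df."""
    available = set(df['Ukazatel'].unique())
    return [name for name in names if name not in available]

# Sample rows copied from Out[2]; the real file contains more indicators.
sample = pd.DataFrame({
    'Ukazatel': ['Úhrnná plodnost', 'Úhrnná plodnost'],
    'Roky': [2023, 2022],
    'Hodnota': [1.452572, 1.617747],
})

wanted = ['Úhrnná plodnost', 'Průměrný věk matek při narození dítěte']
missing = missing_indicators(sample, wanted)
print(missing)
```

Run against the full `df`, an empty list would confirm that every label used in the filters is present in the file.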

Life Expectancy Analysis

In [7]:

# Filter for Life Expectancy; .copy() avoids SettingWithCopyWarning
life_expectancy_men = df[df['Ukazatel'] == 'Naděje dožití mužů'].copy()
life_expectancy_women = df[df['Ukazatel'] == 'Naděje dožití žen'].copy()

# Convert 'Roky' to numeric
life_expectancy_men['Roky'] = pd.to_numeric(life_expectancy_men['Roky'])
life_expectancy_women['Roky'] = pd.to_numeric(life_expectancy_women['Roky'])

# Sort by year
life_expectancy_men = life_expectancy_men.sort_values('Roky')
life_expectancy_women = life_expectancy_women.sort_values('Roky')

In [8]:

# Plot Life Expectancy
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=life_expectancy_men, label='Men')
sns.lineplot(x='Roky', y='Hodnota', data=life_expectancy_women, label='Women')
plt.title('Life Expectancy in Czechia')
plt.xlabel('Year')
plt.ylabel('Life Expectancy (Years)')
plt.grid(True)
plt.legend()
plt.show()
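The two lines in this plot can also be compared directly by computing the gap between them for each year. The sketch below merges the two frames on `Roky` and subtracts. `life_expectancy_gap` is a hypothetical helper, and the numbers are placeholders chosen for illustration only; they are not values from oby01crgen.csv.

```python
import pandas as pd

def life_expectancy_gap(men, women):
    """Merge the two series on year and return women minus men."""
    merged = men.merge(women, on='Roky', suffixes=('_men', '_women'))
    merged['gap'] = merged['Hodnota_women'] - merged['Hodnota_men']
    return merged[['Roky', 'gap']].sort_values('Roky')

# Placeholder numbers for illustration only; NOT values from the data file.
men = pd.DataFrame({'Roky': [2000, 2001], 'Hodnota': [70.0, 71.0]})
women = pd.DataFrame({'Roky': [2000, 2001], 'Hodnota': [76.0, 77.5]})

gap = life_expectancy_gap(men, women)
print(gap)
```

With the real data, the call would be `life_expectancy_gap(life_expectancy_men, life_expectancy_women)`. The inner merge keeps only years present in both series.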

Divorce Rate Analysis

In [9]:

# Filter for Total Divorce Rate; .copy() avoids SettingWithCopyWarning
divorce_rate = df[df['Ukazatel'] == 'Úhrnná rozvodovost (%)'].copy()

# Convert 'Roky' to numeric
divorce_rate['Roky'] = pd.to_numeric(divorce_rate['Roky'])

# Sort by year
divorce_rate = divorce_rate.sort_values('Roky')

In [10]:

# Plot Divorce Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=divorce_rate)
plt.title('Total Divorce Rate in Czechia (Úhrnná rozvodovost)')
plt.xlabel('Year')
plt.ylabel('Total Divorce Rate (%)')
plt.grid(True)
plt.show()

Average Duration of Marriage at Divorce

In [11]:

# Filter for marriage duration; .copy() avoids SettingWithCopyWarning
marriage_duration = df[df['Ukazatel'] == 'Průměrná délka trvání manželství při rozvodu'].copy()

# Convert 'Roky' to numeric
marriage_duration['Roky'] = pd.to_numeric(marriage_duration['Roky'])

# Sort by year
marriage_duration = marriage_duration.sort_values('Roky')

In [12]:

# Plot marriage duration
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=marriage_duration)
plt.title('Average Duration of Marriage at Divorce in Czechia')
plt.xlabel('Year')
plt.ylabel('Duration (years)')
plt.grid(True)
plt.show()

First Marriage Rate Analysis

In [13]:

# Filter for First Marriage Rate; .copy() avoids SettingWithCopyWarning
first_marriage_men = df[df['Ukazatel'] == 'Tabulková prvosňatečnost mužů (%)'].copy()
first_marriage_women = df[df['Ukazatel'] == 'Tabulková prvosňatečnost žen (%)'].copy()

# Convert 'Roky' to numeric
first_marriage_men['Roky'] = pd.to_numeric(first_marriage_men['Roky'])
first_marriage_women['Roky'] = pd.to_numeric(first_marriage_women['Roky'])

# Sort by year
first_marriage_men = first_marriage_men.sort_values('Roky')
first_marriage_women = first_marriage_women.sort_values('Roky')

In [14]:

# Plot First Marriage Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=first_marriage_men, label='Men')
sns.lineplot(x='Roky', y='Hodnota', data=first_marriage_women, label='Women')
plt.title('First Marriage Rate in Czechia')
plt.xlabel('Year')
plt.ylabel('First Marriage Rate (%)')
plt.grid(True)
plt.legend()
plt.show()

Average Age at First Marriage Analysis

In [15]:

# Filter for Average Age at First Marriage; .copy() avoids SettingWithCopyWarning
age_first_marriage_men = df[df['Ukazatel'] == 'Průměrný věk mužů při prvním sňatku'].copy()
age_first_marriage_women = df[df['Ukazatel'] == 'Průměrný věk žen při prvním sňatku'].copy()

# Convert 'Roky' to numeric
age_first_marriage_men['Roky'] = pd.to_numeric(age_first_marriage_men['Roky'])
age_first_marriage_women['Roky'] = pd.to_numeric(age_first_marriage_women['Roky'])

# Sort by year
age_first_marriage_men = age_first_marriage_men.sort_values('Roky')
age_first_marriage_women = age_first_marriage_women.sort_values('Roky')

In [16]:

# Plot Average Age at First Marriage
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=age_first_marriage_men, label='Men')
sns.lineplot(x='Roky', y='Hodnota', data=age_first_marriage_women, label='Women')
plt.title('Average Age at First Marriage in Czechia')
plt.xlabel('Year')
plt.ylabel('Average Age')
plt.grid(True)
plt.legend()
plt.show()

Net Reproduction Rate Analysis

In [17]:

# Filter for Net Reproduction Rate; .copy() avoids SettingWithCopyWarning
net_reproduction_rate = df[df['Ukazatel'] == 'Čistá míra reprodukce'].copy()

# Convert 'Roky' to numeric
net_reproduction_rate['Roky'] = pd.to_numeric(net_reproduction_rate['Roky'])

# Sort by year
net_reproduction_rate = net_reproduction_rate.sort_values('Roky')

In [18]:

# Plot Net Reproduction Rate
plt.figure(figsize=(12, 6))
sns.lineplot(x='Roky', y='Hodnota', data=net_reproduction_rate)
plt.title('Net Reproduction Rate in Czechia')
plt.xlabel('Year')
plt.ylabel('Net Reproduction Rate')
plt.grid(True)
plt.show()

Summary of the Findings

The analysis points to several demographic trends in Czechia: a declining total fertility rate (from 2.964 in 1920 to about 1.45 in 2023), a rising average age at childbirth and at first marriage, rising life expectancy (with dips around major historical events), a rising total divorce rate, and a falling first marriage rate. Together, these trends suggest an aging population and shifting social norms in the country.
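To put a number on a trend such as the fertility decline, the first and last observations of a series can be compared. The sketch below is one way to do this and is not part of the original notebook: `percent_change` is a hypothetical helper, and the two data points are the 1920 and 2023 total fertility rate values shown in Out[3] and Out[2].

```python
import pandas as pd

def percent_change(series_df):
    """Percent change in 'Hodnota' from the earliest to the latest year."""
    ordered = series_df.sort_values('Roky')
    first = ordered['Hodnota'].iloc[0]
    last = ordered['Hodnota'].iloc[-1]
    return (last - first) / first * 100

# Endpoints of the total fertility rate series, copied from Out[3] and Out[2].
fertility_endpoints = pd.DataFrame({
    'Roky': [1920, 2023],
    'Hodnota': [2.964, 1.452572],
})

change = percent_change(fertility_endpoints)
print(f"Total fertility rate, 1920 to 2023: {change:.1f}%")
```

An endpoint comparison ignores everything in between, so it complements the plots rather than replacing them.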