Analysis of Meteorite Landings
This Jupyter Notebook provides an exploratory data analysis of meteorite landings using a dataset containing information on meteorite names, classifications, mass, fall type (Fell or Found), year of event, and location. The analysis aims to:
- Visualize key characteristics of meteorite landings, such as mass distribution and temporal trends.
- Identify insights into the types and frequency of meteorite events.
- Highlight anomalies or data quality issues within the dataset.
- Evaluate the feasibility of predictions based on the available data.
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline
# Load the Dataset
file_path = 'meteorite_landings.csv'
meteorite_data = pd.read_csv(file_path)
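Before plotting, it can help to confirm that the columns used in later cells are actually present after loading. The following is a minimal sketch, not part of the original notebook: `check_columns`, `EXPECTED_COLUMNS`, and the two-row `sample` frame are hypothetical names and illustrative values, and the column list is simply the set of names referenced elsewhere in this notebook.

```python
import pandas as pd

# Column names referenced by later cells in this notebook.
EXPECTED_COLUMNS = ['name', 'nametype', 'recclass', 'mass (g)',
                    'fall', 'year', 'GeoLocation']


def check_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are missing from df."""
    return [col for col in expected if col not in df.columns]


# Hypothetical stand-in for meteorite_data (illustrative values only).
sample = pd.DataFrame({
    'name': ['A', 'B'],
    'nametype': ['Valid', 'Valid'],
    'recclass': ['L6', 'H5'],
    'mass (g)': [21.0, None],
    'fall': ['Fell', 'Found'],
    'year': [1880.0, 1952.0],
})

missing_columns = check_columns(sample)
print("Missing columns:", missing_columns)
```

On the real data the same call would be `check_columns(meteorite_data)`; an empty list means every later cell can find its columns.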
Distribution of Meteorite Fall Types
# Visualize the distribution of 'fall' column
plt.figure(figsize=(6, 4))
sns.countplot(x='fall', data=meteorite_data)
plt.title('Distribution of Meteorite Fall Types (Fell vs Found)')
plt.xlabel('Fall Type')
plt.ylabel('Number of Meteorites')
plt.show()
Insight: The count plot confirms that the dataset contains both 'Fell' (observed falling) and 'Found' (recovered later) meteorites. The bar heights give the relative frequency of the two categories; exact counts are available from `meteorite_data['fall'].value_counts()`.
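The balance between the two fall types can also be checked numerically rather than read off the bars. A minimal sketch, assuming a small hypothetical `sample` frame in place of `meteorite_data` (the values are illustrative and say nothing about the real counts):

```python
import pandas as pd

# Hypothetical stand-in for meteorite_data['fall'] (illustrative values only).
sample = pd.DataFrame({'fall': ['Fell', 'Found', 'Found', 'Found', 'Fell', 'Found']})

# Absolute counts per category, largest first.
fall_counts = sample['fall'].value_counts()

# The same counts expressed as a share of all rows.
fall_share = sample['fall'].value_counts(normalize=True)

print(fall_counts)
print(fall_share.round(3))
```

Running the same two calls on `meteorite_data['fall']` gives the figures needed to state which category dominates.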
Top Meteorite Classifications
# Visualize top recclasses
plt.figure(figsize=(10, 6))
top_classes = meteorite_data['recclass'].value_counts().nlargest(10)
# Assign the x variable to hue and hide the legend, as recommended by seaborn
# when a palette is used, to avoid the FutureWarning about `palette` without `hue`
sns.barplot(x=top_classes.index, y=top_classes.values,
            hue=top_classes.index, palette='viridis', legend=False)
plt.title('Top 10 Meteorite Recclasses')
plt.xlabel('Recclass')
plt.ylabel('Number of Meteorites')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
Insight: 'L6' and 'H5' are indeed among the most frequent meteorite classifications, visually validating the insight about common meteorite types.
Distribution of Meteorite Mass
# Visualize mass distribution (log scale to handle outliers)
plt.figure(figsize=(8, 5))
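# Keep only positive masses: missing values and zeros cannot be placed on a log axis
mass = meteorite_data['mass (g)'].dropna()
sns.histplot(mass[mass > 0], bins=50, log_scale=True)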
plt.title('Distribution of Meteorite Mass (g) - Log Scale')
plt.xlabel('Mass (g) - Log Scale')
plt.ylabel('Frequency')
plt.show()
Insight: The histogram of meteorite mass, plotted on a logarithmic scale, demonstrates the wide range of masses and the presence of outliers with very large masses, confirming the insight about significant mass variation.
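The spread described above can be put into numbers by working in log10 units, which is what the log-scaled axis shows. This is a minimal sketch on a hypothetical `mass` series (illustrative values, not taken from the dataset); non-positive masses are dropped because their logarithm is undefined.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for meteorite_data['mass (g)'] (illustrative values only).
mass = pd.Series([0.0, 10.0, 100.0, 1000.0, None, 1_000_000.0], name='mass (g)')

# Drop missing values and non-positive masses before taking logarithms.
positive_mass = mass.dropna()
positive_mass = positive_mass[positive_mass > 0]

# Range of the data in orders of magnitude (difference of log10 extremes).
log_mass = np.log10(positive_mass)
orders_of_magnitude = log_mass.max() - log_mass.min()

# The median is robust to the few very large specimens.
median_mass = positive_mass.median()

print(f"Span: {orders_of_magnitude:.1f} orders of magnitude")
print(f"Median mass: {median_mass:.1f} g")
```

Comparing the median with the mean on the real column is a quick way to see how strongly the largest specimens pull the average.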
Temporal Distribution of Meteorite Landings
# Visualize meteorite landings over the years
plt.figure(figsize=(12, 6))
year_counts = meteorite_data['year'].value_counts().sort_index()
sns.lineplot(x=year_counts.index, y=year_counts.values)
plt.title('Temporal Distribution of Meteorite Landings Over Years')
plt.xlabel('Year')
plt.ylabel('Number of Meteorites')
plt.xlim(meteorite_data['year'].min(), meteorite_data['year'].max())
plt.show()
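Insight: The line plot shows the number of recorded meteorites per year across the dataset's full span, from the earliest historical entries to recent events. Because the x-axis runs from the minimum to the maximum year, a few very early records stretch the axis and compress the recent period.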
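Yearly counts can be noisy, so grouping by decade is one way to make the temporal pattern easier to read. A minimal sketch, assuming a hypothetical `years` series in place of `meteorite_data['year']` (illustrative values only):

```python
import pandas as pd

# Hypothetical stand-in for meteorite_data['year'] (illustrative values only).
years = pd.Series([1769.0, 1822.0, 1828.0, 1937.0, None, 2009.0, 2009.0, 2010.0],
                  name='year')

# Drop missing years, then convert the float years to integers.
valid_years = years.dropna().astype(int)

# Floor each year to the start of its decade, e.g. 1828 -> 1820.
decades = (valid_years // 10) * 10

# Number of records per decade, in chronological order.
decade_counts = decades.value_counts().sort_index()
print(decade_counts)
```

The resulting series can be passed to `sns.lineplot` or `sns.barplot` in the same way as `year_counts`.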
Missing Mass Values
# Identify and display entries with missing mass values
missing_mass_data = meteorite_data[meteorite_data['mass (g)'].isnull()]
print("Number of entries with missing mass values:", len(missing_mass_data))
print("\nSample of meteorites with missing mass values:")
print(missing_mass_data[['name', 'nametype', 'recclass', 'fall', 'year']].head())
Number of entries with missing mass values: 131

Sample of meteorites with missing mass values:
                  name nametype    recclass  fall    year
12     Aire-sur-la-Lys    Valid     Unknown  Fell  1769.0
38              Angers    Valid          L6  Fell  1822.0
76   Barcelona (stone)    Valid          OC  Fell  1704.0
93            Belville    Valid          OC  Fell  1937.0
172  Castel Berardenga    Valid  Stone-uncl  Fell  1791.0
Anomaly: The output shows that 131 entries have no recorded mass and lists a sample of them. These rows are excluded by `dropna()` in the mass histogram, so they do not contribute to the mass distribution.
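The same check can be extended to every column at once, which shows whether mass is the only field with gaps. A minimal sketch on a hypothetical `sample` frame (illustrative values; `missing_summary` is a name introduced here, not in the original notebook):

```python
import pandas as pd

# Hypothetical stand-in for meteorite_data (illustrative values only).
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'mass (g)': [21.0, None, 107000.0, None],
    'year': [1880.0, 1769.0, None, 1822.0],
})

# Count and percentage of missing values for each column.
is_missing = sample.isna()
missing_summary = pd.DataFrame({
    'missing': is_missing.sum(),
    'percent': is_missing.mean() * 100,
})
print(missing_summary)
```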
Uncertain Meteorite Classifications
# Identify entries with generic or uncertain recclass values
uncertain_classes_data = meteorite_data[meteorite_data['recclass'].isin(['Iron?', 'Stone-uncl', 'OC', 'Iron'])]
print("Number of entries with uncertain recclass values:", len(uncertain_classes_data))
print("\nSample of meteorites with uncertain recclasses:")
print(uncertain_classes_data[['name', 'nametype', 'recclass', 'fall', 'year']].head())
Number of entries with uncertain recclass values: 225

Sample of meteorites with uncertain recclasses:
                  name nametype    recclass  fall    year
33             Andhara    Valid  Stone-uncl  Fell  1880.0
76   Barcelona (stone)    Valid          OC  Fell  1704.0
93            Belville    Valid          OC  Fell  1937.0
147          Bulls Run    Valid       Iron?  Fell  1964.0
159              Cacak    Valid          OC  Fell  1919.0
Anomaly: 225 entries carry a generic or uncertain classification ('Iron?', 'Stone-uncl', 'OC', or 'Iron') in the 'recclass' column; the sample above shows typical cases.
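The fixed list used above only catches four exact labels. One possible generalisation, sketched here under stated assumptions, is to also flag any label that contains a question mark or the fragment 'uncl'. The rule, the helper `is_uncertain`, and the `sample` values are all hypothetical and would need checking against the labels that actually occur in the dataset.

```python
import pandas as pd

# Generic labels taken from the list used in this notebook.
GENERIC_CLASSES = {'OC', 'Iron', 'Stone-uncl'}


def is_uncertain(recclass):
    """Flag generic labels and labels marked with '?' or 'uncl' (assumed rule)."""
    if pd.isna(recclass):
        return False  # missing labels are a separate issue and are not flagged here
    text = str(recclass)
    return text in GENERIC_CLASSES or '?' in text or 'uncl' in text.lower()


# Hypothetical stand-in for meteorite_data['recclass'] (illustrative values only).
sample = pd.DataFrame({'recclass': ['L6', 'Iron?', 'OC', 'Stone-uncl', 'H5', 'Iron', 'L6?']})

sample['uncertain'] = sample['recclass'].apply(is_uncertain)
print(sample)
```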
Meteorites with GeoLocation (0.0, 0.0)
# Identify entries with GeoLocation (0.0, 0.0)
zero_location_data = meteorite_data[meteorite_data['GeoLocation'] == '(0.0, 0.0)']
print("Number of entries with GeoLocation (0.0, 0.0):", len(zero_location_data))
print("\nSample of meteorites with zero GeoLocation:")
print(zero_location_data[['name', 'nametype', 'recclass', 'fall', 'year', 'GeoLocation']].head())
Number of entries with GeoLocation (0.0, 0.0): 6214

Sample of meteorites with zero GeoLocation:
                       name nametype   recclass   fall    year GeoLocation
37    Northwest Africa 5815    Valid         L5  Found     NaN  (0.0, 0.0)
597             Mason Gully    Valid         H5   Fell  2010.0  (0.0, 0.0)
1655      Allan Hills 09004    Valid  Howardite  Found  2009.0  (0.0, 0.0)
1656      Allan Hills 09005    Valid         L5  Found  2009.0  (0.0, 0.0)
1657      Allan Hills 09006    Valid         H5  Found  2009.0  (0.0, 0.0)
Anomaly: 6214 entries have a GeoLocation of '(0.0, 0.0)', which most likely marks missing or anonymized location data rather than a real landing at that coordinate. Most rows in the sample are 'Found' meteorites, although a 'Fell' entry (Mason Gully) also appears.
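The comparison above relies on the exact string '(0.0, 0.0)'. If the coordinates are needed as numbers, the text can be parsed first. A minimal sketch using only the standard library; `parse_geolocation` and `is_zero_location` are hypothetical helpers, and the '(lat, lon)' text format is assumed from the sample output shown in this notebook.

```python
import ast


def parse_geolocation(value):
    """Parse '(lat, lon)' text into a tuple of floats, or return None."""
    if not isinstance(value, str):
        return None  # e.g. NaN for a missing GeoLocation
    try:
        lat, lon = ast.literal_eval(value)
        return float(lat), float(lon)
    except (ValueError, SyntaxError, TypeError):
        return None  # text that is not a pair of numbers


def is_zero_location(value):
    """True when the parsed coordinates are exactly (0.0, 0.0)."""
    return parse_geolocation(value) == (0.0, 0.0)


examples = ['(0.0, 0.0)', '(50.775, 6.08333)', 'not a location', float('nan')]
for item in examples:
    print(repr(item), '->', parse_geolocation(item), is_zero_location(item))
```

With pandas, the helper could be applied as `meteorite_data['GeoLocation'].apply(parse_geolocation)` to obtain numeric latitude and longitude pairs.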
Year '0920' Anomaly
# Identify entries with year '0920'
anomalous_year_data = meteorite_data[meteorite_data['year'] == 920]
print("Number of entries with year '0920':", len(anomalous_year_data))
print("\nMeteorites with year '0920':")
print(anomalous_year_data[['name', 'nametype', 'recclass', 'fall', 'year']].head())
Number of entries with year '0920': 1

Meteorites with year '0920':
      name nametype    recclass  fall   year
679  Narni    Valid  Stone-uncl  Fell  920.0
Anomaly: The output shows a single record, 'Narni', with the year 920 (stored as the number 920.0, so the leading zero in '0920' does not appear in the data). Its date is far older than most entries, so it is flagged here as a likely anomaly that should be checked against the source catalogue.
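Rather than testing for one specific year, unusual dates can be flagged with a range check. A minimal sketch, assuming hypothetical bounds (`earliest=1000`, `latest=2025`) and a hypothetical `years` series; the bounds are illustrative choices and are not derived from the dataset.

```python
import pandas as pd


def flag_unusual_years(years, earliest=1000, latest=2025):
    """Return a boolean mask for non-missing years outside [earliest, latest]."""
    return years.notna() & ((years < earliest) | (years > latest))


# Hypothetical stand-in for meteorite_data['year'] (illustrative values only).
years = pd.Series([920.0, 1769.0, None, 2010.0, 2101.0], name='year')

flags = flag_unusual_years(years)
print(years[flags])
```

Missing years are left unflagged so that they can be handled separately from implausible ones.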
Predictive Analysis Feasibility
# Explain why predictive analysis is not statistically reliable with this dataset
print("Predictive analysis for meteorite landings based on this dataset alone is not statistically reliable. ")
print("Meteorite falls are random events influenced by various astronomical factors not captured in this dataset, ")
print("which primarily records historical landing events. No time-series or other features suitable for prediction are present in this limited dataset.")
Predictive analysis for meteorite landings based on this dataset alone is not statistically reliable.
Meteorite falls are random events influenced by various astronomical factors not captured in this dataset,
which primarily records historical landing events. No time-series or other features suitable for prediction are present in this limited dataset.
Prediction Status: As stated in the analysis results, predictive analysis is not feasible with this dataset due to the lack of relevant features for prediction and the random nature of meteorite falls. Therefore, no predictive model is implemented.
Summary of Analysis
# Display Summary of Analysis Results (from provided Analysis Results)
summary_text = \
"""Summary: Analysis of the meteorite landings CSV data reveals key insights into meteorite characteristics such as common classifications (L6, H5), mass distribution, temporal spread from historical to recent events, and geographical distribution. Anomalies include missing mass values, uncertain meteorite classifications, and entries with default coordinates (0.0, 0.0) potentially indicating data limitations. Predictive analysis is deemed not feasible due to the dataset's nature and the inherent randomness of meteorite fall events. Further analysis could involve geographical clustering and statistical distributions of mass and meteorite types if a larger dataset were available."""
print(summary_text)
Summary: Analysis of the meteorite landings CSV data reveals key insights into meteorite characteristics such as common classifications (L6, H5), mass distribution, temporal spread from historical to recent events, and geographical distribution. Anomalies include missing mass values, uncertain meteorite classifications, and entries with default coordinates (0.0, 0.0) potentially indicating data limitations. Predictive analysis is deemed not feasible due to the dataset's nature and the inherent randomness of meteorite fall events. Further analysis could involve geographical clustering and statistical distributions of mass and meteorite types if a larger dataset were available.
Summary: The notebook successfully visualized key insights from the meteorite landings data and identified anomalies as per the provided 'Analysis Results'. Predictive analysis was deemed not feasible due to the nature of the dataset.