About

Ariel Data Challenge 2025

This project is part of the Ariel Data Challenge 2025 hosted on Kaggle. The competition focuses on analyzing simulated data from the Ariel space mission, which aims to study the atmospheres of exoplanets through spectroscopic observations.

Competition Overview

The Ariel Data Challenge 2025 asks participants to extract planetary atmospheric spectra from noisy telescope observations. The simulated data mimics what the Ariel space telescope will collect when it observes exoplanet atmospheres through transit spectroscopy.

Key aspects of the challenge:

Data Processing: Handle realistic instrumental noise, calibration issues, and systematic effects
Spectral Extraction: Recover true planetary atmospheric spectra from raw detector images
Machine Learning: Develop robust algorithms that can generalize across different planetary systems and stellar types
Astrophysics: Apply domain knowledge about exoplanet atmospheres and transit spectroscopy

The Ariel Mission

Ariel (Atmospheric Remote-sensing Infrared Exoplanet Large-survey) is an ESA space mission scheduled to launch in 2029. It will observe the atmospheres of approximately 1,000 exoplanets, providing insight into how planets form and evolve and into the conditions for habitability across our galaxy.

This competition helps advance the data analysis techniques that will be crucial for maximizing the scientific return from the actual Ariel mission observations.

About the Author

This solution repository is the work of George Perdrizet, who spent way too long in graduate school earning a PhD in Biochemistry and Molecular Biology before realizing that pipetting RNAs wasn’t quite scratching the intellectual itch. After a brief stint teaching chemistry and making the questionable life choice of pivoting to machine learning via a bootcamp, he now spends his days as a Senior Data Science Mentor at 4Geeks Academy, teaching the next generation of data scientists to avoid his mistakes.

When not mentoring students, he’s busy being the Founder of Ask Agatha, a startup that somehow convinced Google to give them $25,000 in cloud credits to detect LLM-generated text (the irony is not lost on him). His journey from studying riboswitch folding mechanisms to teaching students why their random forest isn’t actually that random has been… educational.

The rest of his time goes to wrestling with Python notebooks, explaining why correlation doesn’t imply causation for the thousandth time, and wondering whether that spike in the spectrum is a water line or just cosmic noise. More professional details are available on LinkedIn.

Fair warning: this repo contains an unhealthy amount of matplotlib tweaking, at least three different ways to standardize the same dataset, and the accumulated wisdom of someone who learned data science the hard way. Proceed with caution and plenty of coffee.