-
Frame Sampling for Data Augmentation
Data augmentation is a crucial technique in machine learning for increasing the effective size of training datasets and improving model robustness. In the context of the Ariel Data Challenge 2025, where we’re working with time-series spectroscopic data, frame sampling offers an interesting way to create multiple training examples from each planet’s observation sequence.
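One simple form of frame sampling is to draw several random contiguous windows from each planet’s frame sequence, turning one observation into multiple training examples. A minimal sketch of that idea (the function name, window size, and array shapes here are illustrative assumptions, not the post’s actual implementation):

```python
import numpy as np

def sample_frame_windows(frames, n_samples=4, window=1024, rng=None):
    """Draw random contiguous windows from one planet's frame sequence,
    yielding several training examples from a single observation."""
    rng = np.random.default_rng(rng)
    n_frames = frames.shape[0]
    starts = rng.integers(0, n_frames - window + 1, size=n_samples)
    return np.stack([frames[s:s + window] for s in starts])

# toy sequence: 5000 frames x 283 wavelength channels
frames = np.zeros((5000, 283))
windows = sample_frame_windows(frames, n_samples=4, window=1024, rng=0)
print(windows.shape)  # (4, 1024, 283)
```

Each sampled window still spans the transit region if the window is chosen long enough, so the augmented examples remain valid training targets.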
-
Wavelength Smoothing: Taming Spectral Noise for Clean Time Series
With clean spectral signals extracted from the AIRS-CH0 and FGS1 detectors, the next challenge emerges: individual wavelength channels are incredibly noisy. Each extracted time series shows significant frame-to-frame variations that could mask the subtle exoplanet atmospheric signals we’re trying to detect. Time for some smoothing.
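The simplest smoothing baseline is a boxcar (moving-average) filter across neighbouring wavelength channels, which trades a little spectral resolution for much lower per-channel noise. A hedged sketch, assuming a `(n_frames, n_wavelengths)` extracted time series; the function name and kernel width are illustrative:

```python
import numpy as np

def smooth_wavelengths(ts, kernel_size=5):
    """Smooth each frame's spectrum by averaging neighbouring
    wavelength channels with a simple boxcar kernel."""
    kernel = np.ones(kernel_size) / kernel_size
    # mode="same" keeps the wavelength axis length; edges use partial overlap
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, ts)

ts = np.tile(np.linspace(0.0, 1.0, 283), (100, 1))  # toy (100, 283) series
sm = smooth_wavelengths(ts)
print(sm.shape)  # (100, 283)
```

Gaussian or Savitzky–Golay kernels are common drop-in upgrades once the boxcar baseline is in place.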
-
Signal Extraction Part II: FGS1 Data Reduction
Building on the success of AIRS-CH0 signal extraction, let’s apply the same intelligent data reduction approach to the FGS1 guidance camera data. The goal is to identify and extract just the signal-bearing pixels from the 2D frames, reducing data volume while preserving the exoplanet transit signatures.
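One common way to isolate signal-bearing pixels is to threshold the time-averaged frame and keep only pixels well above the background, then sum those pixels in every frame to get a single light curve. A minimal sketch under assumed shapes and a hypothetical sigma threshold (not necessarily the post’s exact method):

```python
import numpy as np

def extract_signal(frames, sigma=3.0):
    """Keep only signal-bearing pixels: threshold the time-averaged
    frame, then sum those pixels per frame to get a 1D light curve.
    frames: (n_frames, ny, nx) calibrated FGS1 cube."""
    mean_frame = frames.mean(axis=0)
    thresh = mean_frame.mean() + sigma * mean_frame.std()
    mask = mean_frame > thresh          # boolean map of bright pixels
    return frames[:, mask].sum(axis=1)  # one flux value per frame

frames = np.zeros((50, 32, 32))
frames[:, 14:18, 14:18] = 100.0  # toy stellar PSF
lc = extract_signal(frames)
print(lc.shape)  # (50,)
```

This collapses each 2D frame to a single flux number, shrinking the data volume by roughly the number of masked-out pixels while keeping the transit depth information.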
-
Signal Extraction: From 3D Spectrograms to 1D Time Series
With the signal correction pipeline delivering clean, calibrated data, it’s time to tackle the next challenge: extracting meaningful spectral signals from the AIRS-CH0 frames. The goal is to transform bulky 3D arrays into focused 1D time series that capture each wavelength’s signal over time for every star.
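For a spectrograph like AIRS-CH0, the 3D-to-1D reduction typically means summing the spatial rows that contain the spectral trace, leaving one time series per wavelength column. A sketch with assumed array shapes and hypothetical trace-row bounds:

```python
import numpy as np

def collapse_spectrogram(cube, row_lo=10, row_hi=22):
    """Collapse AIRS-CH0 frames (n_frames, ny, n_wavelengths) to a
    (n_frames, n_wavelengths) array by summing the spatial rows
    that contain the spectral trace."""
    return cube[:, row_lo:row_hi, :].sum(axis=1)

cube = np.ones((100, 32, 356))  # toy calibrated cube
ts = collapse_spectrogram(cube)
print(ts.shape)  # (100, 356)
```

Each column of the result is then one wavelength channel’s light curve, ready for smoothing and transit fitting downstream.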
-
Performance Optimization: Making the Pipeline Kaggle-Ready
The signal correction pipeline works beautifully, but there’s one small problem: it takes forever to run. With 1100 planets to process and a 9-hour runtime limit on Kaggle submission notebooks, we needed some serious performance optimization. Time to make this thing fast.
-
From Notebook to Package: Refactoring and Deploying the Signal Correction Pipeline
Yesterday’s challenge was getting the signal correction pipeline to work. Today’s challenge? Making it production-ready. Time to refactor the preprocessing code into a proper Python package, add comprehensive testing, and set up automated CI/CD for deployment to PyPI.
-
Signal Correction Pipeline: From Raw Counts to Science-Ready Data
Time to tackle the full signal correction pipeline! With the timing structure and CDS basics understood, we can implement all six preprocessing steps to turn noisy detector outputs into clean, calibrated data suitable for exoplanet analysis.
-
Understanding Timing and CDS: Making Sense of the Axis Info
Time to dig into the timing metadata and figure out how these instruments actually work together. The axis info data turned out to be much more useful than I initially thought - it’s not about satellite alignment at all, but about the structure of the signal matrices themselves.
-
AIRS-CH0 & FGS1 Signal EDA
Next up - let’s take a look at the signal data from both instruments: FGS1 (the guidance/alignment camera) and AIRS-CH0 (the IR spectrometer).
-
Project Introduction & Initial EDA
Welcome to my exploration of the Ariel Data Challenge 2025! This Kaggle competition presents a fascinating problem: extracting planetary atmospheric spectra from simulated space telescope observations.