Overview
========

This tutorial provides a hands-on introduction to Docker containerization for AI/ML.

Topics
------

1. **Docker basics**: Images, containers, Dockerfiles, and key concepts
2. **Building containers**: From simple scripts to web applications
3. **Development containers**: Creating portable, shareable ML development environments
4. **Publishing images**: Sharing your work via Docker Hub
5. **Portability**: Verifying that your environment runs on another machine, using GitHub Codespaces

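Why containerization for ML?
-----------------------------

In production machine learning systems, each stage of the pipeline (ingestion, cleaning, training, serving, monitoring) can run in its own container. This makes pipelines easier to build reliably, scale, and reproduce. A typical layout looks like this:
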
.. mermaid::

   %%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000000','primaryBorderColor':'#000000','lineColor':'#000000','secondaryColor':'#fff3e0','tertiaryColor':'#f1f8e9','fontSize':'26px'}}}%%
   graph TB
       subgraph "Production ML System"
           A[Data Ingestion Container] --> B[Data Cleaning Container]
           B --> C[Feature Store Container]
           C --> D[Training Container]
           D --> E[Model Registry Container]
           E --> F[Model Serving API Container]
           F --> G[API Gateway Container]
           H[(Data Lake)] -.-> A
           I[(Database)] -.-> C
           E -.-> J[Monitoring & Logging Container]
           F -.-> J
       end
       K[Client Applications] --> G
       G --> F

**Key benefits:**

- **Reproducibility**: Identical dependencies in every environment, from a laptop to production
- **Modularity**: Each component can be updated and scaled independently
- **Portability**: The same image runs on a laptop, a server, or in the cloud, avoiding "works on my machine" problems
- **Isolation**: Models with conflicting dependencies can run side by side
- **Collaboration**: An entire development environment can be shared as a single image

What you'll build
-----------------

Across three hands-on labs (about 1 hour in total), you will build:

1. **Data Cleaner**: A containerized, pandas-based data-processing component that shows how an ML pipeline can be split into modular parts
2. **Streamlit Dashboard**: An interactive web app that demonstrates port mapping and container networking
3. **ML Dev Container**: Your own customized, publishable development environment for course projects

Prerequisites
-------------

Before starting, make sure you have:

- **Docker** installed (`Get Docker `_)
- **Docker Hub account** (`Sign up `_)
- **VS Code** with the Dev Containers extension, formerly called Remote - Containers (optional but recommended)
- **GitHub account** (for Codespaces verification)
- Basic familiarity with Python and the command line

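
Python is already a prerequisite, so a short script can confirm that the command-line tools are on your ``PATH`` before you start the labs. The following is a minimal sketch and is not part of the labs. ``TOOLS``, ``tool_version``, and ``check_prerequisites`` are hypothetical names chosen for illustration. The sketch assumes Python 3.7 or later. It treats VS Code's optional ``code`` launcher as the way to detect VS Code.

.. code-block:: python

   """Check that the tools this tutorial expects are on PATH (hypothetical helper)."""
   import shutil
   import subprocess
   from typing import Dict, Optional

   # Tools to look for: "docker" is required; "code" is VS Code's optional CLI
   # launcher, which can be missing from PATH even when VS Code is installed.
   TOOLS = ("docker", "code")


   def tool_version(command: str) -> Optional[str]:
       """Return the first line of ``<command> --version``, or None if unavailable."""
       path = shutil.which(command)  # resolved executable path, or None
       if path is None:
           return None
       try:
           result = subprocess.run(
               [path, "--version"], capture_output=True, text=True, timeout=10
           )
       except (OSError, subprocess.TimeoutExpired):
           return None
       # Some tools print their version to stderr, so fall back to it.
       output = (result.stdout or result.stderr).strip()
       return output.splitlines()[0] if output else ""


   def check_prerequisites() -> Dict[str, Optional[str]]:
       """Map each tool name to its reported version string (None means not found)."""
       return {name: tool_version(name) for name in TOOLS}


   if __name__ == "__main__":
       for name, version in check_prerequisites().items():
           print(f"{name:8} {version if version is not None else 'NOT FOUND'}")

Save the script under any filename and run it with ``python``. If the ``docker`` line reads ``NOT FOUND``, Docker is either not installed or not on your ``PATH``. The check only confirms that the CLI is installed. It does not confirm that the Docker daemon is running; ``docker info`` reports that.
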
Ready to start?
---------------
Head to :doc:`concepts` to understand the fundamentals, or jump straight to :doc:`lab-01-data-cleaner` if you're already familiar with Docker basics.