Docker concepts
===============

Core terminology
----------------

Image
~~~~~

A blueprint or template containing your application code, dependencies, and runtime. Think of it as a snapshot of a filesystem with instructions on how to run your application.

**Example**: ``python:3.11-slim`` is an official Python image

Container
~~~~~~~~~

A running instance of an image. Containers are isolated processes that run your application. You can create many containers from a single image.

**What a container actually is:**

.. mermaid::

   %%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000000','primaryBorderColor':'#000000','lineColor':'#000000','secondaryColor':'#fff3e0','tertiaryColor':'#f1f8e9','fontSize':'32px'}}}%%
   graph TB
       subgraph "Host Machine"
           K[Host Kernel]
           DE[Docker Engine]

           subgraph "Container 1"
               P1[Process]
               FS1[Filesystem<br/>from Image]
               N1[Network<br/>Interface]
               R1[Resources<br/>CPU/Memory]
           end

           subgraph "Container 2"
               P2[Process]
               FS2[Filesystem<br/>from Image]
               N2[Network<br/>Interface]
               R2[Resources<br/>CPU/Memory]
           end

           K -.shares kernel.-> P1
           K -.shares kernel.-> P2
           DE -->|manages| P1
           DE -->|manages| P2
           P1 -.isolated via<br/>namespaces.-> P2
       end

A container is NOT a virtual machine. It's an isolated process on your host OS that:

- **Shares the host kernel** (lightweight - no guest OS overhead)
- **Has its own filesystem** (from the image)
- **Has isolated networking** (own IP address, ports)
- **Has resource limits** (controlled CPU, memory via cgroups)
- **Cannot see other containers' processes** (namespace isolation)

**Key difference from VMs**: Containers don't include a full operating system - they share the host's kernel but are isolated through Linux namespaces and cgroups.

Host
~~~~

The physical or virtual machine where Docker is installed and running. The host provides the kernel that containers share and the resources (CPU, memory, storage) that containers use.

Dockerfile
~~~~~~~~~~

A text file with instructions for building a Docker image. It specifies the base image, dependencies to install, files to copy, and commands to run.

**Example**:

.. code-block:: dockerfile

   FROM python:3.11-slim
   WORKDIR /app
   COPY requirements.txt .
   RUN pip install -r requirements.txt
   COPY . .
   CMD ["python", "app.py"]

Registry
~~~~~~~~

A service for storing and sharing Docker images. Docker Hub is the default public registry, but alternatives include GitHub Container Registry, AWS ECR, and private registries.

**Popular registries**: Docker Hub, ghcr.io, AWS ECR, Google Artifact Registry

Tag
~~~

A label for different versions of an image (e.g., ``v1.0``, ``latest``). Tags help manage versions and deployments.

**Best practice**: Use semantic versioning (``v1.0.0``) for production images

Volume
~~~~~~

A mechanism for persisting data generated by containers and sharing files between your host machine and containers.

**Use cases**: Development files, databases, logs

Port mapping
~~~~~~~~~~~~

Exposing container ports to your host machine, allowing you to access web apps or APIs running inside containers.

**Syntax**: ``-p HOST_PORT:CONTAINER_PORT``

Common commands
---------------

Building images
~~~~~~~~~~~~~~~

.. code-block:: bash

   # Build an image from Dockerfile in current directory
   docker build -t my-image:v1.0 .

   # Build with custom Dockerfile name
   docker build -t my-image -f Dockerfile.custom .

Running containers
~~~~~~~~~~~~~~~~~~

Docker containers can run in different modes depending on your use case. Understanding these modes helps you choose the right approach for your task.

**Attached mode (default)**

When you run a container without special flags, it runs in the foreground, showing output directly in your terminal:

.. code-block:: bash

   # Run and see output in real-time
   docker run my-image

Use attached mode for:

- Batch jobs and data processing tasks where you want to monitor progress
- Testing containers to see immediate output
- Short-running tasks that complete and exit

Press ``Ctrl+C`` to stop the container and return to your terminal.

**Detached mode**

The ``-d`` flag runs containers in the background, freeing your terminal:

.. code-block:: bash

   # Start container in background
   docker run -d --name my-container my-image

   # View logs from detached container
   docker logs my-container

   # Follow logs in real-time
   docker logs -f my-container

   # Attach to a running detached container
   docker attach my-container

Use detached mode for:

- Web servers and APIs (like Streamlit apps)
- Long-running services that don't need constant monitoring
- Background processing tasks

**Interactive mode**

The ``-it`` flags (interactive + terminal) let you interact with a container through a shell:

.. code-block:: bash

   # Start container with interactive shell
   docker run -it my-image /bin/bash

   # Execute commands in already-running container
   docker exec -it my-container bash

Use interactive mode for:

- Development and debugging inside the container
- Exploring the container's filesystem
- Running commands manually for testing

Type ``exit`` to leave the shell. With ``docker run -it``, this also stops the container, because the shell is its main process; with ``docker exec -it``, the container keeps running.

**Port mapping and volume mounts**

Add these options to any mode:

.. code-block:: bash

   # Run with port mapping (host port 8080 -> container port 80)
   docker run -p 8080:80 my-image

**Volume mount examples:**

.. tab-set::

   .. tab-item:: Linux/macOS

      .. code-block:: bash

         # Mount current directory (quoted so paths with spaces work)
         docker run -v "$(pwd)":/app my-image

   .. tab-item:: Windows PowerShell

      .. code-block:: powershell

         # Mount current directory
         docker run -v ${PWD}:/app my-image

   .. tab-item:: Windows CMD

      .. code-block:: batch

         REM Mount current directory
         docker run -v %cd%:/app my-image

**Note for Windows users**: When using absolute paths, use forward slashes and include the drive letter: ``-v C:/Users/yourname/project:/app``

Managing containers
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # List running containers
   docker ps

   # List all containers (including stopped)
   docker ps -a

   # Stop a container
   docker stop container-name

   # Remove a container
   docker rm container-name

   # View container logs
   docker logs container-name

   # Execute command in running container
   docker exec -it container-name bash

**Tip**: Customize the default ``docker ps`` output by adding a ``psFormat`` entry to ``~/.docker/config.json``:

.. code-block:: json

   {
     "psFormat": "table {{.Names}}\\t{{truncate .Image 25}}\\t{{.Status}}"
   }

Managing images
~~~~~~~~~~~~~~~

.. code-block:: bash

   # List images
   docker images

   # Remove an image
   docker rmi image-name

   # Tag an image
   docker tag my-image:latest username/my-image:v1.0

   # Push to registry
   docker push username/my-image:v1.0

   # Pull from registry
   docker pull username/my-image:v1.0

Cleanup
~~~~~~~

.. code-block:: bash

   # Remove all stopped containers
   docker container prune

   # Remove dangling images (add -a to remove all unused images)
   docker image prune

   # Remove stopped containers, unused networks, all unused images,
   # and build cache (use with caution!)
   docker system prune -a

Why Docker for ML?
------------------

Reproducibility
~~~~~~~~~~~~~~~

Every team member gets the exact same environment - same Python version, same package versions, same system dependencies.

**Problem solved**: "But it works on my machine!"

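
As a minimal sketch of how this plays out in practice, assume the example ``Dockerfile`` from the terminology section and a ``requirements.txt`` that pins exact versions with ``==`` (the ``my-image:v1.0`` name reuses the placeholder from the build commands above). Teammates who build from the same two files get the same top-level package versions, and each can check what ended up in the image:

.. code-block:: bash

   # Build the image from the Dockerfile in the current directory
   docker build -t my-image:v1.0 .

   # Print the interpreter version inside the image; --rm deletes the
   # container once the command finishes
   docker run --rm my-image:v1.0 python --version

   # List the installed package versions inside the image
   docker run --rm my-image:v1.0 pip freeze

Note that a tag such as ``python:3.11-slim`` can move between patch releases over time. Pinning a full patch tag or an image digest, and pinning transitive dependencies as well, tightens reproducibility further.
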

Portability
~~~~~~~~~~~

Develop locally on your laptop, train on a GPU server, deploy to the cloud - all using the same container image.

**Benefit**: Seamless transitions between environments

Isolation
~~~~~~~~~

Work on multiple projects with conflicting dependencies without interference.

**Example**: Project A uses TensorFlow 2.x while Project B uses TensorFlow 1.x

Modularity
~~~~~~~~~~

Break complex ML pipelines into independent containerized components that can be scaled, updated, and debugged separately.

**Pattern**: Data ingestion → Cleaning → Feature engineering → Training → Serving

Next steps
----------

Now that you understand the concepts, proceed to :doc:`dockerfile-guide` to learn how to write effective Dockerfiles, or jump straight to :doc:`lab-01-data-cleaner` to start building!