Docker concepts
===============

Core terminology
----------------

Image
~~~~~

A blueprint or template containing your application code, dependencies, and runtime. Think of it as a snapshot of a filesystem with instructions on how to run your application.

**Example**: ``python:3.11-slim`` is an official Python image

Container
~~~~~~~~~

A runnable instance of an image. A container is an isolated process (or group of processes) that runs your application, and it can be started, stopped, and removed. You can create many containers from a single image.

**What a container actually is:**

.. mermaid::

   %%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000000','primaryBorderColor':'#000000','lineColor':'#000000','secondaryColor':'#fff3e0','tertiaryColor':'#f1f8e9','fontSize':'32px'}}}%%
   graph TB
       subgraph "Host Machine"
           K[Host Kernel]
           DE[Docker Engine]
           
           subgraph "Container 1"
               P1[Process]
               FS1[Filesystem<br/>from Image]
               N1[Network<br/>Interface]
               R1[Resources<br/>CPU/Memory]
           end
           
           subgraph "Container 2"
               P2[Process]
               FS2[Filesystem<br/>from Image]
               N2[Network<br/>Interface]
               R2[Resources<br/>CPU/Memory]
           end
           
           K -.shares kernel.-> P1
           K -.shares kernel.-> P2
           DE -->|manages| P1
           DE -->|manages| P2
           
           P1 -.isolated via<br/>namespaces.-> P2
       end

A container is NOT a virtual machine. It's an isolated process on your host OS that:

- **Shares the host kernel** (lightweight: no guest OS to boot)
- **Has its own filesystem** (from the image)
- **Has isolated networking** (own IP address, ports)
- **Can have resource limits** (CPU and memory caps enforced via cgroups)
- **Cannot see other containers' processes** (namespace isolation)

**Key difference from VMs**: A container image carries only userland files (libraries, tools, your application), not a kernel. Containers share the host's kernel and are isolated from each other through Linux namespaces and cgroups.
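
The properties above can be observed directly. The following is a minimal sketch, not a definitive recipe: the helper names (``container_kernel``, ``container_process_count``, ``run_limited``) are hypothetical, and the limit values are arbitrary examples. On Docker Desktop for macOS or Windows, the kernel reported is that of the Linux VM Docker runs in, not of the desktop OS.

.. code-block:: bash

   IMAGE="python:3.11-slim"   # any Linux image with a POSIX shell should work

   # Kernel release as seen from inside a container.
   # On a Linux host this should match `uname -r` run on the host itself.
   container_kernel() {
       docker run --rm "$IMAGE" uname -r
   }

   # Number of processes visible inside a fresh container (PID namespace).
   # Counts the numeric entries under /proc, one per visible process.
   container_process_count() {
       docker run --rm "$IMAGE" sh -c 'ls /proc | grep -c "^[0-9]"'
   }

   # Run a command under cgroup limits: at most 256 MB of memory and 1.5 CPUs.
   run_limited() {
       docker run --rm --memory 256m --cpus 1.5 "$IMAGE" "$@"
   }

Calling ``container_process_count`` should report only a handful of processes, however many are running on the host, and ``run_limited python -c "print('ok')"`` runs the command inside the capped container.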

Host
~~~~

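The physical or virtual machine where Docker is installed and running. The host provides the kernel that containers share and the resources (CPU, memory, storage) that containers use. On macOS and Windows, Docker Desktop runs Linux containers inside a lightweight Linux virtual machine, so that VM's kernel is the one containers share.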

Dockerfile
~~~~~~~~~~

A text file with instructions for building a Docker image. It specifies the base image, dependencies to install, files to copy, and commands to run.

**Example**:

.. code-block:: dockerfile

   FROM python:3.11-slim
   WORKDIR /app
   COPY requirements.txt .
   RUN pip install -r requirements.txt
   COPY . .
   CMD ["python", "app.py"]

Registry
~~~~~~~~

A service for storing and sharing Docker images, organized into repositories (one per image name). Docker Hub is the default public registry; alternatives include GitHub Container Registry, AWS ECR, and self-hosted private registries.

**Popular registries**: Docker Hub, GitHub Container Registry (``ghcr.io``), AWS ECR, Google Artifact Registry

Tag
~~~

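A label that identifies a specific version or variant of an image (e.g., ``v1.0``, ``latest``). If you omit the tag, Docker uses ``latest``, which is only a default name and does not necessarily point to the newest build. Tags help manage versions and deployments.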

**Best practice**: Use `semantic versioning <https://semver.org/>`_ (``v1.0.0``) for production images
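
One common convention (an assumption here, not something Docker enforces) is to publish each release under its full version plus floating ``major.minor`` and ``major`` tags. The sketch below derives those tags from a version string; ``semver_tags`` and ``tag_release`` are hypothetical helper names, and ``tag_release`` assumes a local image tagged ``latest`` already exists.

.. code-block:: bash

   # Print the tags for one release, most specific first.
   # Usage: semver_tags IMAGE VERSION   (VERSION like 1.4.2, without the "v")
   semver_tags() {
       image="$1"
       version="$2"
       major="${version%%.*}"     # 1.4.2 -> 1
       rest="${version#*.}"       # 1.4.2 -> 4.2
       minor="${rest%%.*}"        # 4.2   -> 4
       for tag in "$version" "$major.$minor" "$major"; do
           printf '%s:v%s\n' "$image" "$tag"
       done
   }

   # Apply every derived tag to the local IMAGE:latest.
   # Usage: tag_release IMAGE VERSION
   tag_release() {
       semver_tags "$1" "$2" | while read -r target; do
           docker tag "$1:latest" "$target"
       done
   }

For example, ``semver_tags my-image 1.4.2`` prints ``my-image:v1.4.2``, ``my-image:v1.4``, and ``my-image:v1``, one per line. Production deployments would normally reference the full version so that the image they pull never changes underneath them.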

Volume
~~~~~~

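A mechanism for persisting data beyond the life of a container. Docker-managed volumes store data that should survive container removal, while bind mounts (also set with ``-v``) share a directory between your host machine and a container.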

**Use cases**: Development files, databases, logs
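
The two flavors of ``-v`` can be sketched as follows: a named volume that Docker manages, and a bind mount of a host directory. The helper names and the ``/data`` and ``/app`` mount points are illustrative assumptions.

.. code-block:: bash

   # Named volume: Docker decides where the data lives on the host and
   # creates the volume on first use. Data survives container removal.
   # Usage: run_with_volume VOLUME IMAGE [COMMAND...]
   run_with_volume() {
       volume="$1"
       image="$2"
       shift 2
       docker run --rm -v "$volume:/data" "$image" "$@"
   }

   # Bind mount: map the current host directory into the container at /app,
   # so edits on the host are visible inside the container immediately.
   # Usage: run_with_bind_mount IMAGE [COMMAND...]
   run_with_bind_mount() {
       image="$1"
       shift
       docker run --rm -v "$(pwd):/app" "$image" "$@"
   }

Running ``run_with_volume my-data python:3.11-slim sh -c 'date >> /data/runs.log'`` twice should leave two lines in ``runs.log``, because the volume outlives each ``--rm`` container. ``docker volume ls`` lists the volume and ``docker volume rm my-data`` deletes it.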

Port mapping
~~~~~~~~~~~~

Exposing container ports to your host machine, allowing you to access web apps or APIs running inside containers.

**Syntax**: ``-p HOST_PORT:CONTAINER_PORT``
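
A small sketch of the syntax in use. The helper names ``publish`` and ``show_ports`` are hypothetical; the underlying ``docker run -p`` and ``docker port`` commands are standard.

.. code-block:: bash

   # Start a detached container and publish one of its ports on the host.
   # Usage: publish NAME HOST_PORT CONTAINER_PORT IMAGE
   publish() {
       docker run -d --name "$1" -p "$2:$3" "$4"
   }

   # List the host address bound to each published port of a container.
   # Usage: show_ports NAME
   show_ports() {
       docker port "$1"
   }

For example, ``publish web 8080 80 nginx`` (assuming the official ``nginx`` image) makes the server that listens on port 80 inside the container reachable at ``http://localhost:8080`` on the host. The host port always comes first.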

Common commands
---------------

Building images
~~~~~~~~~~~~~~~

.. code-block:: bash

   # Build an image from Dockerfile in current directory
   docker build -t my-image:v1.0 .
   
   # Build with custom Dockerfile name
   docker build -t my-image -f Dockerfile.custom .

Running containers
~~~~~~~~~~~~~~~~~~

Docker containers can run in attached, detached, or interactive mode. Which one fits depends on whether you need to watch output, keep a service running in the background, or work inside the container.

**Attached mode (default)**

When you run a container without special flags, it runs in the foreground, showing output directly in your terminal:

.. code-block:: bash

   # Run and see output in real-time
   docker run my-image

Use attached mode for:

- Batch jobs and data processing tasks where you want to monitor progress
- Testing containers to see immediate output
- Short-running tasks that complete and exit

Press ``Ctrl+C`` to stop the container and return to your terminal.

**Detached mode**

The ``-d`` flag runs containers in the background, freeing your terminal:

.. code-block:: bash

   # Start container in background
   docker run -d --name my-container my-image
   
   # View logs from detached container
   docker logs my-container
   
   # Follow logs in real-time
   docker logs -f my-container
   
   # Attach to a running detached container
   docker attach my-container

Use detached mode for:

- Web servers and APIs (like Streamlit apps)
- Long-running services that don't need constant monitoring
- Background processing tasks

**Interactive mode**

The ``-it`` flags (interactive + terminal) let you interact with a container via a shell:

.. code-block:: bash

   # Start container with interactive shell
   docker run -it my-image /bin/bash
   
   # Execute commands in already-running container
   docker exec -it my-container bash

Use interactive mode for:

- Development and debugging inside the container
- Exploring the container's filesystem
- Running commands manually for testing

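Type ``exit`` to leave the shell. With ``docker run -it``, this also stops the container because the shell is its main process; with ``docker exec -it``, the container keeps running.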

**Port mapping and volume mounts**

Add these options to any mode:

.. code-block:: bash

   # Run with port mapping
   docker run -p 8080:80 my-image

**Volume mount examples:**

.. tab-set::

   .. tab-item:: Linux/macOS

      .. code-block:: bash

         # Mount current directory
         docker run -v "$(pwd)":/app my-image

   .. tab-item:: Windows PowerShell

      .. code-block:: powershell

         # Mount current directory
         docker run -v ${PWD}:/app my-image

   .. tab-item:: Windows CMD

      .. code-block:: batch

         REM Mount current directory
         docker run -v %cd%:/app my-image

**Note for Windows users**: When using absolute paths, use forward slashes and include the drive letter: ``-v C:/Users/yourname/project:/app``

Managing containers
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   # List running containers
   docker ps
   
   # List all containers (including stopped)
   docker ps -a
   
   # Stop a container
   docker stop container-name
   
   # Remove a container
   docker rm container-name
   
   # View container logs
   docker logs container-name
   
   # Execute command in running container
   docker exec -it container-name bash

**Tip**: Customize the default ``docker ps`` output by adding a ``psFormat`` entry to ``~/.docker/config.json``:

.. code-block:: json

   {
     "psFormat": "table {{.Names}}\\t{{truncate .Image 25}}\\t{{.Status}}"
   }

Managing images
~~~~~~~~~~~~~~~

.. code-block:: bash

   # List images
   docker images
   
   # Remove an image
   docker rmi image-name
   
   # Tag an image
   docker tag my-image:latest username/my-image:v1.0
   
   # Push to registry
   docker push username/my-image:v1.0
   
   # Pull from registry
   docker pull username/my-image:v1.0

Cleanup
~~~~~~~

.. code-block:: bash

   # Remove all stopped containers
   docker container prune
   
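   # Remove dangling (untagged) images; add -a to remove every image not used by a container
   docker image prune
   
   # Remove stopped containers, unused networks, all unused images, and build cache
   # (use with caution! volumes are kept unless you add --volumes)
   docker system prune -a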

Why Docker for ML?
------------------

Reproducibility
~~~~~~~~~~~~~~~

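Every team member who runs the same image gets the exact same environment - same Python version, same package versions, same system dependencies. Pin dependency versions so that rebuilding the image reproduces that environment.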

**Problem solved**: "But it works on my machine!"

Portability
~~~~~~~~~~~

Develop locally on your laptop, train on a GPU server, deploy to the cloud - all using the same container image.

**Benefit**: Seamless transitions between environments

Isolation
~~~~~~~~~

Work on multiple projects with conflicting dependencies without interference.

**Example**: Project A uses TensorFlow 2.x while Project B uses TensorFlow 1.x
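
A minimal sketch of that example, assuming the ``tensorflow/tensorflow`` images on Docker Hub and that the tags passed in exist there (``tf_version`` is a hypothetical helper name):

.. code-block:: bash

   # Print the TensorFlow version installed in a given image tag.
   # Each call runs in its own container, so the two versions never
   # share a Python environment.
   # Usage: tf_version TAG
   tf_version() {
       docker run --rm "tensorflow/tensorflow:$1" \
           python -c 'import tensorflow as tf; print(tf.__version__)'
   }

Calling ``tf_version 1.15.5`` and then ``tf_version 2.15.0`` (both tags are assumptions) should report two different versions on the same host without either installation touching the other.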

Modularity
~~~~~~~~~~

Break complex ML pipelines into independent containerized components that can be scaled, updated, and debugged separately.

**Pattern**: Data ingestion → Cleaning → Feature engineering → Training → Serving
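
The batch stages of such a pipeline can be sketched as one container per stage, all sharing a named volume. Everything specific here is hypothetical: the ``my-pipeline/...`` image names, the ``v1.0.0`` tag, and the ``/data`` mount point. Serving is left out because it is a long-running service rather than a batch step.

.. code-block:: bash

   # Hypothetical stage names; each stage image is assumed to read its
   # input from /data and write its output back to /data.
   STAGES="ingest clean features train"

   # Run every stage in order against one shared volume.
   # Stops and returns 1 as soon as a stage fails.
   # Usage: run_pipeline VOLUME
   run_pipeline() {
       volume="$1"
       for stage in $STAGES; do
           echo "Running stage: $stage"
           docker run --rm -v "$volume:/data" "my-pipeline/$stage:v1.0.0" || return 1
       done
   }

Because each stage is its own image, one stage can be rebuilt or debugged (for example with ``docker run -it``) without touching the others.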

Next steps
----------

Now that you understand the concepts, proceed to :doc:`dockerfile-guide` to learn how to write effective Dockerfiles, or jump straight to :doc:`lab-01-data-cleaner` to start building!