Docker concepts
===============
Core terminology
----------------
Image
~~~~~
A blueprint or template containing your application code, dependencies, and runtime. Think of it as a snapshot of a filesystem with instructions on how to run your application.
**Example**: ``python:3.11-slim`` is an official Python image
Container
~~~~~~~~~
A running instance of an image. Containers are isolated processes that run your application. You can create many containers from a single image.
**What a container actually is:**
.. mermaid::
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000000','primaryBorderColor':'#000000','lineColor':'#000000','secondaryColor':'#fff3e0','tertiaryColor':'#f1f8e9','fontSize':'32px'}}}%%
graph TB
subgraph "Host Machine"
K[Host Kernel]
DE[Docker Engine]
subgraph "Container 1"
P1[Process]
FS1[Filesystem from Image]
N1[Network Interface]
R1["Resources: CPU/Memory"]
end
subgraph "Container 2"
P2[Process]
FS2[Filesystem from Image]
N2[Network Interface]
R2["Resources: CPU/Memory"]
end
K -.shares kernel.-> P1
K -.shares kernel.-> P2
DE -->|manages| P1
DE -->|manages| P2
P1 -.isolated via namespaces.-> P2
end
A container is NOT a virtual machine. It's an isolated process on your host OS that:
- **Shares the host kernel** (lightweight, with no guest OS to boot)
- **Has its own filesystem** (from the image)
- **Has isolated networking** (own IP address, ports)
- **Has resource limits** (controlled CPU, memory via cgroups)
- **Cannot see other containers' processes** (namespace isolation)
**Key difference from VMs**: Containers don't boot their own operating system kernel. They share the host's kernel and are isolated through Linux namespaces (which limit what a process can see) and cgroups (which limit the resources it can use).
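The resource limits and process isolation described above can be observed from the command line. The following is a minimal sketch, assuming a working Docker installation and the ``python:3.11-slim`` image mentioned earlier. The limit values and the ``run`` helper are illustrative assumptions; the helper only prints each command unless ``RUN_DOCKER=1`` is set.

```shell
# Print commands by default; set RUN_DOCKER=1 to actually execute them.
run() {
    if [ "${RUN_DOCKER:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Illustrative limits (assumed values, not recommendations)
MEMORY_LIMIT="256m"
CPU_LIMIT="0.5"

# --memory and --cpus are enforced through cgroups; the Python one-liner
# reports the process ID as seen from inside the container's PID namespace.
run docker run --rm --memory "$MEMORY_LIMIT" --cpus "$CPU_LIMIT" \
    python:3.11-slim python -c 'import os; print(os.getpid())'
```

When executed for real, the process typically reports a PID of 1, because it is the first process in its own PID namespace even though the host sees it as an ordinary process with a different PID.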
Host
~~~~
The physical or virtual machine where Docker is installed and running. The host provides the kernel that containers share and the resources (CPU, memory, storage) that containers use.
Dockerfile
~~~~~~~~~~
A text file with instructions for building a Docker image. It specifies the base image, dependencies to install, files to copy, and commands to run.
**Example**:
.. code-block:: dockerfile
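# Start from the official slim Python base image
FROM python:3.11-slim
# Run all following instructions from /app inside the image
WORKDIR /app
# Copy the dependency list first so this layer stays cached until it changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code
COPY . .
# Default command executed when a container starts
CMD ["python", "app.py"]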
Registry
~~~~~~~~
A service for storing and sharing Docker images. Docker Hub is the default public registry; alternatives include GitHub Container Registry, AWS ECR, and self-hosted private registries.
**Popular registries**: Docker Hub, GitHub Container Registry (``ghcr.io``), AWS ECR, Google Artifact Registry (the successor to Google Container Registry)
Tag
~~~
A label for different versions of an image (e.g., ``v1.0``, ``latest``). Tags help manage versions and deployments.
**Best practice**: Use semantic versioning (``v1.0.0``) for production images
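To make the structure of a tagged image reference concrete, the sketch below splits one into its repository and tag using plain shell parameter expansion. The reference ``ghcr.io/example-org/my-image:v1.0.0`` is a hypothetical name, and the snippet assumes the reference carries an explicit tag after its final colon (Docker falls back to ``latest`` when no tag is given).

```shell
# Hypothetical image reference used only for illustration
image_ref="ghcr.io/example-org/my-image:v1.0.0"

repository="${image_ref%:*}"   # everything before the last colon
tag="${image_ref##*:}"         # everything after the last colon

echo "repository: $repository"
echo "tag:        $tag"
```

Splitting on the last colon rather than the first keeps registry hosts that include a port, such as ``localhost:5000/my-image:v1.0.0``, intact.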
Volume
~~~~~~
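A mechanism for persisting data beyond the lifetime of a container. Named volumes are storage managed by Docker, while bind mounts (``-v /host/path:/container/path``) share files between your host machine and containers.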
**Use cases**: Development files, databases, logs
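The two ways of attaching storage can be sketched as follows. This is an illustration under stated assumptions: ``pipeline-data`` is a hypothetical volume name, ``my-image`` is a placeholder image, and the ``run`` helper only prints each command unless ``RUN_DOCKER=1`` is set.

```shell
# Print commands by default; set RUN_DOCKER=1 to actually execute them.
run() {
    if [ "${RUN_DOCKER:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

VOLUME_NAME="pipeline-data"   # hypothetical volume name
IMAGE="my-image"              # placeholder image name

# Named volume: Docker manages where the data lives; it outlives the container.
run docker volume create "$VOLUME_NAME"
run docker run --rm -v "${VOLUME_NAME}:/data" "$IMAGE"

# Bind mount: the current host directory appears at /app inside the container.
run docker run --rm -v "$(pwd):/app" "$IMAGE"
```

A named volume tends to suit databases and other state that should survive container removal, while a bind mount is convenient during development because edits on the host are visible in the container immediately.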
Port mapping
~~~~~~~~~~~~
Exposing container ports to your host machine, allowing you to access web apps or APIs running inside containers.
**Syntax**: ``-p HOST_PORT:CONTAINER_PORT``
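As a rough sketch of the syntax in use, the commands below publish a container port, list the resulting mapping, and send a request through it. The container name ``web`` and the assumption that ``my-image`` serves HTTP on port 80 are hypothetical; the ``run`` helper only prints each command unless ``RUN_DOCKER=1`` is set.

```shell
# Print commands by default; set RUN_DOCKER=1 to actually execute them.
run() {
    if [ "${RUN_DOCKER:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

HOST_PORT=8080        # port opened on the host machine
CONTAINER_PORT=80     # port the application listens on inside the container (assumed)

# Start the container in the background with the port published
run docker run -d --name web -p "${HOST_PORT}:${CONTAINER_PORT}" my-image

# Show which host ports map to which container ports
run docker port web

# Reach the application through the host port
run curl -s "http://localhost:${HOST_PORT}"
```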
Common commands
---------------
Building images
~~~~~~~~~~~~~~~
.. code-block:: bash
# Build an image from Dockerfile in current directory
docker build -t my-image:v1.0 .
# Build with custom Dockerfile name
docker build -t my-image -f Dockerfile.custom .
Running containers
~~~~~~~~~~~~~~~~~~
Docker containers can run in different modes depending on your use case. Understanding these modes helps you choose the right approach for your task.
**Attached mode (default)**
When you run a container without special flags, it runs in the foreground, showing output directly in your terminal:
.. code-block:: bash
# Run and see output in real-time
docker run my-image
Use attached mode for:
- Batch jobs and data processing tasks where you want to monitor progress
- Testing containers to see immediate output
- Short-running tasks that complete and exit
Press ``Ctrl+C`` to stop the container and return to your terminal.
**Detached mode**
The ``-d`` flag runs containers in the background, freeing your terminal:
.. code-block:: bash
# Start container in background
docker run -d --name my-container my-image
# View logs from detached container
docker logs my-container
# Follow logs in real-time
docker logs -f my-container
# Attach to a running detached container
docker attach my-container
Use detached mode for:
- Web servers and APIs (like Streamlit apps)
- Long-running services that don't need constant monitoring
- Background processing tasks
**Interactive mode**
The ``-i`` flag keeps STDIN open and ``-t`` allocates a terminal; combined as ``-it``, they let you interact with a container through a shell:
.. code-block:: bash
# Start container with interactive shell
docker run -it my-image /bin/bash
# Execute commands in already-running container
docker exec -it my-container bash
Use interactive mode for:
- Development and debugging inside the container
- Exploring the container's filesystem
- Running commands manually for testing
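Type ``exit`` to leave the shell. If the shell was started with ``docker run -it``, the container stops with it; a shell opened with ``docker exec`` exits without stopping the container.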
**Port mapping and volume mounts**
Add these options to any mode:
.. code-block:: bash
# Run with port mapping
docker run -p 8080:80 my-image
**Volume mount examples:**
.. tab-set::
.. tab-item:: Linux/macOS
.. code-block:: bash
# Mount current directory (quoted so paths containing spaces work)
docker run -v "$(pwd)":/app my-image
.. tab-item:: Windows PowerShell
.. code-block:: powershell
# Mount current directory
docker run -v ${PWD}:/app my-image
.. tab-item:: Windows CMD
.. code-block:: batch
REM Mount current directory
docker run -v %cd%:/app my-image
**Note for Windows users**: When using absolute paths, use forward slashes and include the drive letter: ``-v C:/Users/yourname/project:/app``
Managing containers
~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# Stop a container
docker stop container-name
# Remove a container
docker rm container-name
# View container logs
docker logs container-name
# Execute command in running container
docker exec -it container-name bash
**Tip**: Customize the default ``docker ps`` output by adding a ``psFormat`` entry to ``~/.docker/config.json``:
.. code-block:: json
{
"psFormat": "table {{.Names}}\\t{{truncate .Image 25}}\\t{{.Status}}"
}
Managing images
~~~~~~~~~~~~~~~
.. code-block:: bash
# List images
docker images
# Remove an image
docker rmi image-name
# Tag an image
docker tag my-image:latest username/my-image:v1.0
# Push to registry
docker push username/my-image:v1.0
# Pull from registry
docker pull username/my-image:v1.0
Cleanup
~~~~~~~
.. code-block:: bash
# Remove all stopped containers
docker container prune
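# Remove dangling (untagged) images; add -a to remove all unused images
docker image prune
# Remove stopped containers, unused networks, all unused images, and build cache (use with caution!)
docker system prune -a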
Why Docker for ML?
------------------
Reproducibility
~~~~~~~~~~~~~~~
Every team member gets the exact same environment: the same Python version, package versions, and system dependencies.
**Problem solved**: "But it works on my machine!"
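One way to check that two machines really use the same image is to compare image digests. The sketch below assumes the image was pulled from a registry (locally built images that were never pushed may have no digest recorded); the ``run`` helper is an illustrative wrapper that only prints each command unless ``RUN_DOCKER=1`` is set.

```shell
# Print commands by default; set RUN_DOCKER=1 to actually execute them.
run() {
    if [ "${RUN_DOCKER:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

IMAGE="python:3.11-slim"

# Fetch the image from the registry
run docker pull "$IMAGE"

# Print the content-addressed digest recorded for the pulled image
run docker image inspect --format '{{index .RepoDigests 0}}' "$IMAGE"
```

If two team members see the same digest, they are running byte-identical image contents, regardless of what the tag pointed to when each of them pulled it.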
Portability
~~~~~~~~~~~
Develop locally on your laptop, train on a GPU server, and deploy to the cloud, all using the same container image.
**Benefit**: Seamless transitions between environments
Isolation
~~~~~~~~~
Work on multiple projects with conflicting dependencies without interference.
**Example**: Project A uses TensorFlow 2.x while Project B uses TensorFlow 1.x
Modularity
~~~~~~~~~~
Break complex ML pipelines into independent containerized components that can be scaled, updated, and debugged separately.
**Pattern**: Data ingestion → Cleaning → Feature engineering → Training → Serving
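The pattern above could be wired together roughly as follows: one container per stage, all sharing a named volume. Every name here (the ``ml-pipeline/...`` images, the ``pipeline-data`` volume, the ports) is hypothetical, and the ``run`` helper only prints each command unless ``RUN_DOCKER=1`` is set.

```shell
# Print commands by default; set RUN_DOCKER=1 to actually execute them.
run() {
    if [ "${RUN_DOCKER:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

VOLUME_NAME="pipeline-data"     # hypothetical shared volume
IMAGE_PREFIX="ml-pipeline"      # hypothetical image namespace
STAGES="ingest clean features train"

# Batch stages run one after another; each reads and writes /data
for stage in $STAGES; do
    run docker run --rm -v "${VOLUME_NAME}:/data" "${IMAGE_PREFIX}/${stage}:v1.0.0"
done

# The serving stage is long-running, so it starts detached with a published port
run docker run -d --name serve -p 8080:80 \
    -v "${VOLUME_NAME}:/data" "${IMAGE_PREFIX}/serve:v1.0.0"
```

Because each stage is its own image, a change to the cleaning logic means rebuilding and retagging only that one image, while the other stages keep running the versions they were tested with.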
Next steps
----------
Now that you understand the concepts, proceed to :doc:`dockerfile-guide` to learn how to write effective Dockerfiles, or jump straight to :doc:`lab-01-data-cleaner` to start building!