Docker concepts

Core terminology

Image

A read-only template containing your application code, dependencies, and runtime. Think of it as a snapshot of a filesystem plus metadata describing how to run your application.

Example: python:3.11-slim is an official Python image

Container

A running instance of an image. Containers are isolated processes that run your application. You can create many containers from a single image.

What a container actually is:

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000000','primaryBorderColor':'#000000','lineColor':'#000000','secondaryColor':'#fff3e0','tertiaryColor':'#f1f8e9','fontSize':'32px'}}}%%
graph TB
    subgraph "Host Machine"
        K[Host Kernel]
        DE[Docker Engine]

        subgraph "Container 1"
            P1[Process]
            FS1[Filesystem<br/>from Image]
            N1[Network<br/>Interface]
            R1[Resources<br/>CPU/Memory]
        end

        subgraph "Container 2"
            P2[Process]
            FS2[Filesystem<br/>from Image]
            N2[Network<br/>Interface]
            R2[Resources<br/>CPU/Memory]
        end

        K -.shares kernel.-> P1
        K -.shares kernel.-> P2
        DE -->|manages| P1
        DE -->|manages| P2

        P1 -.isolated via<br/>namespaces.-> P2
    end

A container is NOT a virtual machine. It’s an isolated process on your host OS that:

Shares the host kernel (lightweight - no guest OS to boot or maintain)
Has its own filesystem (from the image)
Has isolated networking (its own IP address and ports)
Can be given resource limits (CPU and memory, enforced via cgroups)
Cannot see other containers’ processes (namespace isolation)

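Key difference from VMs: Containers don’t include their own kernel. They package only the userland files the application needs, share the host’s kernel, and are isolated through Linux namespaces and cgroups. On macOS and Windows, Docker Desktop runs containers inside a lightweight Linux VM, so the shared kernel is that VM’s.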
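To make the "isolated process" idea concrete, the sketch below prints the namespace identifiers of the current shell. It assumes a Linux system with /proc mounted and falls back to a notice elsewhere. Running it on the host and again inside a container would typically print different identifiers, which is what namespace isolation looks like from the kernel's side.

```shell
#!/bin/sh
# Sketch: show which kernel namespaces the current process belongs to.
# Assumes Linux with /proc mounted; on other systems only a notice is printed.
if [ -e /proc/self/ns/pid ]; then
    # Each entry is a symlink such as "pid:[4026531836]"; processes that
    # share a namespace see the same identifier.
    pid_ns=$(readlink /proc/self/ns/pid 2>/dev/null || echo "unreadable")
    net_ns=$(readlink /proc/self/ns/net 2>/dev/null || echo "unreadable")
    echo "PID namespace:     $pid_ns"
    echo "Network namespace: $net_ns"
else
    pid_ns="unavailable"
    net_ns="unavailable"
    echo "No /proc/self/ns here; namespaces are a Linux kernel feature."
fi
```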

Host

The physical or virtual machine where Docker is installed and running. The host provides the kernel that containers share and the resources (CPU, memory, storage) that containers use.

Dockerfile

A text file with instructions for building a Docker image. It specifies the base image, dependencies to install, files to copy, and commands to run.

Example:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
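The Dockerfile above expects requirements.txt and app.py to sit next to it. As a rough sketch, the script below assembles a throwaway build context with placeholder versions of both files; the file contents and the temporary directory are illustrative assumptions, not part of any real project.

```shell
#!/bin/sh
# Sketch: assemble a throwaway build context for the Dockerfile example.
# The contents of requirements.txt and app.py are placeholders.
ctx=$(mktemp -d)

# Placeholder dependency list; replace with your real requirements.
printf 'requests\n' > "$ctx/requirements.txt"

# Trivial entry point matching CMD ["python", "app.py"].
printf 'print("hello from the container")\n' > "$ctx/app.py"

# Same instructions as the Dockerfile example.
cat > "$ctx/Dockerfile" <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
EOF

echo "Build context created in $ctx"
# With Docker installed, you could then run:
#   docker build -t my-image:v1.0 "$ctx"
```

Copying requirements.txt before the rest of the code means Docker can reuse the cached dependency layer when only application files change.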

Registry

A service for storing and sharing Docker images. Docker Hub is the default public registry, but alternatives include GitHub Container Registry, Amazon ECR, and private registries.

Popular registries: Docker Hub, GitHub Container Registry (ghcr.io), Amazon ECR, Google Artifact Registry

Tag

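A label that identifies a specific version of an image (e.g., v1.0, latest). Tags help manage versions and deployments. If you omit the tag, Docker assumes latest, which is only a default name and does not necessarily point to the newest build.

Best practice: Use explicit version tags such as semantic versions (v1.0.0) for production images rather than relying on latest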
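As an illustration of how a reference such as python:3.11-slim breaks down, here is a minimal sketch that splits a name into repository and tag. split_image_ref is a hypothetical helper, and it deliberately ignores references with a registry port (for example localhost:5000/app) or a digest.

```shell
#!/bin/sh
# Sketch: split an image reference into repository and tag.
# Simplifying assumption: no registry port and no @sha256 digest in the name.
split_image_ref() {
    ref=$1
    case $ref in
        *:*) repo=${ref%:*}; tag=${ref##*:} ;;
        *)   repo=$ref; tag=latest ;;   # Docker assumes "latest" when no tag is given
    esac
    echo "$repo $tag"
}

split_image_ref "python:3.11-slim"   # prints "python 3.11-slim"
split_image_ref "my-image"           # prints "my-image latest"
```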

Volume

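A mechanism for persisting data beyond the life of a container. Docker-managed volumes store data outside the container’s writable layer, while bind mounts (the -v HOST_PATH:CONTAINER_PATH form used in this guide) share files between your host machine and containers.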

Use cases: Development files, databases, logs

Port mapping

Publishing container ports on your host machine so you can reach web apps or APIs running inside containers.

Syntax: -p HOST_PORT:CONTAINER_PORT
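The order of the two ports is easy to mix up, so here is a small sketch that builds the flag from a host port and a container port after a basic range check. is_port and port_flag are hypothetical helpers written for this example; they are not part of Docker.

```shell
#!/bin/sh
# Sketch: build a "-p HOST_PORT:CONTAINER_PORT" argument after basic validation.
# is_port and port_flag are hypothetical helpers, not Docker commands.
is_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric input
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

port_flag() {
    # $1 = port on the host, $2 = port the application listens on in the container
    if is_port "$1" && is_port "$2"; then
        echo "-p $1:$2"
    else
        echo "invalid port mapping: $1:$2" >&2
        return 1
    fi
}

port_flag 8080 80   # prints "-p 8080:80": host port 8080 forwards to container port 80
```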

Common commands

Building images

# Build an image from Dockerfile in current directory
docker build -t my-image:v1.0 .

# Build with custom Dockerfile name
docker build -t my-image -f Dockerfile.custom .

Running containers

Containers can run attached, detached, or interactively. Which mode fits depends on whether you need to watch output, free your terminal, or type commands inside the container.

Attached mode (default)

When you run a container without special flags, it runs in the foreground, showing output directly in your terminal:

# Run and see output in real-time
docker run my-image

Use attached mode for:

Batch jobs and data processing tasks where you want to monitor progress
Testing containers to see immediate output
Short-running tasks that complete and exit

Press Ctrl+C to stop the container and return to your terminal.

Detached mode

The -d flag runs containers in the background, freeing your terminal:

# Start container in background
docker run -d --name my-container my-image

# View logs from detached container
docker logs my-container

# Follow logs in real-time
docker logs -f my-container

# Attach to a running detached container (Ctrl+C is forwarded and usually stops it)
docker attach my-container

Use detached mode for:

Web servers and APIs (like Streamlit apps)
Long-running services that don’t need constant monitoring
Background processing tasks

Interactive mode

The -it flags (-i keeps STDIN open, -t allocates a terminal) let you interact with a container through a shell:

# Start container with interactive shell
docker run -it my-image /bin/bash

# Execute commands in already-running container
docker exec -it my-container bash

Use interactive mode for:

Development and debugging inside the container
Exploring the container’s filesystem
Running commands manually for testing

Type exit to leave the shell. With docker run -it, exiting the shell also stops the container; with docker exec, the container keeps running.

Port mapping and volume mounts

Add these options to any mode:

# Run with port mapping
docker run -p 8080:80 my-image

Volume mount examples:

Linux/macOS

# Mount current directory (quoted so paths with spaces work)
docker run -v "$(pwd):/app" my-image

Windows PowerShell

# Mount current directory (quoted so paths with spaces work)
docker run -v "${PWD}:/app" my-image

Windows CMD

# Mount current directory (quoted so paths with spaces work)
docker run -v "%cd%:/app" my-image

Note for Windows users: When using absolute paths, use forward slashes and include the drive letter: -v C:/Users/yourname/project:/app
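The mount examples above all pass an absolute host path. That matters because a source without a leading path (for example -v data:/app) is treated as a named volume rather than a host directory. A minimal sketch for Linux/macOS shells, using a hypothetical mount_flag helper that resolves a directory to an absolute path first:

```shell
#!/bin/sh
# Sketch: turn a (possibly relative) host directory into a bind-mount argument.
# mount_flag is a hypothetical helper, not part of Docker.
mount_flag() {
    # Resolve $1 to an absolute path; the cd runs in a subshell,
    # so the caller's working directory is unchanged.
    host_dir=$(cd "$1" && pwd) || return 1
    echo "-v $host_dir:$2"
}

mount_flag . /app   # prints "-v <absolute path of current directory>:/app"
```

When passing the result to docker run, keep the path quoted if it may contain spaces.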

Managing containers

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a container
docker stop container-name

# Remove a container
docker rm container-name

# View container logs
docker logs container-name

# Execute command in running container
docker exec -it container-name bash

Tip: Customize the default docker ps output by adding a psFormat entry to ~/.docker/config.json:

{
  "psFormat": "table {{.Names}}\\t{{truncate .Image 25}}\\t{{.Status}}"
}

Managing images

# List images
docker images

# Remove an image
docker rmi image-name

# Tag an image
docker tag my-image:latest username/my-image:v1.0

# Push to registry
docker push username/my-image:v1.0

# Pull from registry
docker pull username/my-image:v1.0

Cleanup

# Remove all stopped containers
docker container prune

# Remove dangling (untagged) images; add -a to remove all unused images
docker image prune

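# Remove stopped containers, unused networks, all unused images, and build cache
# (use with caution! volumes are kept unless you add --volumes)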
docker system prune -a

Why Docker for ML?

Reproducibility

Everyone who runs the same image gets the same environment - same Python version, same package versions, same system dependencies.

Problem solved: “But it works on my machine!”

Portability

Develop locally on your laptop, train on a GPU server, deploy to the cloud - all using the same container image.

Benefit: Seamless transitions between environments

Isolation

Work on multiple projects with conflicting dependencies without interference.

Example: Project A uses TensorFlow 2.x while Project B uses TensorFlow 1.x

Modularity

Break complex ML pipelines into independent containerized components that can be scaled, updated, and debugged separately.

Pattern: Data ingestion → Cleaning → Feature engineering → Training → Serving

Next steps

Now that you understand the concepts, proceed to the Dockerfile guide to learn how to write effective Dockerfiles, or jump straight to Lab 1: Data cleaner container to start building!