Docker concepts
Core terminology
Image
A blueprint or template containing your application code, dependencies, and runtime. Think of it as a snapshot of a filesystem with instructions on how to run your application.
Example: python:3.11-slim is an official Python image
Container
A runnable instance of an image. Containers are isolated processes that run your application, and they can be running or stopped. You can create many containers from a single image.
What a container actually is:
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#e3f2fd','primaryTextColor':'#000000','primaryBorderColor':'#000000','lineColor':'#000000','secondaryColor':'#fff3e0','tertiaryColor':'#f1f8e9','fontSize':'32px'}}}%%
graph TB
subgraph "Host Machine"
K[Host Kernel]
DE[Docker Engine]
subgraph "Container 1"
P1[Process]
FS1[Filesystem<br/>from Image]
N1[Network<br/>Interface]
R1[Resources<br/>CPU/Memory]
end
subgraph "Container 2"
P2[Process]
FS2[Filesystem<br/>from Image]
N2[Network<br/>Interface]
R2[Resources<br/>CPU/Memory]
end
K -.shares kernel.-> P1
K -.shares kernel.-> P2
DE -->|manages| P1
DE -->|manages| P2
P1 -.isolated via<br/>namespaces.-> P2
end
A container is NOT a virtual machine. It’s an isolated process on your host OS that:
Shares the host kernel (lightweight: no guest OS to boot or maintain)
Has its own filesystem (from the image)
Has isolated networking (own IP address, ports)
Has resource limits (controlled CPU, memory via cgroups)
Cannot see other containers’ processes (namespace isolation)
Key difference from VMs: Containers don’t include a full operating system - they share the host’s kernel but are isolated through Linux namespaces and cgroups.
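To make the namespace idea concrete, the following Python sketch lists the namespaces the current process belongs to by reading /proc/self/ns. It assumes a Linux host (or a shell inside a Linux container); the helper name current_namespaces is made up for illustration, and on other systems it simply returns an empty dict.

```python
import os

NS_DIR = "/proc/self/ns"  # Linux-only; absent on macOS and Windows


def current_namespaces() -> dict[str, str]:
    """Map namespace type (e.g. 'pid', 'net') to its kernel identifier."""
    namespaces = {}
    try:
        entries = os.listdir(NS_DIR)
    except OSError:
        return namespaces  # not Linux, or /proc is unavailable
    for entry in sorted(entries):
        try:
            # Each entry is a symlink whose target looks like 'pid:[4026531836]'.
            namespaces[entry] = os.readlink(os.path.join(NS_DIR, entry))
        except OSError:
            continue  # skip entries we are not allowed to read
    return namespaces


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name:10} {ident}")
```

Running this once on the host and once inside a container should typically print different identifiers for namespaces such as pid, net, and mnt, which is the isolation described above.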
Host
The physical or virtual machine where Docker is installed and running. The host provides the kernel that containers share and the resources (CPU, memory, storage) that containers use.
Dockerfile
A text file with instructions for building a Docker image. It specifies the base image, dependencies to install, files to copy, and commands to run.
Example:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
Registry
A service for storing and sharing Docker images. Docker Hub is the default public registry, but alternatives include GitHub Container Registry, AWS ECR, and private registries.
Popular registries: Docker Hub, GitHub Container Registry (ghcr.io), AWS ECR, Google Artifact Registry
Tag
A label for different versions of an image (e.g., v1.0, latest). Tags help manage versions and deployments.
Best practice: Use semantic versioning (v1.0.0) for production images
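As a rough illustration of how a name and tag fit together, the Python sketch below splits an image reference into its name and tag and checks whether the tag looks like a semantic version. The function names and the regular expression are assumptions made for this example, not part of Docker; digest references (name@sha256:...) are not handled.

```python
import re

# Assumed pattern: optional "v" prefix followed by MAJOR.MINOR.PATCH.
SEMVER_TAG = re.compile(r"v?\d+\.\d+\.\d+")


def split_image_reference(reference: str) -> tuple[str, str]:
    """Split 'name[:tag]' into (name, tag); the tag falls back to 'latest'."""
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"


def is_semver_tag(tag: str) -> bool:
    """Return True for tags such as 'v1.0.0' or '2.3.1'."""
    return SEMVER_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for ref in ("python:3.11-slim", "my-image", "username/my-image:v1.0.0"):
        name, tag = split_image_reference(ref)
        print(f"{ref}: name={name}, tag={tag}, semver={is_semver_tag(tag)}")
```

The fallback to latest mirrors what Docker does when a reference has no tag, which is one reason explicit version tags are safer for production images.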
Volume
A mechanism for persisting data generated by containers. The related bind mount, created with the same -v flag, shares files between your host machine and containers.
Use cases: Development files, databases, logs
Port mapping
Exposing container ports to your host machine, allowing you to access web apps or APIs running inside containers.
Syntax: -p HOST_PORT:CONTAINER_PORT
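As a small illustration of the HOST_PORT:CONTAINER_PORT ordering, this Python sketch splits a mapping string into its two ports. The function name parse_port_mapping is hypothetical, and the sketch covers only the two-part form shown here; Docker accepts further forms (for example an IP prefix or a /udp suffix) that it does not handle.

```python
def parse_port_mapping(mapping: str) -> tuple[int, int]:
    """Parse 'HOST_PORT:CONTAINER_PORT' into (host_port, container_port)."""
    host, sep, container = mapping.partition(":")
    if not sep:
        raise ValueError(f"expected HOST_PORT:CONTAINER_PORT, got {mapping!r}")
    # int() raises ValueError for anything that is not a plain number.
    host_port, container_port = int(host), int(container)
    for port in (host_port, container_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    return host_port, container_port


if __name__ == "__main__":
    host_port, container_port = parse_port_mapping("8080:80")
    print(f"http://localhost:{host_port} -> container port {container_port}")
```

The host port always comes first: with 8080:80, a request to port 8080 on your machine reaches port 80 inside the container.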
Common commands
Building images
# Build an image from Dockerfile in current directory
docker build -t my-image:v1.0 .
# Build with custom Dockerfile name
docker build -t my-image -f Dockerfile.custom .
Running containers
Docker containers can run in attached, detached, or interactive mode. Which one you choose depends on whether you need to watch the output, free your terminal, or type commands inside the container.
Attached mode (default)
When you run a container without special flags, it runs in the foreground, showing output directly in your terminal:
# Run and see output in real-time
docker run my-image
Use attached mode for:
Batch jobs and data processing tasks where you want to monitor progress
Testing containers to see immediate output
Short-running tasks that complete and exit
Press Ctrl+C to stop the container and return to your terminal.
Detached mode
The -d flag runs containers in the background, freeing your terminal:
# Start container in background
docker run -d --name my-container my-image
# View logs from detached container
docker logs my-container
# Follow logs in real-time
docker logs -f my-container
# Attach to a running detached container
docker attach my-container
Use detached mode for:
Web servers and APIs (like Streamlit apps)
Long-running services that don’t need constant monitoring
Background processing tasks
Interactive mode
The -it flags (-i keeps STDIN open, -t allocates a pseudo-terminal) let you work in a shell inside a container:
# Start container with interactive shell
docker run -it my-image /bin/bash
# Execute commands in already-running container
docker exec -it my-container bash
Use interactive mode for:
Development and debugging inside the container
Exploring the container’s filesystem
Running commands manually for testing
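Type exit to leave the shell. If the shell was started with docker run -it, exiting also stops the container; a shell started with docker exec leaves the container running.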
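If you launch containers from a Python script, the three modes differ only in the flags passed to docker run. The sketch below builds the argument list for each mode without executing it; docker_run_args is a hypothetical helper, and the image and container names are the placeholders used in this section.

```python
import shlex


def docker_run_args(image, *, mode="attached", name=None, command=None):
    """Build a 'docker run' argument list for the given mode."""
    args = ["docker", "run"]
    if mode == "detached":
        args.append("-d")    # run in the background
    elif mode == "interactive":
        args.append("-it")   # keep STDIN open and allocate a terminal
    elif mode != "attached":
        raise ValueError(f"unknown mode: {mode!r}")
    if name:
        args += ["--name", name]
    args.append(image)
    if command:
        args += list(command)  # overrides the image's default command
    return args


if __name__ == "__main__":
    print(shlex.join(docker_run_args("my-image")))
    print(shlex.join(docker_run_args("my-image", mode="detached", name="my-container")))
    print(shlex.join(docker_run_args("my-image", mode="interactive", command=["/bin/bash"])))
```

The returned list can be passed to subprocess.run on a machine where Docker is installed; keeping the arguments as a list avoids shell quoting problems.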
Port mapping and volume mounts
Add these options to any mode:
# Map host port 8080 to container port 80
docker run -p 8080:80 my-image
Volume mount examples:
# Mount current directory (Linux/macOS shell)
docker run -v $(pwd):/app my-image
# Mount current directory (Windows PowerShell)
docker run -v ${PWD}:/app my-image
# Mount current directory (Windows Command Prompt)
docker run -v %cd%:/app my-image
Note for Windows users: When using absolute paths, use forward slashes and include the drive letter: -v C:/Users/yourname/project:/app
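When the mount path comes from a script rather than being typed by hand, the backslash-to-forward-slash conversion can be done with Python's pathlib. This is a minimal sketch; windows_volume_arg is a hypothetical helper name, and the path is the placeholder from the note above.

```python
from pathlib import PureWindowsPath


def windows_volume_arg(host_path: str, container_path: str) -> str:
    """Build the value for -v from a Windows host path and a container path."""
    # as_posix() keeps the drive letter and swaps backslashes for forward slashes.
    host = PureWindowsPath(host_path).as_posix()
    return f"{host}:{container_path}"


if __name__ == "__main__":
    arg = windows_volume_arg(r"C:\Users\yourname\project", "/app")
    print(f"docker run -v {arg} my-image")
```

PureWindowsPath only manipulates the string, so the sketch behaves the same on any operating system.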
Managing containers
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# Stop a container
docker stop container-name
# Remove a container
docker rm container-name
# View container logs
docker logs container-name
# Execute command in running container
docker exec -it container-name bash
Tip: Customize the default docker ps output by adding a psFormat entry to ~/.docker/config.json:
{
"psFormat": "table {{.Names}}\\t{{truncate .Image 25}}\\t{{.Status}}"
}
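Editing config.json by hand risks dropping other settings already stored there, such as registry credentials. One possible approach, sketched here in Python, is to load the file, set only the psFormat key, and write it back. The helper name set_ps_format is an assumption for this example.

```python
import json
from pathlib import Path

# Same format string as above; "\\t" is a literal backslash followed by "t".
PS_FORMAT = "table {{.Names}}\\t{{truncate .Image 25}}\\t{{.Status}}"


def set_ps_format(config_path: Path, ps_format: str = PS_FORMAT) -> dict:
    """Set psFormat in a Docker CLI config file, keeping all other keys."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config["psFormat"] = ps_format
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling set_ps_format(Path.home() / ".docker" / "config.json") would apply the change to your real configuration, so consider backing the file up first.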
Managing images
# List images
docker images
# Remove an image
docker rmi image-name
# Tag an image
docker tag my-image:latest username/my-image:v1.0
# Push to registry
docker push username/my-image:v1.0
# Pull from registry
docker pull username/my-image:v1.0
Cleanup
# Remove all stopped containers
docker container prune
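# Remove dangling (untagged) images; add -a to remove all unused images
docker image prune
# Remove all stopped containers, unused networks, unused images, and build cache (use with caution!)
docker system prune -a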
Why Docker for ML?
Reproducibility
Every team member who runs the same image gets the same environment: same Python version, same package versions, same system dependencies.
Problem solved: “But it works on my machine!”
Portability
Develop locally on your laptop, train on a GPU server, deploy to the cloud - all using the same container image.
Benefit: Seamless transitions between environments
Isolation
Work on multiple projects with conflicting dependencies without interference.
Example: Project A uses TensorFlow 2.x while Project B uses TensorFlow 1.x
Modularity
Break complex ML pipelines into independent containerized components that can be scaled, updated, and debugged separately.
Pattern: Data ingestion → Cleaning → Feature engineering → Training → Serving
Next steps
Now that you understand the concepts, proceed to the Dockerfile guide to learn how to write effective Dockerfiles, or jump straight to Lab 1: Data cleaner container to start building!