Dockerfile guide ================ A Dockerfile contains instructions executed in order to build an image. Understanding these instructions is key to creating effective containerized applications. Essential instructions ---------------------- FROM ~~~~ Specifies the base image to build upon. .. code-block:: dockerfile FROM python:3.11-slim **Best practice**: Use specific tags (``3.11-slim``) instead of ``latest`` for reproducibility. WORKDIR ~~~~~~~ Sets the working directory inside the container. All subsequent commands run from this directory. .. code-block:: dockerfile WORKDIR /app **Tip**: Creates the directory if it doesn't exist. COPY ~~~~ Copies files from your host machine to the container. .. code-block:: dockerfile COPY requirements.txt /app/ COPY . /app **Pattern**: Copy dependencies first, then code (for better caching). RUN ~~~ Executes commands during the image build process. Commonly used to install dependencies. .. code-block:: dockerfile RUN pip install --no-cache-dir -r requirements.txt **Best practice**: Combine related commands with ``&&`` to reduce layers. CMD ~~~ Specifies the default command to run when a container starts. .. code-block:: dockerfile CMD ["python", "app.py"] **Note**: Only one ``CMD`` per Dockerfile. Easily overridden at runtime. **Important**: Containers run as long as the command is running. When the command exits, the container stops. If you don't specify a ``CMD`` (or run with ``-it`` for interactive mode), the container will start and immediately exit. For long-running services like web apps, the ``CMD`` should start a server that keeps running. For batch jobs, the container exits when processing completes. EXPOSE ~~~~~~ Documents which ports the container listens on. .. code-block:: dockerfile EXPOSE 8080 **Important**: This doesn't actually publish the port - use ``-p`` when running. ENV ~~~ Sets environment variables. .. code-block:: dockerfile ENV PYTHONUNBUFFERED=1 ENV MODEL_PATH=/models USER ~~~~ Sets the user for running subsequent commands. .. code-block:: dockerfile RUN useradd -m appuser USER appuser **Security**: Never run as root in production! Complete example ---------------- Here's a well-structured Dockerfile for an ML application: .. code-block:: dockerfile # Use specific Python version FROM python:3.11-slim # Set working directory WORKDIR /app # Install system dependencies RUN apt-get update && apt-get install -y \ git \ curl \ && rm -rf /var/lib/apt/lists/* # Copy and install Python dependencies first (for caching) COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt # Copy application code COPY . . # Create non-root user RUN useradd -m -s /bin/bash mluser && \ chown -R mluser:mluser /app USER mluser # Set environment variables ENV PYTHONUNBUFFERED=1 # Expose port EXPOSE 8000 # Run application CMD ["python", "app.py"] Best practices -------------- Layer caching ~~~~~~~~~~~~~ Docker caches each layer. Order instructions from least to most frequently changing: .. code-block:: dockerfile # Good - dependencies change less often than code FROM python:3.11-slim COPY requirements.txt . RUN pip install -r requirements.txt COPY . . # Bad - code changes invalidate dependency cache FROM python:3.11-slim COPY . . RUN pip install -r requirements.txt Minimize layers ~~~~~~~~~~~~~~~ Combine related ``RUN`` commands: .. code-block:: dockerfile # Good - one layer RUN apt-get update && \ apt-get install -y git curl && \ rm -rf /var/lib/apt/lists/* # Bad - three layers RUN apt-get update RUN apt-get install -y git curl RUN rm -rf /var/lib/apt/lists/* Use .dockerignore ~~~~~~~~~~~~~~~~~ Exclude unnecessary files from the build context: .. code-block:: text # .dockerignore __pycache__/ *.pyc .git/ .venv/ *.md .DS_Store Keep images small ~~~~~~~~~~~~~~~~~ .. code-block:: dockerfile # Use slim/alpine variants FROM python:3.11-slim # ~120 MB # vs FROM python:3.11 # ~900 MB # Clean up in same layer RUN apt-get update && \ apt-get install -y curl && \ rm -rf /var/lib/apt/lists/* # Use --no-cache-dir with pip RUN pip install --no-cache-dir pandas Specific tags ~~~~~~~~~~~~~ .. code-block:: dockerfile # Good - reproducible FROM python:3.11.5-slim # Bad - breaks when "latest" updates FROM python:latest Security ~~~~~~~~ .. code-block:: dockerfile # Create and use non-root user RUN groupadd -r appgroup && \ useradd -r -g appgroup appuser USER appuser # Don't expose unnecessary ports # Only EXPOSE what's needed Common patterns --------------- ML training container ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: dockerfile FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime WORKDIR /workspace COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY train.py . COPY data/ data/ CMD ["python", "train.py"] Model serving container ~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: dockerfile FROM python:3.11-slim WORKDIR /app COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY model.pkl . COPY serve.py . EXPOSE 8000 CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"] Jupyter notebook container ~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: dockerfile FROM jupyter/scipy-notebook:latest COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt EXPOSE 8888 CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root"] Debugging tips -------------- Build with verbose output ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash docker build --progress=plain --no-cache -t my-image . Inspect intermediate layers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash # Get layer IDs docker history my-image # Run shell in specific layer docker run -it /bin/bash Check build context size ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash # See what's being sent to Docker daemon docker build --no-cache -t test . 2>&1 | grep "Sending build context" Next steps ---------- Now that you understand Dockerfiles, try the hands-on labs: 1. :doc:`lab-01-data-cleaner` - Basic Dockerfile 2. :doc:`lab-02-streamlit-app` - Web app with port mapping 3. :doc:`lab-03-ml-dev-container` - Dev container configuration