Docker Like a Pro: From Multi-Stage Images to Runtime Optimizations
In this blog, we’ll explore how to use multi-stage Docker builds to create efficient, secure, and optimized containers for your Python applications. By separating the build and runtime environments, you’ll reduce the size of your final image and improve security, while also simplifying dependency management. Let’s dive in.
Why are best practices for Docker important?
Using Docker to containerize your application provides consistency across development, testing, and production environments. But without following best practices, you could end up with images that are bloated, insecure, and slow to build or deploy. That’s why it’s important to follow a few key guidelines, especially for production deployments:
- Minimize image size: Smaller images lead to faster downloads and quicker startup times.
- Improve security: Only include what’s absolutely necessary and run your application with limited permissions.
- Optimize for performance: Ensure that your application runs efficiently by reducing unnecessary overhead.
The Power of Multi-Stage Builds
In a multi-stage Dockerfile, you define separate stages for building and running your application. This lets you include all the necessary build tools in one stage and then copy only the essential artifacts into the final image, leaving the build tools behind. How does this benefit you? (A minimal skeleton of the pattern follows the list below.)
- Smaller Image Size: Only the runtime dependencies are included in the final image, which dramatically reduces its size.
- Better Security: Since you discard the build tools and development dependencies, there’s less surface area for potential vulnerabilities.
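In skeleton form, the pattern looks like this (a rough sketch only; the full Django example comes later in this post):
# Stage 1: build with the full toolchain
FROM python:3.12-alpine AS builder
RUN apk add --no-cache build-base   # compilers live only in this stage
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: runtime image without any build tools
FROM python:3.12-alpine
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages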
The Ideal Base Image: Alpine Linux
When it comes to choosing a base image, Alpine Linux is one of the best options for production environments. It is a minimalist Linux distribution designed specifically for security and resource efficiency. A Docker image based on Alpine Linux is significantly smaller than one based on a full-fledged OS like Ubuntu or Debian.
Why Alpine?
- Alpine images are often less than 10MB, compared to hundreds of megabytes for traditional Linux distributions.
- With fewer components, there’s less chance of unpatched software and security flaws.
- You can add only the specific libraries and tools your application needs, keeping the image lean.
While Alpine Linux is the most lightweight and secure option, other base images might suit specific use cases:
- Slim Docker Image: If you prefer a more complete OS with better debugging capabilities and fewer compatibility issues (Alpine uses musl instead of glibc, which can break some compiled packages), an image tagged slim, such as the Debian-based python:3.12-slim, could be a better fit. It offers a middle ground between minimalism and convenience, at the cost of a somewhat larger image.
Ultimately, for most production environments where resource efficiency and security are paramount, Alpine Linux is the preferred choice.
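If you want to verify the size difference yourself, you can pull the common Python base variants and compare them locally (exact sizes vary by version):
docker pull python:3.12          # full Debian-based image
docker pull python:3.12-slim     # trimmed Debian image
docker pull python:3.12-alpine   # Alpine-based image
docker images python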
Building a Dockerfile
Let’s walk through an example Dockerfile for a Django application, designed to follow the best practices mentioned. We’ll use a multi-stage build and the Alpine Linux base image.
# Stage 1: Build stage
FROM python:3.12-alpine AS builder
# ensures that the Python interpreter doesn’t generate .pyc
ENV PYTHONDONTWRITEBYTECODE=1
# send python output to the terminal without being buffered in real-time
ENV PYTHONUNBUFFERED=1
# Working directory
WORKDIR /usr/src/app
# Install build dependencies
RUN apk add --no-cache --virtual .build-deps \
gcc musl-dev python3-dev \
# for psycopg2-binary package installation
&& apk add --no-cache postgresql-dev \
# Install Pillow dependencies and other system libraries
&& apk add --no-cache jpeg-dev zlib-dev libjpeg \
# Install uWSGI dependencies
&& apk add --no-cache build-base linux-headers pcre-dev \
# Upgrade pip
&& python3 -m pip install --upgrade pip
# Copy and install Python dependencies
COPY ./requirements.txt /usr/src/app/
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: Runtime stage
FROM python:3.12-alpine
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Working directory
WORKDIR /usr/src/app
# Define build arguments for user and group names and IDs
ARG USER_NAME=test
ARG GROUP_NAME=test
ARG USER_UID=1000
ARG USER_GID=${USER_UID}
# Install runtime dependencies (excluding build tools)
RUN apk add --no-cache postgresql-libs libjpeg libpng pcre curl bash dos2unix
# Re-create the user in the runtime stage
RUN addgroup -g ${USER_GID} ${GROUP_NAME} && \
adduser -u ${USER_UID} -G ${GROUP_NAME} -S -h /home/${USER_NAME} ${USER_NAME}
# Copy installed Python packages and binaries from the build stage
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
# Copy the application code
COPY --chown=${USER_NAME}:${GROUP_NAME} . .
# Use the non-root user to run the app
USER ${USER_NAME}:${GROUP_NAME}
# Expose the port
EXPOSE 8000
# Set the entry point to your script as per your need
# ENTRYPOINT ["./entrypoint.sh"]
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
Let’s break down why this Dockerfile follows best practices:
- Multi-Stage Build: The Dockerfile separates the build environment from the runtime environment. This means you only include the packages and libraries your application needs to run, without carrying along any unnecessary build tools. The result is a significantly smaller and more secure image.
- Alpine Linux Base Image: Using python:3.12-alpine ensures that your base image is lightweight and secure. Since Alpine is minimal by design, it reduces the number of potential vulnerabilities, while also keeping your image size down.
- Non-Root User: Running applications as root is a major security risk. This Dockerfile creates a dedicated non-root user (named test by default; the name and IDs are configurable through build arguments, as shown after this list) with limited permissions, following the principle of least privilege. If the application is compromised, the potential damage is limited.
- Environment Variables for Python: Setting PYTHONDONTWRITEBYTECODE=1 prevents Python from writing .pyc files, which can reduce unnecessary disk I/O. PYTHONUNBUFFERED=1 ensures that output is not buffered, providing real-time logs that are essential for debugging and monitoring in production.
- Chaining Multiple Commands into a Single RUN Command Using &&: Every RUN, COPY, or ADD instruction in a Dockerfile creates a new layer in the resulting image. Docker images are composed of multiple layers, and each layer contributes to the overall size of the image. Chaining multiple commands into one RUN block reduces the number of layers, leading to a smaller image. The trade-off is less cache flexibility: if one part of a large combined RUN command changes, Docker has to re-execute the entire RUN command from scratch, even if other parts haven’t changed. (You can inspect the layers with docker history, as shown after this list.)
- Minimal Dependencies: Only essential libraries (like postgresql-libs and libjpeg) are installed in the runtime stage. Unnecessary development tools like gcc and musl-dev are discarded after the build stage, keeping the image clean and small.
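To override the default user settings at build time, pass the build arguments declared in the Dockerfile (the values here are illustrative):
docker build --build-arg USER_NAME=appuser --build-arg USER_UID=1001 -t django-app:multistage .
And to see how each instruction maps to a layer, inspect the image history:
docker history django-app:multistage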
If we compare this approach to a traditional single-stage Dockerfile, the benefits become even clearer. Here’s a quick comparison:
FROM python:3.12-alpine
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Application settings
ENV DJANGO_SETTINGS_MODULE=blogsite.settings \
DJANGO_ALLOWED_HOSTS=localhost,127.0.0.1,[::1] \
DJANGO_DEBUG=0 \
DJANGO_SECRET_KEY=""
# Database settings
ENV POSTGRES_DB="" \
POSTGRES_USER="" \
POSTGRES_PASSWORD="" \
POSTGRES_HOST="" \
POSTGRES_PORT=5432
WORKDIR /usr/src/app
RUN apk update \
&& apk add --virtual build-deps gcc python3-dev musl-dev \
# for Pillow installation
&& apk add jpeg-dev zlib-dev libjpeg \
# for psycopg2-binary package installation
&& apk add postgresql-dev \
# for uWSGI package installation
&& apk add build-base linux-headers pcre-dev
# Upgrade pip
RUN python3 -m pip install --upgrade pip setuptools
# Copy and install Python dependencies
COPY ./requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the codebase
COPY . .
EXPOSE 8000
# Set the entry point to your script
# ENTRYPOINT ["./entrypoint.sh"]
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
Major Difference
After building an image with each approach, you can compare the resulting sizes yourself; the multi-stage image comes out significantly smaller.
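Assuming the two Dockerfiles are saved as Dockerfile and Dockerfile.single (the file names and tags are illustrative), the comparison looks like this:
docker build -t django-app:multistage -f Dockerfile .
docker build -t django-app:singlestage -f Dockerfile.single .
docker images django-app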
What’s wrong with the single-stage Dockerfile?
- The build dependencies remain in the final image, unnecessarily increasing its size.
- Including compilers like gcc in the final image gives attackers more opportunities to exploit vulnerabilities.
- The application runs with root privileges, making it more vulnerable to security breaches.
Don’t forget to include a .dockerignore file to exclude unnecessary files and directories (e.g., .git, __pycache__, *.pyc, and *.log) from the Docker build context. Also, these example Dockerfiles use Django’s development server (python manage.py runserver); replace it with Gunicorn or another production-grade WSGI server for real deployments. An ENTRYPOINT script is also a good place to run database migrations and collect static files before the server starts.
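As a starting point, a minimal .dockerignore might look like this:
# .dockerignore -- keep the build context small
.git
__pycache__/
*.pyc
*.log
And here is a hypothetical entrypoint.sh sketch (the blogsite module name matches the settings module used above; gunicorn is assumed to be listed in requirements.txt):
#!/bin/sh
set -e
# Apply database migrations and collect static files before starting
python manage.py migrate --noinput
python manage.py collectstatic --noinput
# Hand off to Gunicorn as PID 1
exec gunicorn blogsite.wsgi:application --bind 0.0.0.0:8000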
Conclusion
To summarize, the key best practices covered in this blog are:
- Use multi-stage builds to separate the build and runtime environments.
- Choose a lightweight base image, such as Alpine Linux, for smaller and more secure images.
- Run as a non-root user to follow the principle of least privilege.
- Minimize dependencies to keep your images lean and fast.
With these practices, you’ll be well on your way to creating Docker images that are not only production-ready but also robust and efficient for real-world deployment.