Why using Alpine Docker images and Python is probably bad for your project (right now)
By Ryan Pepper
Alpine Linux is a distribution designed to be lightweight. In particular, it’s seen a lot of use in Docker images because the resulting images are considerably smaller than those built on other minimal distros. However, when building a Docker image for a Python application, it’s worth thinking carefully before choosing Alpine: it often results in much slower builds and, counterintuitively, can occasionally even produce larger images.
To understand why, we need to think about what a typical Python project looks like. Of course, there is usually a bundle of Python source files. But your project almost always has additional dependencies too, and this is usually where the problem stems from. A typical Python application might have a requirements.txt specifying one or more of the following very common dependencies:
pandas~=1.5.2
numpy~=1.23.0
Django~=4.1.4
What do these have in common? Not all of them (or their dependencies) are pure Python. We can show this by building a Docker image that installs the dependencies with the following Dockerfile:
FROM python:3.10.8-bullseye
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
If we build it and look carefully, we can see some interesting things in the (abridged) output:
$ docker build -f Dockerfile.ubuntu . -t test:ubuntu
Sending build context to Docker daemon 180.1MB
Step 1/3 : FROM python:3.10.8-bullseye
---> 465483cdaa4e
Step 2/3 : COPY requirements.txt /tmp/requirements.txt
---> bb0c136ebbd2
Step 3/3 : RUN pip install -r /tmp/requirements.txt
---> Running in 0a75aa6ac421
Collecting pandas~=1.5.2
Downloading pandas-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.1/12.1 MB 4.0 MB/s eta 0:00:00
Collecting numpy~=1.23.0
Downloading numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 3.9 MB/s eta 0:00:00
Collecting Django~=4.1.4
Downloading Django-4.1.4-py3-none-any.whl (8.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.1/8.1 MB 3.8 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.1
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 4.1 MB/s eta 0:00:00
Collecting pytz>=2020.1
Downloading pytz-2022.6-py2.py3-none-any.whl (498 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 498.1/498.1 kB 3.9 MB/s eta 0:00:00
Collecting sqlparse>=0.2.2
Downloading sqlparse-0.4.3-py3-none-any.whl (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 kB 2.8 MB/s eta 0:00:00
Collecting asgiref<4,>=3.5.2
Downloading asgiref-3.5.2-py3-none-any.whl (22 kB)
Collecting six>=1.5
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, sqlparse, six, numpy, asgiref, python-dateutil, Django, pandas
Successfully installed Django-4.1.4 asgiref-3.5.2 numpy-1.23.5 pandas-1.5.2 python-dateutil-2.8.2 pytz-2022.6 six-1.16.0 sqlparse-0.4.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It
is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip available: 22.2.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
Removing intermediate container 0a75aa6ac421
---> cb5453eaac47
Successfully built cb5453eaac47
The main thing to note is that when downloading dependencies, pip will generally default to installing a wheel distribution of a package if one is available. Some of the wheels it downloads here have the suffix manylinux_2_17_x86_64.manylinux2014_x86_64.whl, while others end in none-any.whl. These suffixes indicate whether or not the downloaded package is a platform-specific build. A package containing only Python files (i.e. files with a .py extension) can generally be distributed as-is and will work straight out of the box; these are the wheels with the none-any.whl ending.
ending. But many Python libraries also depend on extensions or libraries written in C/C++/etc. which cannot generally run anywhere because of dependencies on system libraries, most commonly the C standard library glibc which is forwards but not backwards compatible. The manylinux
PEP513 tag was started in order to allow providing users with the ability to install wheels which are distribution and architecture specific, rather than forcing package maintainers to include compiled dependencies for all possible permutations of system that a potential user might use.
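To make the naming scheme concrete, here is a minimal sketch that splits a wheel filename into its name, version, Python, ABI and platform tags, and treats a wheel as pure Python when its ABI tag is none and its platform tag is any. The helper parse_wheel_filename is hypothetical and simplified (the packaging library ships a more thorough function of the same name). The dotted platform field is a compressed tag set, so the pandas wheel above carries two tags, manylinux_2_17 and its legacy alias manylinux2014.

```python
from typing import List, NamedTuple


class WheelInfo(NamedTuple):
    name: str
    version: str
    python_tags: List[str]
    abi_tags: List[str]
    platform_tags: List[str]

    @property
    def is_pure_python(self) -> bool:
        # A pure-Python wheel declares no ABI dependency and runs on any platform.
        return self.abi_tags == ["none"] and self.platform_tags == ["any"]


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel filename into its tags (simplified, hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Drop the optional build tag: name-version-build-python-abi-platform
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version, py, abi, plat = parts
    # Each tag field may be a '.'-separated compressed tag set.
    return WheelInfo(name, version, py.split("."), abi.split("."), plat.split("."))


for fn in [
    "pandas-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "Django-4.1.4-py3-none-any.whl",
]:
    info = parse_wheel_filename(fn)
    print(info.name, info.platform_tags, "pure" if info.is_pure_python else "compiled")
```

Run against two of the filenames from the log, this flags pandas as compiled (with both of its platform tags) and Django as pure Python.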
You might now be asking what relevance this has for Alpine Linux. Unlike most Linux distributions, Alpine does not use glibc as its C standard library. Instead, it uses musl, a much more lightweight implementation, and binaries compiled against glibc generally won’t run against musl, so manylinux wheels can’t be installed on Alpine. The musllinux platform tag was added in PEP 656 to mark wheels built for musl-based systems. Unfortunately for Alpine fans, few major packages currently publish musllinux wheels on PyPI, so almost all Python dependencies with C extensions must be compiled from their source distributions. Here, for instance, is a working Alpine Dockerfile for the same dependencies:
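FROM python:3.10.8-alpine3.17
COPY requirements.txt /tmp/requirements.txt
# Install the compilers as one virtual package so they can be removed in a single step;
# libstdc++ is installed separately so it survives, as the compiled C++ extensions need it at runtime
RUN apk add --no-cache libstdc++ && \
    apk add --no-cache --virtual .build-deps musl-dev linux-headers g++ && \
    pip install -r /tmp/requirements.txt && \
    apk del .build-deps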
You can see that I have to install a compiler, then install the dependencies, and then remove the compiler to reduce the image size. In the interest of brevity, I won’t include the full build log, but you can see a representative snippet of the output here:
Building wheels for collected packages: pandas, numpy
Building wheel for pandas (pyproject.toml): started
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
Building wheel for pandas (pyproject.toml): still running...
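Why does pip compile pandas here instead of downloading a wheel? Evidently none of the wheels published for these versions match the Alpine host, so pip falls back to the source distribution. The sketch below approximates that check under simplifying assumptions: a manylinux_X_Y_arch wheel needs glibc X.Y or newer on the same architecture (PEP 600, where manylinux2014 is an alias for manylinux_2_17), and a musllinux_X_Y_arch wheel needs musl X.Y or newer (PEP 656). The names Host, is_compatible and wheel_is_installable are hypothetical; pip's real logic (via the packaging library) instead builds an ordered list of every tag the interpreter supports.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Legacy manylinux names and the minimum glibc version each is an alias for (PEP 600).
LEGACY_MANYLINUX = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}


class Host(NamedTuple):
    libc: str                      # "glibc" or "musl" (assumed known in advance)
    libc_version: Tuple[int, int]
    arch: str                      # e.g. "x86_64"


def parse_platform_tag(tag: str) -> Optional[Tuple[str, Tuple[int, int], str]]:
    """Return (libc, minimum version, arch) for a Linux wheel tag, else None."""
    m = re.fullmatch(r"(manylinux|musllinux)_(\d+)_(\d+)_(\w+)", tag)
    if m:
        libc = "glibc" if m.group(1) == "manylinux" else "musl"
        return libc, (int(m.group(2)), int(m.group(3))), m.group(4)
    m = re.fullmatch(r"(manylinux1|manylinux2010|manylinux2014)_(\w+)", tag)
    if m:
        return "glibc", LEGACY_MANYLINUX[m.group(1)], m.group(2)
    return None  # not a Linux tag this sketch understands (e.g. macOS or Windows)


def is_compatible(platform_tag: str, host: Host) -> bool:
    if platform_tag == "any":
        return True  # pure-Python wheels run everywhere
    parsed = parse_platform_tag(platform_tag)
    if parsed is None:
        return False
    libc, min_version, arch = parsed
    # Same C library family, same CPU architecture, and a host libc at least as new.
    return libc == host.libc and arch == host.arch and host.libc_version >= min_version


def wheel_is_installable(platform_tags: List[str], host: Host) -> bool:
    # A compressed tag set matches if any one of its tags does.
    return any(is_compatible(tag, host) for tag in platform_tags)


# Illustrative hosts: Debian bullseye ships glibc 2.31; the musl version is an assumption.
debian = Host("glibc", (2, 31), "x86_64")
alpine = Host("musl", (1, 2), "x86_64")
pandas_tags = ["manylinux_2_17_x86_64", "manylinux2014_x86_64"]
print(wheel_is_installable(pandas_tags, debian))  # True
print(wheel_is_installable(pandas_tags, alpine))  # False, so pip builds from the sdist
```

A musllinux_1_1_x86_64 wheel, by contrast, would pass this check on the Alpine host, which is why the situation should improve as more projects publish musllinux wheels.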
In terms of build time and final image size, what sort of impact does this have? On my machine, with the base Python images already cached locally (as they would be on most systems where you’re doing this sort of build), I got the following results:
Base image tag | Build time (hh:mm:ss) | Base image size (MB) | Final image size (MB)
---|---|---|---
3.10.8-bullseye | 00:00:26 | 921.1 | 1118.7
3.10.8-alpine3.17 | 00:23:40 | 50.0 | 194.5
So while Alpine clearly wins on image size, its build was a whopping 55x (roughly) slower in my totally unscientific test. In time, I’d expect more and more packages to add musl support to their build pipelines. But whether that will reach critical mass is questionable: it’s taken a long time for most major packages to publish wheels for ARM architectures, and the change required here is pretty similar.
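The headline ratio can be reproduced from the table with a little arithmetic. This sketch uses only the measurements reported above (the helper to_seconds is just for illustration), and subtracting each base image from the final image also shows how much the build steps add on top of each base, which is a similar amount on both distros.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string into a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Measurements from the table: (build time, base image MB, final image MB)
results = {
    "3.10.8-bullseye": ("00:00:26", 921.1, 1118.7),
    "3.10.8-alpine3.17": ("00:23:40", 50.0, 194.5),
}

bullseye_time = to_seconds(results["3.10.8-bullseye"][0])    # 26 s
alpine_time = to_seconds(results["3.10.8-alpine3.17"][0])    # 1420 s
print(f"Alpine build is {alpine_time / bullseye_time:.1f}x slower")

for tag, (_, base_mb, final_mb) in results.items():
    # Size added on top of the base image by the COPY and RUN steps
    print(f"{tag}: build steps add {final_mb - base_mb:.1f} MB")
```

This gives a ratio of about 54.6x, and roughly 198 MB added on bullseye versus 145 MB on Alpine, so Alpine's size advantage comes almost entirely from its much smaller base image.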
Caveats
Critics might say that I’ve omitted a few things here, which I’ll try to address:
“We could pull the Python packages from a local mirror of PyPI where we have precompiled all of the needed wheels”
I mean, yes, but that’s another thing to set up and maintain.
“We could install the Python packages from apk rather than from PyPI”
Again, yes, but that’s no dice if you want a version that isn’t available from the Alpine package repository. On top of that, the Python packages I’ve demoed here are only available from the ‘community’ repository, which in a corporate environment could introduce a risk you might not want to take.