Modus uses logic programming to express interactions among build parameters, specify complex build workflows, automatically parallelise and cache builds, help to reduce image size, and simplify maintenance.
A Modus program is a set of rules that define how new images are built from existing images by adding filesystem layers. Images and layers are represented using predicates, which may have parameters. When building an image, Modus automatically resolves parameter values and computes a build DAG containing all the instructions necessary to construct the image. Modus automatically caches these instructions and executes them in parallel.
This Modusfile defines the image my_app with the parameter profile. Depending on the value of profile, it builds either a debug or a release binary. The operators ::set_workdir and ::set_entrypoint set image properties:
my_app(profile) :-
    (
        from("rust:alpine")::set_workdir("/usr/src/app"),
        copy(".", "."),
        cargo_build(profile)
    )::set_entrypoint(f"./target/${profile}/my_app").

cargo_build("debug") :- run("cargo build").
cargo_build("release") :- run("cargo build --release").
Aspect | Dockerfiles | Modus |
---|---|---|
Parameter Interaction | Do not track dependencies among build parameters. | Tracks and automatically resolves dependencies among build parameters. |
Parallelisation | Support custom workflows only by resorting to scripts, which parallelise inefficiently. | Aggressively parallelises builds involving custom logic. |
Caching | Ineffectively cache custom workflows expressed as embedded scripts. | Provides effective caching for custom workflows. |
Image size | Tend to produce both redundant layers and layers with more files and packages than required. | Avoids redundancies via its precise dependencies encoding and permits merging unnecessary layers. |
Maintainability | Rely on hard-coded configuration and lack code reuse, so they're hard to maintain. | Provides zero-cost modularity and code reuse, so Modusfiles are easy to maintain. |
Container images are intrinsically parameterised, e.g. python:3.9-slim-bullseye is parameterised with Python's version 3.9 and Debian's options slim and bullseye. These parameters can, and often do, depend on and interact with each other, and these interactions determine how images are built. Dockerfiles only support parameters as global variables, and do not handle dependencies between them. Developers either hard-code version dependencies or implement ad-hoc Dockerfile generators. For example, the Official OpenJDK Docker Images use a combination of Dockerfile templates with embedded JQ queries, AWK scripts and Bash scripts to support parametrisation.
Modus capitalises on its logic programming foundation to handle parameters and their dependencies in an intuitive, declarative fashion. Modus reduced the size of the build system for the OpenJDK Docker images by 47.6%, replacing scripts written in three languages with a single Modusfile, while reducing the build time by 40.6%.
This fragment of the OpenJDK Dockerfile template combines Dockerfile with two external tools: (1) {{ syntax handled by an AWK script and (2) predicates expressed as JQ queries:
FROM {{
    if is_debian_slim then
        "debian:" + debian_suite + "-slim"
    else
        "buildpack-deps:" + debian_suite + (
            if env.javaType == "jdk" then
                "-scm"
            else
                "-curl"
            end
        )
    end
}}
An equivalent Modusfile expresses this fragment without external tools:
debian_image(VARIANT, JAVA_TYPE) :-
    (
        is_debian_slim(VARIANT, DEBIAN_SUITE),
        from(f"debian:${DEBIAN_SUITE}-slim")
    ;
        is_debian(VARIANT),
        debian_suffix_type(SUFFIX, JAVA_TYPE),
        from(f"buildpack-deps:${VARIANT}${SUFFIX}")
    ).

debian_suffix_type("-scm", "jdk").
debian_suffix_type("-curl", "jre").
Building container images is time-consuming. A single Dockerfile captures a linear workflow and can be parallelised effectively. Custom workflows, however, require augmenting Dockerfiles with templates and scripts, and the combination can be hard to parallelise. For example, the OpenJDK Dockerfile template builds OpenJDK images 40.6% slower than Modus, because the templating scripts run sequentially. Alternative solutions, such as Buildah scripts, are also difficult to automatically parallelise.
Modus statically constructs the build graph consisting of all operations required to build the target images, which enables it to aggressively parallelise builds with BuildKit. When building multiple images in parallel, Modus reuses shared layers across them.
The query openjdk(version, "jdk", variant), number_gt(version, 11) for the OpenJDK Modusfile builds all variants of all JDK versions greater than 11. Modus builds 23 images in parallel, reusing intermediate layers across images:
$ modus build . 'openjdk(version, "jdk", variant), number_gt(version, 11)'
Exporting 1/23: openjdk("17", "jdk", "bullseye") ->
sha256:220611111e8c9bbe242e9dc1367c0fa89eef83f26203ee3f7c3764046e02b248
Exporting 2/23: openjdk("17", "jdk", "slim-buster") ->
sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f
...
Exporting 23/23: openjdk("18", "jdk", "buster") ->
sha256:220611111e8c9bbe242e9dc1367c0fa89eef83f26203ee3f7c3764046e02b248
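To illustrate how a query with free variables expands into a set of concrete build targets, here is a minimal Python sketch. It is not Modus's solver: the VERSIONS and VARIANTS lists and the solve and number_gt helpers are hypothetical stand-ins, and the real OpenJDK Modusfile defines its own facts, which yield the 23 images listed above.

```python
from itertools import product

# Hypothetical facts for illustration only; the real Modusfile declares its own.
VERSIONS = ["8", "11", "17", "18"]
VARIANTS = ["bullseye", "buster", "slim-buster"]


def number_gt(value, bound):
    """Stand-in for the number_gt predicate: numeric comparison of a string value."""
    return int(value) > bound


def solve(java_type="jdk", min_version=11):
    """Enumerate every (version, java_type, variant) that satisfies the query."""
    return [
        (version, java_type, variant)
        for version, variant in product(VERSIONS, VARIANTS)
        if number_gt(version, min_version)
    ]


targets = solve()
for index, target in enumerate(targets, start=1):
    print(f"{index}/{len(targets)}: openjdk{target}")
```

Each resulting tuple corresponds to one image to build; because the targets are known before any build step runs, independent ones can be scheduled in parallel.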
Caching is crucial for optimising build time, because it reuses previously built layers when there are no changes in the build environment and configuration. Caching is only effective when the build system correctly detects cache invalidation. It is a common practice to implement custom build logic by embedding scripts into a Dockerfile's RUN instructions. Embedded scripts make cache invalidation imprecise, since any change to the script invalidates the cache even if it is irrelevant to the build target.
Modus statically computes the exact sequence of instructions required to build the target image. This enables users to describe custom logic without sacrificing automatic caching, since Modus does not invalidate the cache when irrelevant parts of the build logic are modified.
In this Dockerfile, the build profile is selected by an embedded shell script using the argument PROFILE. The build cache for the target PROFILE=debug will be invalidated even if only the irrelevant line make program ; \ is modified:
FROM gcc:bullseye AS app
COPY program.c program.c
ARG PROFILE
RUN if [ "$PROFILE" = "debug" ] ; then \
        CFLAGS=-g make -e program ; \
    else \
        make program ; \
    fi
In the equivalent Modusfile, cache invalidation depends on the exact commands executed, so a change to the body of make("release") will not invalidate the cache of app("debug"):
app(profile) :-
    from("gcc:bullseye"),
    copy("program.c", "program.c"),
    make(profile).

make("debug") :- run("make -e program")::in_env("CFLAGS", "-g").
make("release") :- run("make program").
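The effect can be sketched in Python under a simplifying assumption: a cache key is a hash of the exact instruction sequence a target resolves to. This is an illustration of the idea only, not how Modus or BuildKit actually computes cache keys; resolve, cache_key and the rule tables are hypothetical names.

```python
import hashlib


def resolve(profile, make_rules):
    """Return the concrete instruction list for app(profile), mirroring the Modusfile above."""
    return [
        "from gcc:bullseye",
        "copy program.c program.c",
        make_rules[profile],
    ]


def cache_key(instructions):
    """Hash only the instructions that are actually executed for a target."""
    digest = hashlib.sha256()
    for instruction in instructions:
        digest.update(instruction.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent instructions cannot blur together
    return digest.hexdigest()


# Two revisions of the build logic; only the release rule differs (hypothetical edit).
rules_v1 = {"debug": "CFLAGS=-g make -e program", "release": "make program"}
rules_v2 = {"debug": "CFLAGS=-g make -e program", "release": "make -j4 program"}

debug_unchanged = cache_key(resolve("debug", rules_v1)) == cache_key(resolve("debug", rules_v2))
release_changed = cache_key(resolve("release", rules_v1)) != cache_key(resolve("release", rules_v2))
print(debug_unchanged, release_changed)
```

In the Dockerfile version, by contrast, both profiles share one RUN string, so any edit to it changes the key for every profile.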
Container images often include redundant layers, files and installed packages, which greatly increases their size, slows down their transfer over the network, and compromises security by increasing the attack surface. On Stack Overflow, the question Why are Docker container images so large? has 103k views. Dockerfiles cannot conditionally install packages and copy files based on the build configuration without sacrificing caching and parallelism, and do not provide tools for fine-grained control of layers.
Modus provides predicates and operators for querying and modifying the build environment. Together, they allow the user to precisely define the files and software packages their build configuration requires. Modus provides constructs to reduce image size such as the operator ::merge that merges several layers into one, and the operator ::copy for conveniently defining multi-stage builds.
The operator ::merge is applied to a fragment of code to ensure that it produces a single layer. As a result, the directory src will not be stored in an intermediate layer:
app(build_mode) :-
    from("gcc:latest"),
    (
        copy("src", "src"),
        make(build_mode),
        run("rm -rf src")
    )::merge.

make("release") :- run("cd src; make install").
make("debug") :- run("cd src; make -e install")::in_env("CFLAGS", "-g").
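Why merging helps can be seen in a deliberately simplified Python model of layered images, in which every step adds a layer and a later deletion does not reclaim bytes stored by an earlier layer. The step list, the sizes and the function names are made up for illustration; real OCI layers are tar archives with whiteout entries, which this sketch does not model.

```python
def unmerged_size(steps):
    """One layer per step: bytes added by an earlier layer remain in the image
    even when a later layer deletes the file."""
    return sum(size for operation, _path, size in steps if operation == "add")


def merged_size(steps):
    """A single merged layer: only files that survive all steps are stored."""
    files = {}
    for operation, path, size in steps:
        if operation == "add":
            files[path] = size
        else:  # "rm"
            files.pop(path, None)
    return sum(files.values())


# Hypothetical sizes in megabytes for the three steps inside ( ... )::merge.
steps = [
    ("add", "src", 50),                # copy("src", "src")
    ("add", "/usr/local/bin/app", 5),  # make(build_mode) installs the binary
    ("rm", "src", 0),                  # run("rm -rf src")
]

print(unmerged_size(steps), merged_size(steps))
```

Under this model the unmerged image carries the 50 MB source directory in an intermediate layer, while the merged layer stores only the installed binary.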
The operator ::copy is applied to copy a file converted to UNIX format from a temporary image, without requiring installation of the package dos2unix on the target image app:
copy_convert(file, dest) :-
    (
        from("debian:bullseye-slim"),
        run("apt-get update && apt-get install -y dos2unix"),
        copy(file, f"/tmp/${file}"),
        run(f"dos2unix /tmp/${file}")
    )::copy(f"/tmp/${file}", dest).

app :-
    from("debian:bullseye-slim"),
    copy_convert("my_local_script.sh", ".").
Just like any code, container build definitions evolve and require maintenance. Dockerfiles do not provide features for modularity and code reuse. Besides, to optimise image size, they require structuring code in a way that many dislike.
Modus supports code evolution and maintenance by providing zero-cost modularity and code reuse. Modus allows users to define their own commands, such as layer building functions or logical predicates, to abstract reusable build workflows. Modus provides a library of built-in predicates to handle common data structures in the build logic. For example, the predicate semver_geq checks if the left version is greater than or equal to the right version according to the SemVer specification.
Modus provides a library of operators, such as ::in_env for executing commands in a custom environment, that encapsulate build-specific instructions and manipulation of OCI image properties.
Using a user-defined predicate install to reuse library installation code:
install(lib, version) :-
    run(f"wget https://example.com/libs/${lib}-v${version}.tar.gz && \
          tar xf ${lib}-v${version}.tar.gz && \
          mv ${lib}-v${version}/ /build"),
    run("cd /build && make install"),
    run(f"rm ${lib}-v${version}.tar.gz && \
          rm -rf /build").

app :-
    from("gcc:latest"),
    install("liba", "1.3.5"),
    install("libb", "4.1").
Using the built-in predicate semver_geq to compare versions of Ubuntu:
base(distr_version, python_version) :-
    semver_geq(distr_version, "16.04"),
    from(f"ubuntu:${distr_version}"),
    run(f"apt-get update && apt-get install -y python${python_version} \
          && rm -rf /var/lib/apt/lists/*").
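A version-aware comparison matters here because plain string comparison orders versions incorrectly. The Python sketch below shows the general idea for dotted numeric versions only. It is an assumption-laden simplification, not the implementation of semver_geq: it ignores pre-release and build metadata, and parse_version and version_geq are hypothetical names.

```python
def parse_version(text):
    """Split a dotted version such as '16.04' into a tuple of integers.
    Simplification: numeric components only, no pre-release or build tags."""
    return tuple(int(part) for part in text.split("."))


def version_geq(left, right):
    """Return True if left >= right, comparing component by component."""
    left_parts, right_parts = parse_version(left), parse_version(right)
    width = max(len(left_parts), len(right_parts))
    # Pad the shorter version with zeros so that '16' compares equal to '16.0'.
    left_parts += (0,) * (width - len(left_parts))
    right_parts += (0,) * (width - len(right_parts))
    return left_parts >= right_parts


print(version_geq("20.04", "16.04"))  # numeric comparison
print("9.10" >= "16.04")              # naive string comparison, for contrast
```

With the numeric comparison, 9.10 is correctly treated as older than 16.04, whereas the string comparison ranks it as newer because the character 9 sorts after 1.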
Using the built-in operator ::in_env to execute commands in a custom environment:
app :-
    from("debian:bullseye-slim"),
    (
        run("apt-get update"),
        run("apt-get upgrade -y"),
        run("apt-get install -y build-essential")
    )::in_env("DEBIAN_FRONTEND", "noninteractive").
Modus is a research project developed at University College London. Modus is led by Dr. Sergey Mechtaev and Prof. Earl T. Barr. Modus aims to be a practical tool that is used by both computer science researchers and software developers.
Research on Modus is published in peer-reviewed venues, and follows the principles of open science. All our code and data are released as reproducible packages on GitHub.
For more details, please read our FSE'22 paper on Modus:
Modus: A Datalog Dialect for Building Container Images