Modus

A language for building Docker/OCI container images

Modus uses logic programming to express interactions among build parameters, specify complex build workflows, automatically parallelise and cache builds, help to reduce image size, and simplify maintenance.

A Modus program is a set of rules that define how new images are built from existing images by adding filesystem layers. Images and layers are represented using predicates, which may have parameters. When building an image, Modus automatically resolves parameter values and computes a build DAG containing all the instructions necessary to construct the image. Modus automatically caches these instructions and executes them in parallel.

This Modusfile defines the image my_app with the parameter profile. Depending on the value of profile, it builds either a debug or a release binary. The operators ::set_workdir and ::set_entrypoint set image properties:


my_app(profile) :-
  (
    from("rust:alpine")::set_workdir("/usr/src/app"),
    copy(".", "."),
    cargo_build(profile)
  )::set_entrypoint(f"./target/${profile}/my_app").
cargo_build("debug") :- run("cargo build").
cargo_build("release") :- run("cargo build --release").
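
To make the resolution step concrete, here is a minimal Python sketch that models the rules above. It is not how Modus is implemented: the `cargo_build` facts become a lookup table, and the function names `expand_my_app` and `solve_my_app`, as well as the tuple encoding of instructions, are assumptions made for illustration only.

```python
# Hypothetical model of the Modusfile above, not Modus's actual solver.
CARGO_BUILD = {
    "debug": "cargo build",
    "release": "cargo build --release",
}


def expand_my_app(profile):
    """Return the instruction list for my_app(profile), or None if no rule matches."""
    if profile not in CARGO_BUILD:
        return None  # no cargo_build rule matches this profile
    return [
        ("from", "rust:alpine"),
        ("workdir", "/usr/src/app"),
        ("copy", ".", "."),
        ("run", CARGO_BUILD[profile]),
        ("entrypoint", f"./target/{profile}/my_app"),
    ]


def solve_my_app():
    """Enumerate every profile for which my_app(profile) can be built."""
    return {profile: expand_my_app(profile) for profile in CARGO_BUILD}


for profile, plan in solve_my_app().items():
    print(profile, "->", plan[3][1])
```

Leaving `profile` unbound corresponds to `solve_my_app()`, which yields one plan per matching rule. A value with no matching rule, such as `"fast"`, produces no image at all.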

Dockerfiles vs Modus

Parameter Interaction
  Dockerfiles: Do not track dependencies among build parameters.
  Modus: Tracks and automatically resolves dependencies among build parameters.

Parallelisation
  Dockerfiles: Support custom workflows only by resorting to scripts which inefficiently parallelise.
  Modus: Aggressively parallelises builds involving custom logic.

Caching
  Dockerfiles: Ineffectively cache custom workflows expressed as embedded scripts.
  Modus: Provides effective caching for custom workflows.

Image size
  Dockerfiles: Tend to produce both redundant layers and layers with more files and packages than required.
  Modus: Avoids redundancies via its precise dependencies encoding and permits merging unnecessary layers.

Maintainability
  Dockerfiles: Rely on hard-coded configuration and lack code reuse, so they're hard to maintain.
  Modus: Provides zero-cost modularity and code reuse, so Modusfiles are easy to maintain.

Build Parameter Dependencies

Container images are intrinsically parameterised, e.g. python:3.9-slim-bullseye is parameterised with the Python version 3.9 and the Debian options slim and bullseye. These parameters often depend on and interact with each other, and these interactions determine how images are built. Dockerfiles only support parameters as global variables and do not handle dependencies between them, so developers either hard-code version dependencies or implement ad-hoc Dockerfile generators. For example, the Official OpenJDK Docker Images use a combination of Dockerfile templates with embedded JQ queries, AWK scripts and Bash scripts to support parametrisation.

Modus capitalises on its logic programming foundation to handle parameters and their dependencies in an intuitive, declarative fashion. In the OpenJDK case study, Modus reduced the size of the Docker image build system by 47.6%, replacing scripts written in three languages with a single Modusfile, and reduced the build time by 40.6%.

OpenJDK images case study

This fragment of the OpenJDK Dockerfile template combines Dockerfile syntax with two external tools: (1) the {{ syntax handled by an AWK script and (2) predicates expressed as JQ queries:


FROM {{
    if is_debian_slim then
        "debian:" + debian_suite + "-slim"
    else
        "buildpack-deps:" + debian_suite + (
        if env.javaType == "jdk" then
            "-scm"
        else
            "-curl"
        end
    )
    end
}}
              

An equivalent Modusfile expresses this fragment without external tools:


debian_image(VARIANT, JAVA_TYPE) :-
    (
        is_debian_slim(VARIANT, DEBIAN_SUITE),
        from(f"debian:${DEBIAN_SUITE}-slim")
    ;
        is_debian(VARIANT),
        debian_suffix_type(SUFFIX, JAVA_TYPE),
        from(f"buildpack-deps:${VARIANT}${SUFFIX}")
    ).
debian_suffix_type("-scm", "jdk").
debian_suffix_type("-curl", "jre").
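
The two branches of `debian_image` can also be read as an ordinary function. The Python sketch below mirrors them under stated assumptions: the set `DEBIAN_SUITES` and the convention that a slim variant is spelled `slim-<suite>` are guesses made for illustration, since the definitions of `is_debian_slim` and `is_debian` are not shown here.

```python
# Assumed set of Debian suites; the real is_debian predicate is not shown on this page.
DEBIAN_SUITES = {"bullseye", "buster"}

# Mirrors the debian_suffix_type facts: (suffix, java_type).
SUFFIX_FOR_JAVA_TYPE = {"jdk": "-scm", "jre": "-curl"}


def debian_image(variant, java_type):
    """Return the base image for (variant, java_type), or None if no branch applies."""
    # Branch 1: slim variants (assumed to look like "slim-buster") use debian:<suite>-slim.
    prefix = "slim-"
    if variant.startswith(prefix) and variant[len(prefix):] in DEBIAN_SUITES:
        return f"debian:{variant[len(prefix):]}-slim"
    # Branch 2: plain suites use buildpack-deps with a suffix chosen by the Java type.
    if variant in DEBIAN_SUITES and java_type in SUFFIX_FOR_JAVA_TYPE:
        return f"buildpack-deps:{variant}{SUFFIX_FOR_JAVA_TYPE[java_type]}"
    return None


print(debian_image("slim-buster", "jdk"))
print(debian_image("bullseye", "jre"))
```

Unlike this function, the Modus rule can also be queried with unbound arguments, in which case the solver enumerates every combination of `VARIANT` and `JAVA_TYPE` that satisfies one of the branches.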
            

Parallel Builds

Building container images is time-consuming. A single Dockerfile captures a linear workflow and can be parallelised effectively. Custom workflows, however, require augmenting Dockerfiles with templates and scripts, and the combination can be hard to parallelise. For example, the OpenJDK Dockerfile template builds OpenJDK images 40.6% slower than Modus, because the templating scripts run sequentially. Alternative solutions, such as Buildah scripts, are also difficult to parallelise automatically.

Modus statically constructs the build graph consisting of all operations required to build the target images, which enables it to aggressively parallelise builds with BuildKit. When building multiple images in parallel, Modus reuses shared layers across them.

The query openjdk(version, "jdk", variant), number_gt(version, 11) for the OpenJDK Modusfile builds all variants of all JDK versions greater than 11. Modus builds 23 images in parallel, reusing intermediate layers across images:


$ modus build . 'openjdk(version, "jdk", variant), number_gt(version, 11)'
Exporting 1/23: openjdk("17", "jdk", "bullseye") ->
sha256:220611111e8c9bbe242e9dc1367c0fa89eef83f26203ee3f7c3764046e02b248
Exporting 2/23: openjdk("17", "jdk", "slim-buster") ->
sha256:c34ce3c1fcc0c7431e1392cc3abd0dfe2192ffea1898d5250f199d3ac8d8720f
...
Exporting 23/23: openjdk("18", "jdk", "buster") ->
sha256:220611111e8c9bbe242e9dc1367c0fa89eef83f26203ee3f7c3764046e02b248
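
The benefit of a statically known build graph can be sketched in a few lines of Python. This is an illustration rather than Modus's scheduler: the node names in `BUILD_GRAPH` are hypothetical, and the standard-library `graphlib.TopologicalSorter` stands in for the scheduling that BuildKit performs.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each node maps to the set of nodes it depends on.
# "base" is a layer shared by both images, so it appears only once.
BUILD_GRAPH = {
    "base": set(),
    "jdk17-download": {"base"},
    "jdk18-download": {"base"},
    "openjdk17": {"jdk17-download"},
    "openjdk18": {"jdk18-download"},
}


def parallel_waves(graph):
    """Group nodes into waves; all nodes in one wave can be built concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # nodes whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(BUILD_GRAPH), start=1):
    print(f"wave {number}: {wave}")
```

Because the whole graph is available before anything runs, the shared node is built once and the two independent chains proceed side by side. A templating script that emits one Dockerfile after another hides this structure from the builder.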
            

Caching

Caching is crucial for optimising build time, because it reuses previously built layers when there are no changes in the build environment and configuration. Caching is only effective when the build system correctly detects cache invalidation. It is common practice to implement custom build logic by embedding scripts into a Dockerfile's RUN instructions. Embedded scripts make cache invalidation imprecise, since any change to the script invalidates the cache even if it is irrelevant to the build target.

Modus statically computes the exact sequence of instructions required to build the target image. This enables users to describe custom logic without sacrificing automatic caching, since Modus does not invalidate cache when irrelevant parts of build logic are modified.

In this Dockerfile, the build profile is selected by an embedded shell script using the argument PROFILE. The build cache for the target PROFILE=debug will be invalidated even if only the irrelevant line make program ; \ is modified:


FROM gcc:bullseye AS app
COPY program.c program.c
ARG PROFILE
RUN if [ "$PROFILE" = "debug" ] ; then \
      CFLAGS=-g make -e program ; \
    else \
      make program ; \
    fi
            

In the equivalent Modusfile, cache invalidation depends on the exact commands executed, so a change to the body of make("release") will not invalidate the cache of app("debug"):


app(profile) :-
    from("gcc:bullseye"),
    copy("program.c", "program.c"),
    make(profile).
make("debug") :- run("make -e program")::in_env("CFLAGS", "-g").
make("release") :- run("make program").
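
The difference between the two approaches can be modelled with a toy cache key. The Python sketch below is a simplification under stated assumptions: real BuildKit cache keys also cover file contents and parent layers, and the helper names `cache_key`, `modus_style_plan` and `dockerfile_style_plan` are hypothetical.

```python
import hashlib


def cache_key(instructions):
    """Hash the exact instruction sequence that a target executes."""
    digest = hashlib.sha256()
    for instruction in instructions:
        digest.update(instruction.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent instructions cannot run together
    return digest.hexdigest()


def modus_style_plan(profile, release_cmd="make program"):
    """Only the command selected for this profile enters the plan."""
    build = "CFLAGS=-g make -e program" if profile == "debug" else release_cmd
    return ["FROM gcc:bullseye", "COPY program.c program.c", f"RUN {build}"]


def dockerfile_style_plan(release_cmd="make program"):
    """The whole if/else script is a single RUN instruction, whatever PROFILE is."""
    script = (
        'if [ "$PROFILE" = "debug" ] ; then CFLAGS=-g make -e program ; '
        f"else {release_cmd} ; fi"
    )
    return ["FROM gcc:bullseye", "COPY program.c program.c", "ARG PROFILE", f"RUN {script}"]


edited = "make -j4 program"  # an edit that only touches the release branch

# The debug plan never contains the release command, so its key is unchanged.
print(cache_key(modus_style_plan("debug")) == cache_key(modus_style_plan("debug", edited)))  # True

# The embedded script changes as a whole, so the key changes for every profile.
print(cache_key(dockerfile_style_plan()) == cache_key(dockerfile_style_plan(edited)))  # False
```

The key for the debug target stays stable because the release command is never part of its instruction sequence, which is the property described above.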
            

Optimising Image Size

Container images often include redundant layers, files and installed packages, which greatly increases their size, slows down their transfer over the network, and compromises security by increasing the attack surface. On Stack Overflow, the question Why are Docker container images so large? has 103k views. Dockerfiles cannot conditionally install packages and copy files based on the build configuration without sacrificing caching and parallelism, and do not provide tools for fine-grained control of layers.

Modus provides predicates and operators for querying and modifying the build environment. Together, they allow the user to precisely define the files and software packages their build configuration requires. Modus provides constructs to reduce image size such as the operator ::merge that merges several layers into one, and the operator ::copy for conveniently defining multi-stage builds.

The operator ::merge is applied to a fragment of code to ensure that it produces a single layer. As a result, the directory src will not be stored in an intermediate layer:


app(build_mode) :-
    from("gcc:latest"),
    (
        copy("src", "src"),
        make(build_mode),
        run("rm -rf src")
    )::merge.
make("release") :- run("cd src; make install").
make("debug") :- run("cd src; make -e install")::in_env("CFLAGS", "-g").
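
Why merging shrinks the image can be shown with a toy model of layers. In the Python sketch below, each layer records the files it adds and removes. The file names and sizes are invented for illustration, and the functions `merge` and `stored_bytes` are hypothetical helpers, not part of Modus.

```python
def merge(layers):
    """Squash a sequence of layers into one layer holding only the net change."""
    added, removed = {}, set()
    for layer in layers:
        for path in layer["remove"]:
            if path in added:
                del added[path]    # added and removed within the span: never stored
            else:
                removed.add(path)  # deletes a file that came from an earlier layer
        for path, size in layer["add"].items():
            added[path] = size
            removed.discard(path)
    return {"add": added, "remove": removed}


def stored_bytes(layers):
    """Total bytes kept in the image: every layer stores the files it adds."""
    return sum(sum(layer["add"].values()) for layer in layers)


# Invented sizes, one layer per instruction inside the merged fragment.
STEPS = [
    {"add": {"src/main.c": 400_000}, "remove": set()},         # copy("src", "src")
    {"add": {"/usr/local/bin/app": 50_000}, "remove": set()},  # make(build_mode)
    {"add": {}, "remove": {"src/main.c"}},                     # run("rm -rf src")
]

print(stored_bytes(STEPS))           # 450000
print(stored_bytes([merge(STEPS)]))  # 50000
```

Without merging, the first layer still carries the source tree even though a later layer deletes it. With merging, only the net result of the three steps is stored.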

The operator ::copy is applied to copy a file converted to UNIX format from a temporary image, without requiring installation of the package dos2unix on the target image app:


copy_convert(file, dest) :-
    (
        from("debian:bullseye-slim"),
        run("apt-get update && apt-get install -y dos2unix"),
        copy(file, f"/tmp/${file}"),
        run(f"dos2unix /tmp/${file}")
    )::copy(f"/tmp/${file}", dest).
app :-
    from("debian:bullseye-slim"),
    copy_convert("my_local_script.sh", ".").

Modularity & Code Reuse

Just like any code, container build definitions evolve and require maintenance. Dockerfiles do not provide features for modularity and code reuse. Moreover, to optimise image size, they require structuring code in a way that many dislike.

Modus supports code evolution and maintenance by providing zero-cost modularity and code reuse. Modus allows users to define their own commands, such as layer building functions or logical predicates, to abstract reusable build workflows. Modus provides a library of builtin predicates to handle common data structures in the build logic. For example, the predicate semver_geq checks whether the left version is greater than or equal to the right version according to the SemVer specification.

List of builtin predicates

Modus provides a library of operators, such as ::in_env for executing commands in a custom environment, that encapsulate build-specific instructions and manipulation of OCI image properties.

List of operators

Using a user-defined predicate install to reuse library installation code:


install(lib, version) :-
    run(f"wget https://example.com/libs/${lib}-v${version}.tar.gz && \
          tar xf ${lib}-v${version}.tar.gz && \
          mv ${lib}-v${version}/ /build"),
    run("cd /build && make install"),
    run(f"rm ${lib}-v${version}.tar.gz && \
          rm -rf /build").

app :-
    from("gcc:latest"),
    install("liba", "1.3.5"),
    install("libb", "4.1").

Using the built-in predicate semver_geq to compare versions of Ubuntu:


base(distr_version, python_version) :-
    semver_geq(distr_version, "16.04"),
    from(f"ubuntu:${distr_version}"),
    run(f"apt-get update && apt-get install -y python${python_version} \
          && rm -rf /var/lib/apt/lists/*").
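
The comparison performed by a predicate like `semver_geq` can be approximated in Python as follows. This is a simplified sketch, not the Modus implementation: it ignores pre-release and build metadata, and the choice to pad partial versions such as "16.04" with zeros is an assumption.

```python
def parse_version(text):
    """Turn "16.04" into (16, 4, 0); missing components default to 0 (an assumption)."""
    parts = [int(part) for part in text.split(".")]
    if len(parts) > 3:
        raise ValueError(f"too many components: {text!r}")
    return tuple(parts + [0] * (3 - len(parts)))


def semver_geq(left, right):
    """True when left >= right, comparing major, minor and patch in that order."""
    return parse_version(left) >= parse_version(right)


print(semver_geq("20.04", "16.04"))  # True
print(semver_geq("14.04", "16.04"))  # False
print(semver_geq("1.10.0", "1.9.0"))  # True
```

The last call shows why versions are compared numerically per component: a plain string comparison would rank "1.10.0" below "1.9.0".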
            

Using the built-in operator ::in_env to execute commands in a custom environment:


app :-
    from("debian:bullseye-slim"),
    (
        run("apt-get update"),
        run("apt-get upgrade -y"),
        run("apt-get install -y build-essential")
    )::in_env("DEBIAN_FRONTEND", "noninteractive").

Research & Development

Modus is a research project developed at University College London. Modus is led by Dr. Sergey Mechtaev and Prof. Earl T. Barr. Modus aims to be a practical tool that is used by both computer science researchers and software developers.

Research on Modus is published in peer-reviewed venues, and follows the principles of open science. All our code and data are released as reproducible packages on GitHub.

For more details, please read our FSE'22 paper on Modus:

Modus: A Datalog Dialect for Building Container Images
Chris Tomy, Tingmao Wang, Earl Barr, Sergey Mechtaev
The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)