Writing Quadlets
The first documented appearance of a Quadlet in my infrastructure was OwnCloud, a few years ago. Since then I have moved all of my services to a common Quadlet setup that is predictable and easy to manage.
Quadlet File Organization
Podman only looks for rootful Quadlet files in a fixed set of directories, and /etc/containers/systemd/ is the one meant for the administrator's own files.
All filenames and container names start with the service name: forgejo-, adguard-home-.
Every service I deploy has a "cornerstone" container which is suffixed with -app.
A database container usually gets a -db suffix. If the service needs more than one, I name them after the software instead: -postgresql, -valkey and so on.
If a service needs multiple containers, I create a pod for them that publishes only the ports that need to be reachable from the outside. That isolates sensitive internals like databases, which frequently have no credentials or predictable ones.
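A minimal sketch of such a pod, using the same hypothetical `pancake` service as the examples below and assuming Podman 5.0 or newer (the release that added `.pod` Quadlet files). Only the application's port is published; the database containers joining the pod share its network namespace, so the app reaches them over localhost while nothing outside the pod can.

```ini
# /etc/containers/systemd/pancake.pod (hypothetical service)
[Pod]
# Without PodName=, Quadlet names the pod `systemd-pancake`.
PodName=pancake
Network=managed
# The only port leaving the pod. PostgreSQL and Valkey listen inside
# the pod's network namespace and are never published.
PublishPort=10.10.10.1:8000:8000/tcp
```

Containers then join it with `Pod=pancake.pod` and must not publish ports of their own.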
Filesystem-backed Volumes
I have decided to keep all configurations under /var/app/.
It is a non-standard directory (unlike the /etc/ tree, which could contain anything), so I know nothing writes there without my knowledge.
The tree usually goes something like this:
/var/app/
+-- adguard-home/
|   +-- config/
|   |   +-- AdGuardHome.yaml
|   +-- data/
|   +-- ...
'-- forgejo/
    +-- data/
    +-- ...
Quadlet structure
I find it useful to group Quadlet configuration options into logical groups.
Header with basic identification
The header contains configuration common to all containers, whether or not they publish ports, mount volumes, or need environment variables or other special configuration.
[Container]
# Always include a fully-qualified image reference.
# Since not all image maintainers publish versioned tags, I
# frequently go for the `:latest` tag, though pinning the container
# to a relevant major version like `:2` should prevent situations where
# a new release crashes because it needs a configuration change.
Image=docker.io/pancake/pancake:latest
# podman-systemd.unit defaults to adding a `systemd-` prefix to the
# container names, which I do not like. I prefer when the systemd service
# does not expose the implementation detail of being containerized.
ContainerName=pancake-app
# If there are multiple containers, lock them together.
Pod=pancake.pod
# Keep the container's timezone in sync with the host, so that log
# timestamps make sense.
Timezone=local
# Opt into podman-auto-update.timer to not have to worry about manually
# updating the images.
AutoUpdate=registry
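With a `[Unit]` description and an `[Install]` section, the header alone already makes a working unit. A minimal sketch, assuming a rootful Quadlet at the hypothetical path `/etc/containers/systemd/pancake-app.container` that should start at boot. The `[Install]` section matters because `systemctl enable` does not work on units generated by Quadlet.

```ini
# /etc/containers/systemd/pancake-app.container (hypothetical)
[Unit]
Description=Pancake application

[Container]
Image=docker.io/pancake/pancake:latest
ContainerName=pancake-app
Pod=pancake.pod
Timezone=local
AutoUpdate=registry

[Install]
# Quadlet turns this into a dependency of multi-user.target,
# so the container starts at boot.
WantedBy=multi-user.target
```

`AutoUpdate=registry` only has an effect when `podman auto-update` actually runs; on many systems the timer first has to be turned on with `systemctl enable --now podman-auto-update.timer`.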
Ports
This is arguably the most visible configuration. Since containers are very commonly servers handling HTTP traffic, we have to publish their ports.
I like to create a Podman network separate from the default one, just so I have better control over it.
It does not make a huge difference, but it lets me enable IPv6, disable DNS and control IP address ranges.
Having control over the subnets is helpful when you want to make sure the container binds only to a specific interface and not to all of them: systems like Fedora CoreOS do not have a firewall configured, and a container listening on 0.0.0.0 is reachable by anyone who can route traffic to that host.
I like to be explicit about the ports: always include the interface and always include transport layer protocol.
[Container]
# Do not use default Podman network.
Network=managed
# Port exposed internally to the reverse proxy.
PublishPort=10.10.10.1:8000:8000/tcp
# Port exposed externally to all interfaces. Useful for non-HTTP
# traffic Caddy cannot deal with.
PublishPort=0.0.0.0:2222:22/tcp
# If a container needs to execute `ping`, it must be granted an
# additional capability.
AddCapability=NET_RAW
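If the `managed` network should itself be managed by Quadlet, a `.network` file can declare it. A sketch, assuming the file is named `podman-managed.network` (Quadlet turns that into `podman-managed-network.service`, the name used in the dependencies section), that `10.10.10.0/24` is free on the host, and that the gateway should be `10.10.10.1`, the address `PublishPort=` binds to above. Those values are my assumptions, not the original configuration.

```ini
# /etc/containers/systemd/podman-managed.network (hypothetical name)
[Network]
# Plain network name, so containers can use Network=managed.
NetworkName=managed
# Explicit IPv4 range; the bridge on the host side gets the gateway address.
Subnet=10.10.10.0/24
Gateway=10.10.10.1
# Turn off Podman's built-in DNS for this network.
DisableDNS=true
```

Referencing the network as `Network=podman-managed.network` instead of `Network=managed` should make Quadlet add the dependency on `podman-managed-network.service` by itself.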
If a container exists within a pod, it cannot publish any ports, because the pod controls the network namespace. Instead, I use a made-up comment syntax to document the port the container uses internally:
[Container]
#_Port=8000/tcp
Container-specific configuration
This boils down to environment variables, secrets and mounts.
[Container]
# Though it is possible to specify multiple variables on a single line,
# separating them makes git diffs smaller, and they will not wrap on
# small width terminals.
Environment=FOO=foo
Environment=BAR=bar
Most images expect you to pass in secrets via environment variables, but some support PASSWORD_PATH-like variables instead, which take the path to a file containing the secret.
Some containers even force you to pass in secrets this way.
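Podman can hand a secret to the container either way. A sketch with made-up variable names (`PANCAKE_DB_PASSWORD_FILE` and `PANCAKE_DB_PASSWORD` are hypothetical; the image's documentation has the real ones):

```ini
[Container]
# As a file: by default, Podman mounts the secret at
# /run/secrets/<secret name> inside the container.
Secret=pancake__password
Environment=PANCAKE_DB_PASSWORD_FILE=/run/secrets/pancake__password

# As an environment variable, for images that only read the environment.
# Use one form or the other, not both.
#Secret=pancake__password,type=env,target=PANCAKE_DB_PASSWORD
```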
With secrets, you can tie the deployment together very easily: instead of hardcoding a password into two separate Quadlet files, you can create the secret in the database container's unit and inject it into the application container.
[Container]
# Container ingesting a secret can have it passed in. The service will
# fail to start if it is not set, making it easier to spot mistakes.
Secret=pancake__password
[Service]
# Use systemd hooks to run code before the container is started. Since
# Podman secrets are global, this can live in a different container's
# unit than the one that uses the secret, and it works for passwords of
# any kind. If you do this, the receiving container should set
# `Requires=` (and `After=`) on the unit that owns the secret.
# The trailing `-` makes `podman secret create` read from stdin.
ExecStartPre=sh -euc "printf 'PASSWORD' | podman secret create --ignore pancake__password -"
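On the database side, the official PostgreSQL image supports the file-based convention through its documented `POSTGRES_PASSWORD_FILE` variable. A sketch of the container that owns the secret, assuming `docker.io/library/postgres:17`; the host data directory `/var/app/pancake/postgresql/` is hypothetical, and `PASSWORD` is the same placeholder as above, not something to deploy.

```ini
# /etc/containers/systemd/pancake-postgresql.container (hypothetical)
[Container]
Image=docker.io/library/postgres:17
ContainerName=pancake-postgresql
Pod=pancake.pod
Timezone=local
# Mounted at /run/secrets/pancake__password; the image's entrypoint
# reads the superuser password from the file named in this variable.
Secret=pancake__password
Environment=POSTGRES_PASSWORD_FILE=/run/secrets/pancake__password
Volume=/var/app/pancake/postgresql/:/var/lib/postgresql/data/:rw,Z

[Service]
# This unit owns the secret: create it on first start, keep it afterwards.
ExecStartPre=sh -euc "printf 'PASSWORD' | podman secret create --ignore pancake__password -"
```

The application container then uses `Secret=pancake__password` as well and declares `Requires=` and `After=` on `pancake-postgresql.service`.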
Volumes
If you are running Debian, SELinux does not apply. If you are running Fedora or a derivative and you have SELinux disabled, I hope you are ashamed of yourself.
You may want to skip relabeling for some volumes, though. As the docs say: "Do not relabel system files and directories. Relabeling system content might cause other confined services on the machine to fail." The same applies to other files the container does not own, like a photography archive mounted into an image gallery container.
Another thing you can do to increase the isolation from other services running on the system is to set the U option, which changes the owner of the volume to match the user inside the container.
If the container also does this itself internally, you might see weird errors and crashes that are not documented anywhere on the internet (Ask me how I know... Sigh.), so do this only once the rest of the application works and you are looking into hardening.
[Container]
# Keep SELinux label separation for this container (false is the default).
SecurityLabelDisable=false
# Bind-mount directories from the outside into the container.
# - For clarity, always include trailing / if the mount is a directory.
# - Use `ro` to mount the volume read-only, `rw` to mount it writable.
# - Use `U` to chown the mounted directory.
# - Use `z` if the volume is shared across containers,
# use `Z` if the volume is used by a single container.
Volume=/var/app/pancake/config/:/pancake/config/:ro,U,Z
Volume=/var/app/pancake/data/:/pancake/data/:rw,Z
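For content the container does not own, such as the photography archive mentioned above, a sketch might leave out the relabeling and ownership options and mount it read-only. This assumes the archive lives at `/mnt/nfs-drive/data/archive/photos/` (the path used in the dependencies section) and a hypothetical `/photos/` path inside the container; whether the container may read it then depends on the files' existing labels and the SELinux policy.

```ini
[Container]
# No z/Z: the archive belongs to the rest of the system, so its SELinux
# labels stay untouched. No U: its ownership is not ours to change.
Volume=/mnt/nfs-drive/data/archive/photos/:/photos/:ro
```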
Hardening and tweaking
The [Container] section supports many more keys: consult podman-systemd.unit(5).
Some containers break when you try to harden their runtime configuration, mostly when they do something nasty like generating files that are later executed, or altering their configuration at runtime.
[Container]
# Create a tmpfs mount in the container. This may be useful for
# temporary files that should not be written to disk.
Tmpfs=/tmp
# Do not allow the container to write within its filesystem.
ReadOnly=true
# Prevent processes from gaining additional privileges through the
# `execve` system call (for example via setuid binaries).
NoNewPrivileges=true
# Some containers take a long time to shut down, especially if they
# contain multiple services within themselves which they want to flush
# and shut down.
StopTimeout=120
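The capability set is another knob in the same spirit. A sketch that drops every capability and grants back only the one from the `ping` example earlier; which capabilities a real image needs differs per image. Many entrypoints switch from root to an unprivileged user and need `SETUID` and `SETGID` for that, so this is exactly the kind of hardening that can break a container.

```ini
[Container]
# Start from an empty capability set...
DropCapability=all
# ...and grant back only what the process needs, here for `ping`.
AddCapability=NET_RAW
```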
Dependencies
As mentioned above, you should set explicit dependencies between containers.
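[Unit]
# If your custom Podman network is a Quadlet, depend on it here.
Requires=podman-managed-network.service
After=podman-managed-network.service
# Depend on all other services your container needs. Requires= alone
# does not order the start-up; After= makes systemd start them first.
Requires=pancake-postgresql.service
After=pancake-postgresql.service
Requires=pancake-valkey.service
After=pancake-valkey.service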
systemd has great trigger and condition options for services that we can take full advantage of. See systemd.unit(5) for more details.
[Unit]
# If the container needs a directory it does not own, like a remote
# resource, configure a filesystem-level condition. Condition*= options
# belong in the [Unit] section.
ConditionPathIsDirectory=/mnt/nfs-drive/data/archive/photos/
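A condition only skips the service when the directory is missing (the unit is not marked as failed); it does not wait for the directory to appear. If the path is backed by a mount that systemd manages, `RequiresMountsFor=` from systemd.unit(5) may fit better: it pulls in the mount units needed for the path and orders the service after them. A sketch reusing the same path:

```ini
[Unit]
# Start only after the mount backing this path is available.
RequiresMountsFor=/mnt/nfs-drive/data/archive/photos/
```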
CPU limits, memory limits, further hardening
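[Service]
# At most 30 % of the time of a single CPU.
CPUQuota=30%
# Above 1 GiB, the service is throttled and its memory reclaimed.
MemoryHigh=1G
# Hard limit: exceeding 2 GiB invokes the OOM killer.
MemoryMax=2G
# Lower scheduling priority than regular processes.
Nice=15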
Once systemctl daemon-reload has generated the services from your Quadlets, you may consult the output of systemctl show pancake-app.service or systemd-analyze security pancake-app.service for possible options to change.
Note that since containers usually listen on network sockets, many of the options flagged as insecure will break them when hardened. It is up to you to evaluate and try out what works and what breaks the use-cases of the container.
Lots of options do not make any sense for a container, as it already provides strong isolation from the host system.
If you want to protect yourself from a possible container escape, you can set configuration options like ProtectSystem=, ProtectHome= and such.
Consult systemd.exec(5) and output of the commands above if you want to experiment.
To be continued
This post started as "What I learned by writing Quadlets for popular projects", but it is already 10k characters long.