Dear list,
I have a question about the intended use of the RuntimeDirectory=
directive for what I'd call ephemeral chroot environments. I would also
like some clarification about RestrictAddressFamilies=. That is only
related in that it came up while hardening a socket-activated service
unit, so if it is better handled in a separate thread, just tell me and
I will create one.
A bit of context:
I only recently found out about `systemd-analyze security` and used it
to harden `unbound.service` which comes with Ubuntu 24.04 LTS. And while
I was at it I came up with a way to get a chroot environment whose
lifetime is limited to the runtime of the service itself, like so:
[Service]
DynamicUser=yes
# Only confine ExecStart=, so we need neither Bind* unbound-control nor
# the unix socket
RootDirectoryStartOnly=yes
# Use systemd's chroot capabilities
RootDirectory=%t/%N
# put it in a runtime directory so it leaves no trace after exit and
# need not exist
# this should be the same path as above
RuntimeDirectory=%N
# But it is not an actual runtime directory meant to be writeable
UnsetEnvironment=RUNTIME_DIRECTORY
ReadOnlyPaths=+%t/%N
For reference, see the attached full output of
`systemctl cat unbound.service`. I hope the comments carry the intent.
The idea is to have the chroot created when the service starts and
destroyed when it exits. There seems to be no other way to do this than
to (ab)use `RuntimeDirectory=` as shown above. I've tried this with
`TemporaryFileSystem=`, but that requires that the directory already
exists.
My main concern is whether there are any side effects I am not aware of
that might come back to bite me. So far this unit runs like a charm with
the least privileges I could manage to get working.
`systemd-analyze security` shows an exposure level of "1.1 OK", so
that's pretty good. And thanks by the way for this tool, it is great for
finding things I, as well as upstream and/or Canonical, was not even
aware of. I am even considering sending these as enhancements to one or
both of the latter. But that depends on what you, the experts, tell me.
My rationale for doing this is that the unit becomes self-contained:
nothing needs to happen outside of it to set up its runtime
environment.
The `UnsetEnvironment=RUNTIME_DIRECTORY` and
`ReadOnlyPaths=+%t/%N` may be unnecessary since I don't expect unbound
to use either, but this is about hardening, and my thinking was that a
compromised service might be able to make use of them. However, I was
unable to actually see the contents of `+%t/%N`, if any. I wanted to
know whether there is a loop, since that path would point to the
RootDirectory itself. I am not quite sure why I could not see anything
in there, but I suspect this has to do with it being `unshare`d.
Additional info on this would be very welcome too, even if it's just an
RTFM pointer; the documentation is kind of overwhelming, after all.
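In case it helps frame an answer: below is a minimal sketch of how I
understand the service's view could be inspected from the host, via the
main process's entries in /proc. It assumes the service is running, that
this is run as root, and that `/proc/<pid>/root` and
`/proc/<pid>/mountinfo` behave as usual for a unit using
`RootDirectory=`. The function name `inspect_service_root` is made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show the file system as seen by a given process.
inspect_service_root() {
    pid="$1"
    # /proc/<pid>/root links to the process's own root directory,
    # i.e. what the confined process sees as "/"
    ls -la "/proc/${pid}/root/"
    # mount table of the mount namespace the process lives in
    cat "/proc/${pid}/mountinfo"
}
```

Intended use would be something like
`inspect_service_root "$(systemctl show -P MainPID unbound.service)"`,
and the path in question should then show up as
`/proc/<pid>/root/run/unbound`, if it is visible at all.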
I hope this covers my main question.
As for the `RestrictAddressFamilies` directive, I want to know if it is
even possible to restrict AF_{NETLINK,UNIX,INET,INET6} when a service is
socket-activated. I somehow got the idea in my head that the service
executable should not need to do any binding itself since that should
have happened already by starting the corresponding socket unit. But
this seems impossible with unbound. Or am I misunderstanding how this
works? Is that at least theoretically possible? I am half suspecting
that it is but that unbound does not support this since
systemd-integration seems to be an afterthought. So if this is only
because of some missing integration on unbound's part I would like to
get that upstreamed, because then I could check three more boxes in
`systemd-analyze security`.
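To make the idea concrete, here is a minimal sketch of a throwaway test
pair in which the service process never calls socket() or bind() itself.
The unit names `sockact-test.socket` and `sockact-test@.service` and the
port 7777 are made up for illustration; they are not part of my setup
and I have not run this on the system described here. If it is true that
`RestrictAddressFamilies=` only filters the socket(2) call itself (which
is how I read systemd.exec(5)), something like this should still be able
to echo data back to a client:

```ini
# /etc/systemd/system/sockact-test.socket (hypothetical)
[Socket]
ListenStream=127.0.0.1:7777
# spawn one service instance per connection, inetd-style
Accept=yes

# /etc/systemd/system/sockact-test@.service (hypothetical)
[Service]
# the accepted connection is handed over as stdin/stdout
StandardInput=socket
StandardOutput=socket
ExecStart=/bin/cat
# deny creating new sockets of any address family
RestrictAddressFamilies=none
```

What I cannot tell is whether a daemon like unbound could ever run this
way, or whether it has to create sockets of its own in any case.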
For reference, I am attaching the output of `systemctl cat
unbound.{service,socket}` and `systemd-analyze security unbound.service`.
`systemd --version`:
systemd 255 (255.4-1ubuntu8.5) +PAM +AUDIT +SELINUX +APPARMOR +IMA
+SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS
+FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY
+P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK
-XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
Thanks for your time and, of course, systemd!
Peter
# /etc/systemd/system/unbound.socket
# /etc/systemd/system/unbound.service
[Unit]
Description=Socket(s) (including control) for Unbound DNS server
Documentation=man:unbound(8)
DefaultDependencies=no
After=systemd-sysusers.service
Requires=sysinit.target
Conflicts=shutdown.target
#Before=systemd-resolved.service sysinit.target network.target nss-lookup.target shutdown.target
Before=systemd-resolved.service nss-lookup.target shutdown.target
[Socket]
ListenDatagram=127.0.0.1:53
ListenStream=127.0.0.1:53
ListenStream=%t/unbound-control/%N.ctl
SocketGroup=%N
SocketMode=0660
[Install]
WantedBy=sockets.target
# /etc/systemd/system/unbound.service
[Unit]
Description=Unbound DNS server
Documentation=man:unbound(8)
After=network.target
Before=nss-lookup.target
Wants=nss-lookup.target
[Service]
Type=notify
Restart=on-failure
EnvironmentFile=-/etc/default/unbound
ExecStartPre=-/usr/libexec/unbound-helper chroot_setup
ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update
ExecStart=/usr/sbin/unbound -d -p $DAEMON_OPTS
ExecStopPost=-/usr/libexec/unbound-helper chroot_teardown
ExecReload=+/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/unbound.service.d/override-hardening.conf
# /etc/systemd/system/unbound.service.d/override.conf
# This overrides the service unit that comes with Ubuntu in an effort to
# maximize security.
[Unit]
# For verifying keys, time should be synced first
After=time-sync.target
[Service]
# do not run chroot helper
ExecStartPre=
# This is done by external service unit, see additional *-hardening.conf
# ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update
ExecStart=
# Don't daemonize and send output to stderr for journald
ExecStart=unbound -ddp $DAEMON_OPTS
# No daemonizing and hence no PID file necessary, see -d and -p in unbound(8)
PIDFile=
# No chroot teardown helper necessary
ExecStopPost=
ExecReload=
# This at least waits for a reply in contrast to just sending HUP
ExecReload=unbound-control -q reload
[Install]
# Don't install any dependencies since we rely solely on socket activation
WantedBy=
Also=%N.socket
# /etc/systemd/system/unbound.service.d/zz-override-hardening.conf
# This overrides the service unit that comes with Ubuntu in an effort to
# maximize security.
[Unit]
# Use external bootstrapping b/c this one is too restrictive
Requires=%N-trust-anchor-update.service
After=%N-trust-anchor-update.service
# Double check if it worked
AssertPathExists=%S/%N/root.key
# Hard dependency on socket because we have no privileges
BindsTo=%N.socket
#After=<unnecessary b/c sockets have implicit Before=>
[Service]
DynamicUser=yes
# Only confine ExecStart=, so we need neither Bind* unbound-control nor the
# unix socket
RootDirectoryStartOnly=yes
# Use systemd's chroot capabilities
RootDirectory=%t/%N
# put it in a runtime directory so it leaves no trace after exit and need not
# exist
# this should be the same path as above
RuntimeDirectory=%N
# But it is not an actual runtime directory meant to be writeable
UnsetEnvironment=RUNTIME_DIRECTORY
ReadOnlyPaths=+%t/%N
# root.key lives in StateDirectory, i.e. /var/lib/unbound or
# /var/lib/private/unbound, in case of running as dynamic user
StateDirectory=%N
ConfigurationDirectory=%N
## Binaries
# need unbound-control for reload action and the control socket
#BindReadOnlyPaths=/usr/sbin/unbound /usr/sbin/unbound-control %t/%N-control/%N.ctl
BindReadOnlyPaths=/usr/sbin/unbound
# For some reason unbound needs access to this too
# lest it complain about systemd not running?
BindReadOnlyPaths=%t/systemd/system
BindReadOnlyPaths=/etc/ssl/certs/ca-certificates.crt
## required shared objects
# linker
BindReadOnlyPaths=/lib64/ld-linux-x86-64.so.2
# printf 'BindReadOnlyPaths=%s\n' $(ldd $(command which unbound) |
# sed -nE 's/.* => (.*) \([^)]*\)$/\1/p')
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libc.so.6
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libcap.so.2
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libcrypto.so.3
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libevent-2.1.so.7
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libexpat.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libgcrypt.so.20
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libgpg-error.so.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libhiredis.so.1.1.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/liblz4.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/liblzma.so.5
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libm.so.6
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libnghttp2.so.14
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libprotobuf-c.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libpython3.12.so.1.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libssl.so.3
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libsystemd.so.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libz.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libzstd.so.1
ProtectClock=yes
# Implied by DynamicUser=yes
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
CapabilityBoundingSet=
SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @privileged @raw-io @reboot @resources @swap
#SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @raw-io @reboot @resources @swap
SystemCallArchitectures=native
# only necessary without socket activation and unbound doing chroot
#CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SETUID CAP_SETGID CAP_CHOWN
LockPersonality=yes
NoNewPrivileges=yes
PrivateUsers=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectProc=invisible
MountAPIVFS=yes
ProcSubset=pid
MemoryDenyWriteExecute=yes
RestrictRealtime=yes
RestrictNamespaces=yes
UMask=077
[Install]
# Don't install any dependencies since we rely solely on socket activation
WantedBy=
  NAME  DESCRIPTION  EXPOSURE
✓ SystemCallFilter=~@swap  System call deny list defined for service, and @swap is included
✓ SystemCallFilter=~@resources  System call deny list defined for service, and @resources is included
✓ SystemCallFilter=~@reboot  System call deny list defined for service, and @reboot is included
✓ SystemCallFilter=~@raw-io  System call deny list defined for service, and @raw-io is included
✓ SystemCallFilter=~@privileged  System call deny list defined for service, and @privileged is included
✓ SystemCallFilter=~@obsolete  System call deny list defined for service, and @obsolete is included
✓ SystemCallFilter=~@mount  System call deny list defined for service, and @mount is included
✓ SystemCallFilter=~@module  System call deny list defined for service, and @module is included
✓ SystemCallFilter=~@debug  System call deny list defined for service, and @debug is included
✓ SystemCallFilter=~@cpu-emulation  System call deny list defined for service, and @cpu-emulation is included
✓ SystemCallFilter=~@clock  System call deny list defined for service, and @clock is included
✓ RemoveIPC=  Service user cannot leave SysV IPC objects around
✓ User=/DynamicUser=  Service runs under a transient non-root user identity
✓ RestrictRealtime=  Service realtime scheduling access is restricted
✓ CapabilityBoundingSet=~CAP_SYS_TIME  Service processes cannot change the system clock
✓ NoNewPrivileges=  Service processes cannot acquire new privileges
✓ AmbientCapabilities=  Service process does not receive ambient capabilities
✓ CapabilityBoundingSet=~CAP_BPF  Service may load BPF programs
✓ SystemCallArchitectures=  Service may execute system calls only with native ABI
✗ RestrictAddressFamilies=~AF_NETLINK  Service may allocate netlink sockets  0.1
✗ RestrictAddressFamilies=~AF_UNIX  Service may allocate local sockets  0.1
✗ RestrictAddressFamilies=~AF_(INET|INET6)  Service may allocate Internet sockets  0.3
✓ ProtectSystem=  Service has strict read-only access to the OS file hierarchy
✓ ProtectProc=  Service has restricted access to process tree (/proc hidepid=)
✓ SupplementaryGroups=  Service has no supplementary groups
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO  Service has no raw I/O access
✓ CapabilityBoundingSet=~CAP_SYS_PTRACE  Service has no ptrace() debugging abilities
✓ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)  Service has no privileges to change resource use parameters
✓ CapabilityBoundingSet=~CAP_NET_ADMIN  Service has no network configuration privileges
✓ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW)  Service has no elevated networking privileges
✓ CapabilityBoundingSet=~CAP_AUDIT_*  Service has no audit subsystem access
✓ CapabilityBoundingSet=~CAP_SYS_ADMIN  Service has no administrator privileges
✓ PrivateTmp=  Service has no access to other software's temporary files
✓ ProcSubset=  Service has no access to non-process /proc files (/proc subset=)
✓ CapabilityBoundingSet=~CAP_SYSLOG  Service has no access to kernel logging
✓ ProtectHome=  Service has no access to home directories
✓ PrivateDevices=  Service has no access to hardware devices
✓ RootDirectory=/RootImage=  Service has its own root directory/image
✗ PrivateNetwork=  Service has access to the host's network  0.5
✗ DeviceAllow=  Service has a device ACL with some special devices: char-rtc:r  0.1
✓ KeyringMode=  Service doesn't share key material with other services
✓ Delegate=  Service does not maintain its own delegated control group subtree
✓ PrivateUsers=  Service does not have access to other users
✗ IPAddressDeny=  Service does not define an IP address allow list  0.2
✓ NotifyAccess=  Service child processes cannot alter service state
✓ ProtectClock=  Service cannot write to the hardware clock or system clock
✓ CapabilityBoundingSet=~CAP_SYS_PACCT  Service cannot use acct()
✓ CapabilityBoundingSet=~CAP_KILL  Service cannot send UNIX signals to arbitrary processes
✓ ProtectKernelLogs=  Service cannot read from or write to the kernel log ring buffer
✓ CapabilityBoundingSet=~CAP_WAKE_ALARM  Service cannot program timers that wake up the system
✓ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)  Service cannot override UNIX file/IPC permission checks
✓ ProtectControlGroups=  Service cannot modify the control group file system
✓ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE  Service cannot mark files immutable
✓ CapabilityBoundingSet=~CAP_IPC_LOCK  Service cannot lock memory into RAM
✓ ProtectKernelModules=  Service cannot load or read kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_MODULE  Service cannot load kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG  Service cannot issue vhangup()
✓ CapabilityBoundingSet=~CAP_SYS_BOOT  Service cannot issue reboot()
✓ CapabilityBoundingSet=~CAP_SYS_CHROOT  Service cannot issue chroot()
✓ PrivateMounts=  Service cannot install system mounts
✓ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND  Service cannot establish wake locks
✓ MemoryDenyWriteExecute=  Service cannot create writable executable memory mappings
✓ RestrictNamespaces=~user  Service cannot create user namespaces
✓ RestrictNamespaces=~pid  Service cannot create process namespaces
✓ RestrictNamespaces=~net  Service cannot create network namespaces
✓ RestrictNamespaces=~uts  Service cannot create hostname namespaces
✓ RestrictNamespaces=~mnt  Service cannot create file system namespaces
✓ CapabilityBoundingSet=~CAP_LEASE  Service cannot create file leases
✓ CapabilityBoundingSet=~CAP_MKNOD  Service cannot create device nodes
✓ RestrictNamespaces=~cgroup  Service cannot create cgroup namespaces
✓ RestrictNamespaces=~ipc  Service cannot create IPC namespaces
✓ ProtectHostname=  Service cannot change system host/domainname
✓ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)  Service cannot change file ownership/access mode/capabilities
✓ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)  Service cannot change UID/GID identities/capabilities
✓ LockPersonality=  Service cannot change ABI personality
✓ ProtectKernelTunables=  Service cannot alter kernel tunables (/proc/sys, …)
✓ RestrictAddressFamilies=~AF_PACKET  Service cannot allocate packet sockets
✓ RestrictAddressFamilies=~…  Service cannot allocate exotic sockets
✓ CapabilityBoundingSet=~CAP_MAC_*  Service cannot adjust SMACK MAC
✓ RestrictSUIDSGID=  SUID/SGID file creation by service is restricted
✓ UMask=  Files created by service are accessible only by service's own user by default
→ Overall exposure level for unbound.service: 1.1 OK 🙂