Document the spawn_template userspace ABI, fd lifetime, per-spawn actions,
default fd-closing behavior, security model, invalidation, and cached ELF
metadata. Keep workload-specific benchmark details out of the kernel
documentation.

Add the spawn template files to the exec/binfmt MAINTAINERS entry so the
documentation, UAPI, internal header, and implementation are covered in
the same patch.

Signed-off-by: Li Chen <[email protected]>
---
 Documentation/userspace-api/index.rst         |   1 +
 .../userspace-api/spawn_template.rst          | 141 ++++++++++++++++++
 MAINTAINERS                                   |   2 +
 3 files changed, 144 insertions(+)
 create mode 100644 Documentation/userspace-api/spawn_template.rst

diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index a68b1bea57a85..28520d16d3862 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -22,6 +22,7 @@ System calls
    ioctl/index
    mseal
    rseq
+   spawn_template
 
 Security-related interfaces
 ===========================
diff --git a/Documentation/userspace-api/spawn_template.rst b/Documentation/userspace-api/spawn_template.rst
new file mode 100644
index 0000000000000..0396d292fd17d
--- /dev/null
+++ b/Documentation/userspace-api/spawn_template.rst
@@ -0,0 +1,141 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Spawn templates
+===============
+
+``spawn_template`` is a userspace-controlled interface for workloads that
+repeatedly start the same executable with different arguments, environment, and
+file-descriptor setup.
+
+Userspace creates a template fd for an executable with
+``spawn_template_create()``. Later calls to ``spawn_template_spawn()`` create a
+new child from that template and return both a pid and a pidfd. The child still
+executes through the normal ``execve`` path. The template only lets the kernel
+reuse metadata that is safe to reuse after revalidation.
+
+This is intended for launchers, shells, and agent runtimes that already know
+which tools are hot. The kernel does not decide policy for names such as
+``rg``, ``git``, or ``sed``. Userspace should keep its existing spawn path as a
+fallback for unsupported files, invalidated templates, and policy decisions.
+
+This RFC version supports ELF executable templates only. Scripts, binfmt_misc
+targets, and other non-ELF formats are expected to use the fallback path.
+
+Template lifetime
+=================
+
+``spawn_template_create()`` takes ``struct spawn_template_create_args`` and
+returns a template fd. The fd is an ordinary file descriptor backed by an
+anonymous inode. Closing the fd releases the template.
+
+Userspace can identify the executable either by an existing executable fd or by
+path. Exactly one of ``execfd`` and ``filename`` must be supplied. Passing
+``SPAWN_TEMPLATE_CREATE_CLOEXEC`` sets ``O_CLOEXEC`` on the returned template
+fd.
+
+Creating a template for an unsupported executable format fails. For this RFC
+that means non-ELF executables fail template creation rather than becoming a
+partially cached template.
+
+Create-time fd actions are not supported. ``actions`` and ``actions_len`` in
+``struct spawn_template_create_args`` are reserved and must be zero. File
+descriptor numbers are per-process state, so reusable fd actions would be
+ambiguous once the creating process changes its fd table.
+
+Spawning
+========
+
+``spawn_template_spawn()`` takes a template fd and
+``struct spawn_template_spawn_args``. ``argv`` and ``envp`` point to the normal
+userspace argument and environment vectors for the new image. ``pidfd`` points
+to an ``int`` in userspace where the kernel stores the new pidfd. The syscall
+return value is the new pid on success.
+
+A successful ``spawn_template_spawn()`` return means the child has been created
+and the pidfd has been installed. After that point, per-spawn action failures
+or exec failures are reported by the child exit status, not by changing the
+syscall return value. The syscall itself returns a negative errno only for
+errors detected before child creation, such as bad arguments, a bad template
+fd, stale executable identity, or clone failure.
+
+Per-spawn actions run in the child before exec. They are intended for the same
+kind of setup that ``posix_spawn_file_actions_t`` commonly performs:
+
+``SPAWN_TEMPLATE_ACTION_CLOSE``
+  Close one fd.
+
+``SPAWN_TEMPLATE_ACTION_DUP2``
+  Duplicate one fd to another fd, optionally with ``O_CLOEXEC``.
+
+``SPAWN_TEMPLATE_ACTION_FCHDIR``
+  Change the child's current working directory to an open directory fd.
+
+``SPAWN_TEMPLATE_ACTION_OPEN``
+  Open a path using ``struct open_how`` and install it at ``newfd``.
+
+``SPAWN_TEMPLATE_ACTION_CLOSE_RANGE``
+  Apply ``close_range()`` to a child fd range.
+
+``SPAWN_TEMPLATE_ACTION_SIGMASK``
+  Set the child signal mask.
+
+``SPAWN_TEMPLATE_ACTION_SIGDEFAULT``
+  Reset selected signal dispositions to ``SIG_DFL``.
+
+By default, the child closes all inherited file descriptors above standard
+error after the requested actions have run. Passing
+``SPAWN_TEMPLATE_SPAWN_INHERIT_FDS`` keeps the traditional inheritance model.
+Launchers for untrusted or secret-bearing workloads should prefer the default.
+
+Security model
+==============
+
+``spawn_template_spawn()`` is not a shortcut around ``execve`` security. Each
+spawn still reaches the normal binary handler and credential commit path, so
+permission checks, LSM hooks, secure-exec handling, and ``no_new_privs`` remain
+part of execution.
+
+The template fd does not grant ambient authority to unrelated tasks. The
+current implementation requires the caller to have the same credential object
+that created the template. Passing the fd with ``SCM_RIGHTS`` is therefore not
+enough to delegate spawn authority after credentials have changed.
+
+The kernel pins the executable inode against writes while the template exists.
+An in-place writer therefore fails while a template fd is alive. A package
+manager can still replace a tool with a rename; a path-created template then
+sees that the absolute path resolves to a different executable and spawn fails
+before creating a child. Userspace can close the old template fd and create a
+new one after such an update.
+
+Each spawn revalidates cached identity metadata before using template metadata.
+The key includes device, inode, size, mode, owner, ctime, and mtime.
+Path-created templates re-open the path before child creation and reject reuse
+if the path now names a different executable.
+
+Cached metadata
+===============
+
+For ELF executables, the template caches only the main executable ELF header,
+program headers, and executable identity key. The cached program headers are
+used to avoid repeated metadata reads for hot executables after the executable
+identity has been revalidated.
+
+The cache does not include the shared-library dependency graph. Shared
+libraries are found by the userspace dynamic linker after exec and depend on
+userspace policy such as ``LD_LIBRARY_PATH``, ``RPATH``, ``RUNPATH``,
+``/etc/ld.so.cache``, mount namespaces, and secure-exec state. The kernel
+therefore does not try to duplicate dynamic-linker policy in a spawn template.
+
+Errors and fallback
+===================
+
+If template creation reports an unsupported format, or if spawn reports a stale
+template before child creation, the caller should use its existing spawn
+implementation. A launcher may also drop the template fd and create a new
+template after a failure. Once spawn has returned a pid, the caller should
+observe child success or failure by waiting on the pid or pidfd.
+
+The interface is designed so ordinary tools do not need to be modified.
+Runtimes that already centralize process launch can opt in one executable at a
+time and preserve their existing fallback behavior.
diff --git a/MAINTAINERS b/MAINTAINERS
index ea4134a188779..3e737097940f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9728,7 +9728,9 @@ M: Kees Cook <[email protected]>
 L: [email protected]
 S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
+F: arch/x86/entry/syscalls/syscall_64.tbl
 F: Documentation/userspace-api/ELF.rst
+F: Documentation/userspace-api/spawn_template.rst
 F: fs/*binfmt_*.c
 F: fs/Kconfig.binfmt
 F: fs/exec.c
--
2.52.0
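
Illustration only, not part of the patch: the launcher-side pattern described
under "Errors and fallback" might look roughly like the C sketch below. The
patch text gives neither syscall numbers nor struct layouts, so
try_template_spawn() is a hypothetical stub that always fails with -ENOSYS,
much as a kernel without the feature would. launch() and wait_exit_status()
are likewise made-up helper names, not proposed ABI. The sketch assumes only
the return convention stated in the documentation: a positive pid once the
child exists, a negative errno for failures detected before child creation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * HYPOTHETICAL stand-in for a spawn_template_spawn() wrapper. The real
 * argument struct is not shown in this patch, so the stub only models
 * "feature unavailable" and always returns -ENOSYS.
 */
static pid_t try_template_spawn(int template_fd, char *const argv[],
                                char *const envp[], int *pidfd)
{
    (void)template_fd;
    (void)argv;
    (void)envp;
    (void)pidfd;
    return -ENOSYS;
}

/*
 * Start argv[0], preferring the template path and falling back to
 * posix_spawnp() when no child was created. Returns the child pid, or a
 * negative errno if neither path could create a child.
 */
static pid_t launch(int template_fd, char *const argv[], int *used_fallback)
{
    int pidfd = -1;
    pid_t pid = -1;
    int err;

    assert(argv != NULL && argv[0] != NULL);
    *used_fallback = 0;

    if (template_fd >= 0)
        pid = try_template_spawn(template_fd, argv, environ, &pidfd);
    if (pid > 0)
        return pid; /* child exists; later failures show up in its exit status */

    /* No child was created, so the existing spawn path is safe to use. */
    *used_fallback = 1;
    err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    return err != 0 ? -err : pid;
}

/* Wait for the child and return its exit code, or -1 if it did not exit normally. */
static int wait_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a real wrapper in place of the stub, only try_template_spawn() would need
to change; the fallback branch remains the launcher's existing spawn path, and
the caller keeps observing success or failure through the child's exit status.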

