Re: [systemd-devel] give unprivileged nspawn container write access to host wayland socket
Hey Nozz,
I've tried the exact same setup and ran into this problem. I've explained it in more detail here [1].
Since Linux kernel 5.12 there are filesystem ID mappings that can be used for this in combination with --private-users=pick. I've written the pull request [0] to add support for that to nspawn. In my opinion this is the best way to share such a socket. There is not yet a systemd release containing the pull request.
I'm not sure whether the tmpfs implementation in Linux, where I guess your socket is located, supports those mappings yet; the last time I checked (when I wrote the pull request) it didn't.
Yes, support for filesystem ID mappings depends on the source filesystem. You could solve this by moving the socket to another location, for example onto an ext4 filesystem, until tmpfs supports it as well.
Alternatively you could use extended ACLs for that.
Another option would be to allow access for "other" on the socket, but not on the parent folder, and use --bind as is.
Best regards,
nd
[0] https://github.com/systemd/systemd/pull/19828
[1] https://lists.freedesktop.org/archives/systemd-devel/2021-May/046503.html
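The last option above (access for "other" on the socket, but not on its parent folder) could be sketched roughly as follows. This is only an illustration: the helper names are made up, and it assumes the socket is bind-mounted by its own path so that the container never has to traverse the private parent directory.

```python
import os
import stat


def allow_other_on_socket(socket_path):
    """Add read/write permission for "other" to socket_path.

    Returns the resulting permission bits. On Linux, connecting to a
    pathname socket requires write permission on the socket itself.
    """
    current = stat.S_IMODE(os.stat(socket_path).st_mode)
    wanted = current | stat.S_IROTH | stat.S_IWOTH
    os.chmod(socket_path, wanted)
    return wanted


def parent_is_private(socket_path):
    """Return True if the parent directory grants nothing to "other"."""
    parent = os.path.dirname(os.path.abspath(socket_path))
    mode = stat.S_IMODE(os.stat(parent).st_mode)
    return mode & stat.S_IRWXO == 0
```

Calling allow_other_on_socket() on the host socket and then checking parent_is_private() would confirm that only the socket itself was opened up, not the runtime directory around it.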
[systemd-devel] systemd-nspawn with filesystem id mapping
Hi!
I was very pleased to see the "nspawn: add support for kernel 5.12 ID mapping
mounts #19438"-pull request and went right at it to try it out.
The following was tested on the current git head of systemd running on
archlinux.
What I try to achieve on a high level is kind of emulating bubblewrap and
executing chromium under wayland with gpu acceleration and working audio using
PipeWire.
For that I need to pass some sockets and devices to the container using
--bind-ro. I want to use --private-users=pick to get easier separation
between multiple containers.
That means I do not know the uid the process will run as before nspawn spawns
my container. That results in problems accessing the sockets.
Until now I used setfacl to work around this limitation and allow access to the
sockets.
I was hoping to be able to skip that with --private-users-ownership=map.
I'm passing three sockets belonging to uid 1000 on the host to a container with
--private-users=pick and try to access them via uid 1000 (name "user") in the
container.
Everything is happening on an ext4 file system. I'd prefer btrfs but that is
(so far) lacking ID mapping support.
The full call looks like this:
statepath="/machines/state/chromium/${profilename}"
systemd-nspawn \
    -D /machines/images/archlinux-chromium/ \
    --private-users=pick \
    --private-users-ownership=map \
    --no-new-privileges=yes \
    --as-pid2 \
    --machine "chromium-${profilename}" \
    --user user \
    --bind-ro /var/run/user/1000/pulse/native:/sockets/pulse/native \
    --bind-ro /var/run/user/1000/wayland-1:/sockets/wayland-1 \
    --bind-ro /var/run/user/1000/pipewire-0:/sockets/pipewire-0 \
    --bind "${statepath}:/home/user" \
    --bind /dev/dri/renderD128 \
    -E WAYLAND_DISPLAY=wayland-1 \
    -E XDG_RUNTIME_DIR=/sockets \
    chromium --enable-features=UseOzonePlatform --ozone-platform=wayland
This results in the following output:
Spawning container chromium-default on /machines/images/archlinux-chromium.
Press ^] three times within 1s to kill container.
Selected user namespace base 552206336 and range 65536.
Failed to create mount point
/machines/images/archlinux-chromium/sockets/pipewire-0: Value too large for
defined data type
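To make the "base" and "range" in the output above concrete, here is a small worked example of the arithmetic. It assumes the usual linear mapping, where container uid N corresponds to host uid base + N; the function name is only illustrative.

```python
def host_uid(base, container_uid, size=65536):
    """Translate a container uid to the host uid under a linear mapping.

    base and size correspond to the "user namespace base" and "range"
    values that nspawn prints when --private-users=pick is used.
    """
    if not 0 <= container_uid < size:
        raise ValueError("uid %d is outside the mapped range" % container_uid)
    return base + container_uid


BASE = 552206336  # value taken from the output above

print(host_uid(BASE, 1000))  # 552207336
```

Because the base is only chosen when the container is spawned, the host-side uid for "user" (552207336 in this run) cannot be known in advance, which is why static ownership or ACLs on the sockets are awkward here.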
I've run strace on it, this results in the following relevant output:
[pid 524] mount("/machines/state/chromium/default", "/proc/self/fd/8", NULL, MS_BIND|MS_REC, NULL) = 0
[pid 524] close(8) = 0
[pid 524] newfstatat(AT_FDCWD, "/var/run/user/1000/pipewire-0", {st_mode=S_IFSOCK|0666, st_size=0, ...}, 0) = 0
[pid 524] openat(AT_FDCWD, "/machines/images/archlinux-chromium", O_RDONLY|O_CLOEXEC|O_PATH|O_DIRECTORY) = 8
[pid 524] openat(8, "sockets", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = 10
[pid 524] newfstatat(10, "", {st_mode=S_IFDIR|0700, st_size=4096, ...}, AT_EMPTY_PATH) = 0
[pid 524] close(8) = 0
[pid 524] openat(10, "pipewire-0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
[pid 524] close(10) = 0
[pid 524] newfstatat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets", {st_mode=S_IFDIR|0700, st_size=4096, ...}, 0) = 0
[pid 524] openat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets/pipewire-0", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_PATH) = -1 ENOENT (No such file or directory)
[pid 524] openat(AT_FDCWD, "/machines/images/archlinux-chromium/sockets/pipewire-0", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0644) = -1 EOVERFLOW (Value too large for defined data type)
[pid 524] writev(2, [{iov_base="Failed to create mount point /ma"..., iov_len=122}, {iov_base="\n", iov_len=1}], 2Failed to create mount point /machines/images/archlinux-chromium/sockets/pipewire-0: Value too large for defined data type
) = 123
This maps to the touch in nspawn-mount.c at line 754.
If I skip the --bind(-ro) part this works fine (except chromium of course not
working), same if I keep the binds and remove the --private-users-ownership=map.
I'm kind of lost on how to go on about this issue at this point.
Have I made a mistake or wrong assumption about how that should work?
Should I open an issue on github about that?
Thanks,
nd
___
systemd-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-nspawn with filesystem id mapping
Hi again,
after some more debugging, this EOVERFLOW seems to be the result of a call to may_o_create in fs/namei.c in the kernel. There is a check:
if (!fsuidgid_has_mapping(dir->dentry->d_sb, mnt_userns))
        return -EOVERFLOW;
This seems to be the one returning EOVERFLOW to nspawn, causing the container spawn to fail.
My guess would be that this is a systemd bug when combining filesystem ID mapping with --bind.
Before I spend more time debugging this: has anyone so far used --bind with --private-users=pick and --private-users-ownership=map successfully?
As far as I understand it, pull request #19438 didn't add any handling to the mount_bind function. Was this maybe overlooked? In my understanding a remount_idmap is missing in that function, and the touch needs to be done in the correct user namespace or with mapped uids/gids.
I'm new to the systemd source code; could somebody confirm that I'm on the right track and not heading in the wrong direction?
Thanks,
nd
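The effect of that check can be illustrated with a deliberately simplified model. This is not the kernel's code: IdMap and may_create are made-up names, and the assumption that nspawn creates the mount point with host uid 0 while the mount only maps the picked range is a guess based on the strace above.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class IdMap:
    """One linear ID mapping, as configured for an ID-mapped mount."""
    inner_start: int  # first ID as seen inside the namespace
    outer_start: int  # first ID as seen on the host
    count: int        # number of consecutive IDs covered

    def from_host(self, host_id):
        """Translate a host ID into the namespace, or None if unmapped."""
        offset = host_id - self.outer_start
        if 0 <= offset < self.count:
            return self.inner_start + offset
        return None


def may_create(fsuid, fsgid, mount_map):
    """Model of the check: both caller IDs must be representable.

    Returns 0 on success or -EOVERFLOW when either ID has no mapping.
    """
    if mount_map.from_host(fsuid) is None or mount_map.from_host(fsgid) is None:
        return -errno.EOVERFLOW
    return 0


# Range picked in the run above: base 552206336, 65536 IDs.
picked = IdMap(inner_start=0, outer_start=552206336, count=65536)
print(may_create(0, 0, picked) == -errno.EOVERFLOW)  # True: host root is unmapped
print(may_create(552207336, 552207336, picked))      # 0: inside the mapped range
```

Under this model, creating the file from a process whose uid and gid lie inside the mapped range would pass the check, which matches the suggestion that the touch has to happen in the correct user namespace or with mapped uids/gids.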
Re: [systemd-devel] user/session buses
From: Mantas Mikulėnas
Date: Sat, 22 Aug 2015
Well, you just wouldn't have more than one graphical session. That's part of the general plan afaik.
Note that this is already half-broken, because some of those programs actually *expect* to be unique *per user* – e.g. dconf-daemon for writing to the dconf db – and having two copies of it in two sessions might be bad…
On Sat, Aug 22, 2015, 13:36 Michał Zegan <[email protected]> wrote:
> Hello.
> I believe, although may be wrong, that session buses were used to
> enforce single instances of programs, like a program registered a name
> on dbus and another instance of the same program could not run.
> How would it affect user buses in case of multiple graphical user sessions?
Re: [systemd-devel] Where does resolved takes its data from?
systemd-resolved has a D-Bus API, which is used by network configuration managers such as systemd-networkd and NetworkManager to set the hostname-resolution configuration used by systemd-resolved.
You can see the runtime configuration of systemd-resolved by running `systemd-resolve --status`. To see which protocols (DNS, LLMNR, mDNS) are used to resolve a specific hostname, use `systemd-resolve somemachine.local`, for example. The protocols that are used during hostname resolution can be toggled per interface using the same command, or they can be set via the D-Bus API by some network configuration manager.
Caution: the following is "as far as I know". Please note that the systemd-resolved D-Bus API provides methods to do hostname resolution with more control over the resolution method than the functions provided by the GNU C library. These latter functions inspect the `hosts:` entry of `/etc/nsswitch.conf` to determine the plugins that are used for hostname resolution, one of which should be `resolve` so that queries are directed to systemd-resolved when the GNU C hostname resolution API is used.
Sorry if this veered into the territory of "I didn't ask this question". I just thought that clarifying the whole picture could help in setting up hostname resolution better.
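As a rough sketch of the nsswitch.conf point above, the following reads the `hosts:` entry and reports whether the `resolve` module is listed. The function name and the sample line are illustrative only; an actual /etc/nsswitch.conf may look different.

```python
import re


def hosts_modules(nsswitch_text):
    """Return the NSS module names listed on the 'hosts:' line, in order.

    Comments and bracketed action items such as [!UNAVAIL=return]
    are ignored.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return rest.split()
    return []


# Illustrative sample, not taken from the thread.
sample = "hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns\n"

print("resolve" in hosts_modules(sample))  # True
```

On a real system the same function could be applied to the contents of /etc/nsswitch.conf; if `resolve` is missing, lookups made through the C library would not go through systemd-resolved's NSS module.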
[systemd-devel] systemd.link MACAddress= matches OpenVPN tun device
Name : systemd (commit c38499d476026d999558a7eee9c95ca2fa41e115)
Version : 239.2-1
I have a systemd.link file that gives my usb modem a more recognizable name. I saw some renaming errors in the journal and noticed that systemd also tried to rename my VPN device. This shouldn't happen and I investigated. Here's the result:
It appears that the `50-usbmodem.link` file is being applied to the `tunvpn` device, even though the file has a MACAddress filter to match only the usbmodem.
I have the following file:
/etc/systemd/network/50-usbmodem.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
[Link]
Name=usbmodem
And by running the following command, it can be seen that the problem really occurs.
$ udevadm test-builtin net_setup_link /sys/class/net/tunvpn/
calling: test-builtin
Load module index
Parsed configuration file /etc/systemd/network/50-usbmodem.link
Created link configuration context.
ID_NET_DRIVER=tun
Config file /etc/systemd/network/50-usbmodem.link applies to device tunvpn
link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
ID_NET_LINK_FILE=/etc/systemd/network/50-usbmodem.link
ID_NET_NAME=usbmodem
Unload module index
Unloaded link configuration context.
The tun device has no ethernet address, as it's a L3 interface, so the MACAddress really really shouldn't match.
$ ip link show tunvpn
xx: tunvpn: mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 100
    link/none
I fixed this temporarily by adding the following line to the [Match] section:
Driver=huawei_cdc_ncm
I'm not entirely sure, but this appears to be a bug.
Maybe relevant section: https://github.com/systemd/systemd/blob/c38499d476026d999558a7eee9c95ca2fa41e115/src/udev/net/link-config.c#L218
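The behaviour expected in the report can be written down as a small model. This is not udev's implementation, only a sketch of the semantics the report argues for; the function name and the None convention for "no hardware address" are assumptions.

```python
def mac_filter_matches(configured_macs, device_mac):
    """Expected MACAddress= semantics, as argued in the report.

    configured_macs: addresses from the [Match] section (may be empty).
    device_mac: the device's hardware address, or None if it has none
                (for example an L3 tun interface showing link/none).
    """
    if not configured_macs:
        # No MACAddress= filter configured: this criterion does not restrict.
        return True
    if device_mac is None:
        # A filter is present but the device has no address: never a match.
        return False
    wanted = {mac.lower() for mac in configured_macs}
    return device_mac.lower() in wanted


print(mac_filter_matches(["aa:bb:cc:dd:ee:ff"], None))                 # False
print(mac_filter_matches(["aa:bb:cc:dd:ee:ff"], "AA:BB:CC:DD:EE:FF"))  # True
```

The observed behaviour corresponds to the first call returning True, which is what the Driver=huawei_cdc_ncm workaround compensates for.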
[systemd-devel] Missing PropertiesChanged signal for service start
Hi,
I have a service file as follows:
[Unit]
Description= "Daemon description"
After=a.service b.service c.service
OnFailure=failure_handler@%p.service
[Service]
WorkingDirectory=/usr/sbin
ExecStartPre=/bin/sleep 30
ExecStart=
When this service starts I expect a signal indicating state=active. When I reboot the system multiple times, the signal indicating "active" is sometimes missing.
I get the signal with ActiveState=activating, SubState=start-pre every time. But the signal indicating ActiveState="active" and SubState="running" was missing for some reboots.
The service is running and shows active state all the time. What is the reason for the missing signal? I am also checking whether the sleep in ExecStartPre is required for this service. I am wondering if that has something to do with the missing signal.
Thanks
Ashitha
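When watching for these signals it can help to filter on the unit's own D-Bus object path, since PropertiesChanged for a unit is emitted on that object. The sketch below derives the path from a unit name using the escaping systemd applies to unit object paths (characters other than ASCII letters and non-leading digits become _xx). The helper names are made up, and the match rule is only one plausible way to narrow a dbus-monitor session.

```python
import string

_ALPHA = set(string.ascii_letters.encode("ascii"))
_DIGITS = set(string.digits.encode("ascii"))


def bus_label_escape(label):
    """Escape a unit name for use as the last component of an object path."""
    if label == "":
        return "_"
    out = []
    for index, byte in enumerate(label.encode("utf-8")):
        # Letters are kept; digits are kept unless they come first.
        if byte in _ALPHA or (index > 0 and byte in _DIGITS):
            out.append(chr(byte))
        else:
            out.append("_%02x" % byte)
    return "".join(out)


def unit_object_path(unit_name):
    """Return the D-Bus object path of a systemd unit."""
    return "/org/freedesktop/systemd1/unit/" + bus_label_escape(unit_name)


def properties_changed_match(unit_name):
    """Build a D-Bus match rule for PropertiesChanged on one unit."""
    return (
        "type='signal',"
        "interface='org.freedesktop.DBus.Properties',"
        "member='PropertiesChanged',"
        "path='%s'" % unit_object_path(unit_name)
    )


print(unit_object_path("a.service"))  # /org/freedesktop/systemd1/unit/a_2eservice
```

The resulting rule string could be passed to `dbus-monitor --system` as a filter argument, which keeps the log limited to property changes of the one service being investigated.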
Re: [systemd-devel] Mount a remote FS as a user
Hello,
OK, thanks for the clarification. I was afraid that the situation was as you have described. Still, it surprises me that even the sshfs case cannot be handled by a user instance of systemd ...
Do you have any information on whether the kernel is going to open up autofs for unprivileged clients?
Or could one write a D-Bus capable daemon (or use/extend udisks or systemd capabilities?) which would handle the mounts for a particular user? That is, a user would provide remote host + fs type + username + password + required mount point + access permissions, and the daemon would then mount it for the user as required. Or does this approach have a security flaw I don't see?
Thanks,
DT
On Mon, Feb 11, 2019 at 6:27 PM Lennart Poettering wrote:
> On Mo, 11.02.19 15:59, Daniel Tihelka ([email protected]) wrote:
> > Hello,
> > I can mount a shared file system (sshfs in particular) as an ordinary user.
> >
> > Now I would like to have it handled by systemd on-demand (automount).
> > However, creating the automount unit and starting it fails with error:
>
> autofs (the kernel subsystem behind the .automount unit type) is
> accessible to privileged clients only, and systemd --user is not
> privileged in general. This means what you are trying to do is simply
> not supported by the kernel.
>
> We could start supporting this if the kernel would open up autofs for
> unpriv clients, like it did for fuse mounts. However, I don't see that
> happening any time soon.
>
> Sorry!
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
Re: [systemd-devel] Missing PropertiesChanged signal for service start
Hi Lennart,
I missed some details in the previous mail. This is seen on systemd 230. Unfortunately, I cannot upgrade systemd right now.
Subscribe() is called on org.freedesktop.systemd1, path=/org/freedesktop/systemd1, intf=org.freedesktop.systemd1.Manager.
To make sure that the signal was not missed due to an error in the Subscribe() logic, I also ran a dbus-monitor script that runs "/usr/bin/dbus-monitor --system" and redirects its output to a log file. The dbus-monitor script is always guaranteed to start before the service in question, so it doesn't miss any signal. I don't see any signal indicating the active state in the dbus-monitor log file when the issue happens.
Thanks
Ashitha
On Tue, Feb 12, 2019 at 2:14 AM Lennart Poettering wrote:
> On Mo, 11.02.19 19:50, systemd Mailing List ([email protected]) wrote:
> > Hi,
> >
> > I have a service file as follows:
> >
> > [Unit]
> > Description= "Daemon description"
> > After=a.service b.service c.service
> > OnFailure=failure_handler@%p.service
> >
> > [Service]
> > WorkingDirectory=/usr/sbin
> > ExecStartPre=/bin/sleep 30
> > ExecStart=
> >
> > When this service starts I expected a signal indicating state=active.
> > When I reboot the system multiple times, the signal indicating
> > "active" is missing some times.
> >
> > I got the signal ActiveState=activating, SubState=start-pre at all
> > times. But signal indicating ActiveState="active" and
> > SubState="running" was missing for some reboots.
> >
> > The service is running and shows active state all the time. What is
> > reason for missing signal? I am also checking if the sleep in the
> > ExecStartPre is required for this service. I am wondering if that has
> > something to do with the missing signal.
>
> Have you called Subscribe() on the manager object? Unless there's at
> least one client doing that (which hasn't disconnected yet) these
> messages are not necessarily generated.
>
> Also, which systemd version is this? There have been some bugfixes in
> this area in the past, hence make sure to run a current version of systemd.
>
> Lennart
>
> --
> Lennart Poettering, Red Hat
Re: [systemd-devel] Delegate= on slice before v237
Hey Lennart,
Thanks for the clarification.
On Tue, Feb 12, 2019 at 2:17 AM Lennart Poettering wrote:
> On Mo, 11.02.19 16:39, Filipe Brandenburger ([email protected]) wrote:
> > Before systemd v237 (when Delegate= was no longer allowed on slice
> > units)... Did setting Delegate=yes on a slice have *any* effect at all?
> >
> > Or did it just do nothing (and a slice with Delegate=no or no setting
> > behave just the same)?
> >
> > Reason I ask is: I want to scrap this code
> > <https://github.com/opencontainers/runc/blob/v1.0.0-rc6/libcontainer/cgroups/systemd/apply_systemd.go#L195>
> > in libcontainer that tries to detect whether Delegate= is accepted in a
> > slice unit. (I'll just default it to false, never try it.)
> >
> > I'd like to be able to say that Delegate=yes never really did anything at
> > all on slice units... So I'm trying to confirm that is really the case
> > before stating it.
>
> So, it wasn't supposed to do anything, and what it does differs on
> cgroupsv2 and cgroupsv1.
libcontainer is pretty much cgroupv1 only, so that's what I'm concerned about.
> The fact it wasn't refused outright was an
> accident, and because it was one I am not entirely sure what the
> precise effect of allowing it was. However, I am pretty sure it at
> least had two effects:
>
> 1. it would turn on all controllers for the cgroup
I don't *think* this is why libcontainer was trying to enable it, since a few lines down it's explicitly enabling all the controllers by setting MemoryAccounting, CPUAccounting and BlockIOAccounting during transient unit creation:
https://github.com/opencontainers/runc/blob/v1.0.0-rc6/libcontainer/cgroups/systemd/apply_systemd.go#L275
> 2. it would stop systemd to ever migrating foreign processes below
>    that slice, which is primarily relevant only when changing cgroup
>    related props on the slice dynamically I guess.
I'm not sure I follow... Do you mean if libcontainer would write to memory.limit_in_bytes (or one of the other properties of the memory or other controller managed by systemd, such as cpu), then systemd would not end up overwriting this as it does some other operation on the cgroup?
I'm not completely sure I understand what "migrate foreign processes" means, given slices don't really hold any pids directly... Do you mean to scope and service units below that slice?
In any case, for now I'll probably leave that alone... Though as I revamp libcontainer support for unified hierarchy, I'll try to skip that check on that case, that might make this a legacy-only setting, so not that important to fully get rid of it for a while...
Cheers!
Filipe
