AFAIK, handling server crashes is still a mostly unresolved area. In Arcan it is handled explicitly via SHMIF, but Enlightenment and kwin_wayland have only made mutually incompatible attempts at it: both let clients wait for some time while the server is relaunched (and it has to be the same server, one that knows some credentials from the previous run).

I got an idea while reading this: https://www.linux.org.ru/forum/talks/16983280?cid=16984372 While the exact approach described there (moving clients between nested and parent compositors in order to group them) may sound niche, there may be other uses for it. One of them is simplifying the testing of Wayland servers: a nested server is launched and a client is moved to it temporarily, instead of relaunching the test clients every time.

Another, more practical, use is a backup server which the clients connect to in an emergency (when the main server has crashed). Such a server does not even need to be presented to the user; it may run in the background like screen/tmux, and it should be tiny, simple and robust. Similar examples:

• backup BIOSes in some motherboards, which can be used for booting if the main BIOS was corrupted due to an incorrect upgrade;

• a backup window manager in Windows 7, which is used when DWM.exe crashes (due to buggy GPU drivers, hardware problems in GPU, etc.);

• compiz-reloaded has a crash handler and launches some other process (xterm by default, or any other fallback WM) instead of itself, so the session is not rendered unusable (and some display managers quit the whole X.Org session if the WM has crashed with no replacement).

In the X11 world, the X server takes the role of such a backup server, but there is no such separate entity in the Wayland world yet.

The handover should be performed by the clients themselves, with security in mind (so that a malicious server cannot take control of them). It seems to me, though, that the approaches for handing clients over between multiple user-controlled servers, and between a main and a backup server, are fundamentally different.

For multiple user-controlled servers (like parent and nested, or vice versa):

1a) a client requests the current server to connect to a new server (specified), or

1b) the current server informs the client about a new server (specified) to connect;

2) the current server grants the reconnect;

3) the client freezes its event queue, disconnects from the current server and connects to the new one.
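To make the steps above a bit more concrete, here is a rough Python sketch of the handover between two user-controlled servers. Nothing in it is real Wayland API: the `Server` and `Client` classes, `grant_reconnect`, `request_move` and the `allowed_targets` policy are all names I made up for illustration, and the "connection" is just set membership.

```python
from collections import deque


class Server:
    """Hypothetical stand-in for a compositor; it only tracks its clients."""

    def __init__(self, name, allowed_targets=()):
        self.name = name
        self.allowed_targets = set(allowed_targets)  # assumed policy knob
        self.clients = set()

    def grant_reconnect(self, client, target):
        # Step 2: the current server decides whether the move is allowed.
        return client in self.clients and target.name in self.allowed_targets


class Client:
    def __init__(self, name):
        self.name = name
        self.server = None
        self.frozen = False
        self.pending = deque()  # events held back while the queue is frozen
        self.handled = []

    def connect(self, server):
        server.clients.add(self)
        self.server = server

    def dispatch(self, event):
        # While frozen, events are queued instead of being processed.
        (self.pending if self.frozen else self.handled).append(event)

    def request_move(self, target):
        # Step 1a: the client asks its current server for permission.
        if not self.server.grant_reconnect(self, target):
            return False
        # Step 3: freeze the queue, leave the old server, join the new one.
        self.frozen = True
        self.server.clients.discard(self)
        self.connect(target)
        self.frozen = False
        while self.pending:
            self.handled.append(self.pending.popleft())
        return True


parent = Server("parent", allowed_targets={"nested"})
nested = Server("nested")
app = Client("app")
app.connect(parent)

moved = app.request_move(nested)     # parent allows moving to "nested"
refused = app.request_move(parent)   # nested has no such permission set
```

Variant 1b would look the same, except that the server calls into the client first; the grant in step 2 stays on the server side either way.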

For a main server and a backup server:

1) the current server informs the clients about the backup server beforehand;

2) a client determines that the main server has crashed;

3) the client freezes its event queue and connects to the backup server.
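The failover steps above could look roughly like the Python sketch below. It assumes that a crash shows up to the client as end-of-file (or a connection error) on its socket, which is how a closed Unix socket normally behaves; the `FailoverClient` class, `learn_backup` and `poll` are invented names, and `socket.socketpair()` merely simulates the two servers.

```python
import socket


def server_alive(sock):
    """Return False once the peer has closed its end of the connection."""
    try:
        data = sock.recv(1, socket.MSG_PEEK)  # peek, do not consume events
    except BlockingIOError:
        return True    # nothing to read yet, but the peer is still there
    except ConnectionError:
        return False   # reset or broken pipe: treat it as a crash
    return data != b""  # an empty read means end-of-file


class FailoverClient:
    def __init__(self, main_sock):
        main_sock.setblocking(False)
        self.sock = main_sock
        self.connect_backup = None  # filled in by step 1
        self.frozen = False
        self.active = "main"

    def learn_backup(self, connect_backup):
        # Step 1: the main server announces the backup beforehand.
        self.connect_backup = connect_backup

    def poll(self):
        # Step 2: notice that the main server is gone.
        if self.active != "main" or server_alive(self.sock):
            return
        if self.connect_backup is None:
            raise RuntimeError("main server died before announcing a backup")
        # Step 3: freeze the event queue and switch to the backup server.
        self.frozen = True
        self.sock.close()
        self.sock = self.connect_backup()
        self.sock.setblocking(False)
        self.active = "backup"
        self.frozen = False


main_client_end, main_server_end = socket.socketpair()
backup_client_end, backup_server_end = socket.socketpair()

client = FailoverClient(main_client_end)
client.learn_backup(lambda: backup_client_end)

client.poll()
state_before = client.active
main_server_end.close()   # simulate the main server crashing
client.poll()
state_after = client.active
```

Unlike the user-controlled case, nobody grants anything here at the time of the switch, so the announcement in step 1 is the only moment where trust can be established.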

Though, if the main server is always launched by the backup one, both cases can be reduced to a common approach: the current (or backup) server informs the client about a new (or main) server, and the client connects to it whenever needed. All the client needs to know is that server A trusts server B. The source of trust is another discussion point (should it be merely an executable path, or something more complicated?)
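A minimal Python sketch of this "A trusts B" bookkeeping on the client side follows. It assumes that servers are identified by plain strings (socket names here, purely as a placeholder, since the source of trust is exactly the open question) and that trust is transitive; `TrustingClient`, `on_announce` and `reconnect` are hypothetical names.

```python
class TrustingClient:
    """Client that only follows announcements from servers it already trusts."""

    def __init__(self, initial_server):
        self.current = initial_server
        self.trusted = {initial_server}  # the first server is trusted a priori

    def on_announce(self, announcer, new_server):
        # "A trusts B": accepted only when A itself is already trusted.
        if announcer not in self.trusted:
            return False
        self.trusted.add(new_server)
        return True

    def reconnect(self, target):
        if target not in self.trusted:
            raise PermissionError(f"{target} was never vouched for")
        self.current = target


c = TrustingClient("wayland-backup")
accepted = c.on_announce("wayland-backup", "wayland-main")
rejected = c.on_announce("wayland-evil", "wayland-evil-2")
c.reconnect("wayland-main")

try:
    c.reconnect("wayland-evil-2")
    blocked = False
except PermissionError:
    blocked = True
```

With this model the client never needs to know which server is the "main" and which is the "backup"; it only tracks who vouched for whom.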

What's your opinion on it? I'm not a Wayland expert, and I suppose there may be significant problems (like dealing with resource pointers). Do you have better ideas maybe?
