On Wed, 13 Sep 2023 20:16:09 -0400 jleivent <jleiv...@comcast.net> wrote:
> Forgive the long post. Tl;dr: what are the rules of object ID lifetime
> and reuse in the Wayland protocol?

Hi,

congratulations, I think you may have found everything that is not
quite right in the fundamental Wayland protocol design. :-)

As an aside, we collect unfixable issues under
https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
These are issues that are either impossible or very difficult or
annoying to fix while keeping backward compatibility with both servers
and clients.

------

Object ID re-use is what I would call "aggressive": in the libwayland C
implementation, the object ID last freed is the first one to be
allocated next. There are two separate allocation ranges, each with its
own free list: server and client allocated IDs. The C implementation
also poses an additional restriction: a new ID can be at most the
largest ever allocated ID + 1. All this is to keep the ID map as
compact as possible without a hash table.

These details are in the implementation of the private 'struct wl_map'
in libwayland.

> I am attempting to understand the rules of object ID lifetime within
> the Wayland protocol in order to construct Wayland middleware (similar
> to some of the tools featured on
> https://wayland.freedesktop.org/extras.html). I could not find a
> comprehensive discussion of the details online. If one exists, I would
> greatly appreciate a link!
>
> Middleware tools that wish to decode Wayland messages sent between the
> compositor and its clients need to maintain an accurate mapping between
> object ID and object interface (type). This is needed because the wire
> protocol's message header includes only the target object ID and an
> opcode that is relative to the object's type (the message header also
> includes the message length - about which I also have questions - to be
> pursued later...). The message (request or event) and its argument
> encoding can only be determined if the object ID -> type and type +
> opcode -> message mappings are accurately maintained. The type +
> opcode -> message mapping is static and can be extracted offline from
> the protocol XML files.
>
> Since object IDs can be reused, it is important for the middleware to
> understand when an ID can be reused and when it cannot be to avoid
> errors in the ID -> type mapping.
>
> Because the Wayland protocol is asynchronous, any message that implies
> destruction of an object should be acknowledged by the receiver before
> the destructed object's ID is reused.
>
> Fortunately, certain events and requests have been tagged as
> destructors in the protocol descriptions!
>
> Also fortunately, it appears (based on reading the wl_resource_destroy
> code in wayland-server.c) that for many object IDs, specifically for
> IDs of objects created by a client request (the ID appears as a new ID
> arg of a request, and is thus in the client side of the ID range) and
> for which the client makes a destructor request, the compositor will
> always send a wl_display::delete_id event (assuming the
> display_resource still exists for the client, which apparently would
> only not be the case after the client connection is severed) to
> acknowledge the destructor request. Any attempt to reuse that ID prior
> to the wl_display::delete_id event can lead to confusion, and should be
> avoided. Reuse of the ID after the wl_display::delete_id event should
> not result in any confusion.
>
> [BTW: for the purpose of this discussion, an object is "created" when
> it is introduced into a protocol message for the first time via a new_id
> argument. It does not refer to the actual allocation of the object in
> memory or to its initialization.]

Your whole above analysis is completely correct!

> However, the other cases are not as easy to identify.
>
> The other cases are:
> 1. an object created by a client request that has destructor events
> 2. an object created by the compositor
>
> It might be true that case 1 does not exist. Is there a general rule
> against that such cases would never be considered in future expansions
> of the Wayland protocol?

Destructor events do exist. Tagging them as such in the XML was not
done from the beginning though; it was added later in a
backward-compatible manner, which makes the tag more informational than
something libwayland could automatically process.

The foremost example is the wl_callback.done event. This is only safe
because it is guaranteed that the client cannot be sending a request on
wl_callback at the same time the server is sending 'done' and
destroying the object: wl_callback has no requests defined at all. It
also requires that nothing passes an existing wl_callback object as an
argument in any request. We have been merely lucky that no-one has done
that. It's really hard to imagine a use case where you would want to
pass an existing wl_callback to anything.

Extensions may have similar objects that only deliver some one-off
events and then "self-destruct" by the final event. All this is simply
documented and not marked in the XML.

The asymmetry in the fundamental protocol bookkeeping messages
(wl_display.delete_id) is unfortunate in hindsight. Client created and
client destroyed objects are best supported; anything else will require
very careful messaging design to avoid a race pitfall. We have a proper
tear-down sequence for a client destroying an object, but we do not
have a similar sequence for the server destroying an object, where the
client would ack the destruction before the object ID is re-used.

> For objects created by the compositor, there are 2 subcases:
>
> 2a. objects with only destructor events
> 2b. objects with destructor requests
>
> Again, it might be the case that 2b does not exist, as it is analogous
> to case 1 above. But, is there a general rule against such
> future cases as well? Combining 1 and 2b, is there a general rule that
> says that only the object creator can initiate an object's destruction
> (unprovoked by the other side of the protocol)?

There is no such rule, but nowadays the recommendation is to keep to
client created and client destroyed objects unless there are pressing
reasons to design something more delicate/fragile.

2b exists as wl_data_offer, created by the wl_data_device.data_offer
event.

I'm not aware of 2a existing in the wild, but it could, I guess.

Generally we try to avoid new_id in events, because it causes a race by
default: what if the server is sending a new_id event at the same time
the client destroys the originating object? The result is a mismatch in
object ID tracking between server and client, causing unexpected
failures perhaps much later.

When a client (or server) calls the respective API to destroy a
protocol object, libwayland guarantees that the object's
listeners/callbacks won't be called even if messages are still received
on that object until it is properly torn down. That makes the user code
easier to write, but it also throws away the messages received
afterwards. If such a message was to create a new protocol object, that
won't happen: the ID won't be allocated. This causes the mismatch.

Libwayland cannot automatically destroy objects either, because it has
no information on how to do that properly and safely. The XML does not
tell for sure.

>
> For object IDs created by the compositor and with only destructor
> events (case 2a), it may be necessary to understand the details of each
> interface in question to decide when the ID can be reused, as there is
> no universal destructor acknowledgement request comparable to the
> wl_display::delete_id event. A requirement to understand the details
> to that level would make middleware development more difficult. Insert
> extreme sadness emoji here.

Indeed.

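To make the ID re-use behaviour described at the top of this mail a bit
more concrete, here is a toy model in Python. It is only a sketch and
not libwayland's code: the class and method names are made up for
illustration, and it mimics nothing more than what was said above about
'struct wl_map' (one LIFO free list per range, and a new ID being at
most the largest ever allocated ID + 1). The start of the server range,
0xff000000, is what libwayland uses as far as I recall; treat it as an
assumption.

```python
WL_SERVER_ID_START = 0xFF000000  # assumed start of the server ID range


class IdRange:
    """Toy model of one allocation range (hypothetical, not wl_map)."""

    def __init__(self, first_id):
        self.first_id = first_id
        self.count = 0        # how many distinct IDs this range has used
        self.free_list = []   # freed IDs, most recently freed last

    def allocate(self):
        # The ID freed last is the first one to be handed out again.
        if self.free_list:
            return self.free_list.pop()
        new_id = self.first_id + self.count
        self.count += 1
        return new_id

    def release(self, object_id):
        self.free_list.append(object_id)

    def accept_from_peer(self, object_id):
        """Check an ID that the other side of the connection picked."""
        limit = self.first_id + self.count  # largest ever allocated + 1
        if not self.first_id <= object_id <= limit:
            raise ValueError(f"ID {object_id:#x} leaves a gap in the map")
        if object_id == limit:
            self.count += 1


# Client range: ID 0 is the null object and wl_display is always ID 1.
client_ids = IdRange(1)
ids = [client_ids.allocate() for _ in range(3)]   # 1, 2, 3
client_ids.release(2)
client_ids.release(3)
reused = [client_ids.allocate() for _ in range(3)]
```

With this model, freeing IDs 2 and 3 in that order hands out 3, then 2,
then 4. That is why a stale message aimed at a just-freed ID is so
easily misattributed to a brand new object of a different type.
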
> Thankfully, it seems that destructor events are themselves
> acknowledgements of requests for destruction by the client (such as
> wp_drm_lease_device_v1::released event destructor vs.
> wp_drm_lease_device_v1::release request), or involve objects with a
> very limited lifetime and usage, such as callbacks
> (wp_presentation_feedback, zwp_linux_buffer_release, and
> zwp_fullscreen_shell_mode_feedback_v1). These limited lifetime/usage
> objects are created with the knowledge that all messages for them are
> destructor events, and that they are not involved in any other messages
> (as targets or arguments). Hence their destruction needs no further
> acknowledgement because the request for destruction was implied by
> their creation. The destructor event is the acknowledgement of that
> request.
>
> Is this a general rule: that a destructor event is always the
> acknowledgement of a (perhaps implied) destruction request?

Not specifically, but it seems to be the result of careful protocol
interface design, to avoid races.

Not all interfaces avoid those races. I think destroying a
wl_data_device is theoretically racy, but we get lucky every time.
Originally, wl_data_device did not even have a destructor request at
all - it was undestroyable until disconnection. There might still be
some interfaces like that left.

> So there may be two general simple rules that the middleware can follow
> to maintain a proper ID -> type mapping through ID reuse cycles:
>
> 1. reuse of ID is allowed after wl_display::delete_id(ID)
>    "wl_display::delete_id events are the acknowledgements for all
>    destruction requests"

Correct.

> 2. reuse of ID is allowed after a destructor event targeted to ID
>    "destruction events do not need acknowledgements"
>
> But this only works if the Wayland protocol has these general
> requirements:
>
> A. Ownership vs. destruction: An object created by a client can only
> have destructor requests, and an object created by the compositor can
> only have destructor events.

False, violated in the wild. Examples: wl_callback created by client,
destroyed by server; wl_data_offer created by server, destroyed by
client.

> B. All destructor requests are followed by wl_display::delete_id events
> as acknowledgements

True, as this is built into libwayland.

> C. All destructor events are themselves acknowledgements of (implicit)
> destruction requests.

Close enough I suppose, assuming the events are correctly tagged as
destructors. As mentioned, the tag is heavily recommended but not
technically mandatory for the libwayland C bindings, because it was
introduced several years too late.

> D. No object can be the target or argument of a message issued by one
> side after that side has issued a destructor request (explicit or
> implicit).

Correct, I believe. An actual destructor request should be enforced
like that in libwayland. Allowing otherwise would defeat the point of a
destructor request or a non-destructor request intended to trigger a
destructor event. Nothing enforces the latter though.

> Is this correct? Are these actual requirements that are enforced
> for the current protocol and future expansions?
>
> Can there be cases of objects created by the compositor, where the
> compositor proposes their destruction without any prior (implicit or
> explicit) request to do so from the client? If so, how are these

I don't know of anything forbidding them.

> acknowledged? Not allowing any such cases seems very strict. But, if
> some universal acknowledgement analog to wl_display::delete_id were to
> be introduced to allow these cases, and if the events to be interpreted
> as proposing destruction were tagged in the protocol as "destructors",
> that would break rule 2 above.

Extending the wl_display interface (or even wl_registry) is very
tricky: the protocol has no initial version negotiation or even a
hand-shake. It would need to be something exposed through wl_registry,
which poses further complications in libwayland-client since it does
not implement wl_registry.

>
> BTW: some of the middleware tools on
> https://wayland.freedesktop.org/extras.html do not follow these rules.
>

Thanks,
pq
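
PS. Rules 1 and 2 from the discussion above could be sketched roughly
like this for a middleware tool. This is an illustration only: the
names ObjectTracker and DESTRUCTOR_EVENTS are hypothetical, the table
would in practice be generated from the type="destructor" attributes in
the protocol XML, and the result is only as reliable as that tagging
and requirements B to D are.

```python
# Hypothetical table of (interface, event) pairs tagged as destructors.
DESTRUCTOR_EVENTS = {("wl_callback", "done")}


class ObjectTracker:
    """Keeps the object ID -> interface map for one client connection."""

    def __init__(self):
        self.interfaces = {1: "wl_display"}  # wl_display is always ID 1

    def on_new_id(self, object_id, interface):
        # A new_id argument was seen in a request or an event.
        if object_id in self.interfaces:
            raise ValueError(f"ID {object_id} re-used while still live")
        self.interfaces[object_id] = interface

    def on_delete_id(self, object_id):
        # Rule 1: wl_display.delete_id acknowledges a destructor request.
        # The ID may already be gone if a destructor event came first.
        self.interfaces.pop(object_id, None)

    def on_event(self, object_id, event_name):
        # Returns the interface needed to decode the event's arguments.
        interface = self.interfaces[object_id]
        if (interface, event_name) in DESTRUCTOR_EVENTS:
            # Rule 2: a destructor event needs no acknowledgement.
            del self.interfaces[object_id]
        return interface


tracker = ObjectTracker()
tracker.on_new_id(3, "wl_callback")      # e.g. from wl_display.sync
decoded_as = tracker.on_event(3, "done")
tracker.on_delete_id(3)                  # harmless if it follows 'done'
tracker.on_new_id(3, "wl_surface")       # ID 3 is free to be re-used
```

Note that a destructor request on its own does not remove the entry
here: the ID stays mapped until the matching delete_id arrives, because
events for the old object may still be in flight.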