Re: Questions about object ID lifetimes

Pekka Paalanen Thu, 14 Sep 2023 06:32:33 -0700

On Wed, 13 Sep 2023 20:16:09 -0400
jleivent <jleiv...@comcast.net> wrote:


> Forgive the long post.  Tl;dr: what are the rules of object ID lifetime
> and reuse in the Wayland protocol?

Hi,

congratulations, I think you may have found everything that is not
quite right in the fundamental Wayland protocol design. :-)

As an aside, we collect unfixable issues under
https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
These are issues that are either impossible or very difficult or
annoying to fix while keeping backward compatibility with both servers
and clients.

------

Object ID re-use is what I would call "aggressive": in the libwayland C
implementation, the object ID last freed is the first one to be
allocated next. There are two separate allocation ranges each with its
own free list: server and client allocated IDs.

The C implementation also poses an additional restriction: a new ID can
be at most the largest ever allocated ID + 1.

All this is to keep the ID map as compact as possible without a hash
table. These details are in the implementation of the private 'struct
wl_map' in libwayland.
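
To make the allocation behaviour concrete, here is a minimal sketch in Python (not the C code from libwayland). The class name IdRange and its methods are hypothetical; the real 'struct wl_map' threads its free list through a flat array. Only the last-freed-first reuse, the two ranges, and the "largest + 1" bound are taken from the description above.

```python
SERVER_ID_START = 0xFF000000  # first server-allocated ID (WL_SERVER_ID_START)


class IdRange:
    """One allocation range with a LIFO free list (hypothetical helper)."""

    def __init__(self, base):
        self.base = base        # lowest ID this range hands out
        self.next_fresh = base  # largest ID ever allocated + 1
        self.free = []          # stack of freed IDs

    def allocate(self):
        # The ID freed last is the first one to be handed out again.
        if self.free:
            return self.free.pop()
        new_id = self.next_fresh
        self.next_fresh += 1
        return new_id

    def release(self, obj_id):
        self.free.append(obj_id)

    def insert_at(self, obj_id):
        """Accept an ID chosen by the peer, enforcing the 'largest + 1' bound."""
        if not self.base <= obj_id <= self.next_fresh:
            raise ValueError(f"ID {obj_id:#x} is outside the allowed range")
        if obj_id == self.next_fresh:
            self.next_fresh += 1
        elif obj_id in self.free:
            self.free.remove(obj_id)


client_ids = IdRange(1)                # ID 0 is the null object
server_ids = IdRange(SERVER_ID_START)
```

With this sketch, freeing ID 2 and then ID 1 makes the next two allocations return 1 and then 2, before any fresh ID is used.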

> I am attempting to understand the rules of object ID lifetime within
> the Wayland protocol in order to construct Wayland middleware (similar
> to some of the tools featured on
> https://wayland.freedesktop.org/extras.html).  I could not find a
> comprehensive discussion of the details online.  If one exists, I would
> greatly appreciate a link!
> 
> Middleware tools that wish to decode Wayland messages sent between the
> compositor and its clients need to maintain an accurate mapping between
> object ID and object interface (type).  This is needed because the wire
> protocol's message header includes only the target object ID and an
> opcode that is relative to the object's type (the message header also
> includes the message length - about which I also have questions - to be
> pursued later...).  The message (request or event) and its argument
> encoding can only be determined if the object ID -> type and type +
> opcode -> message mappings are accurately maintained.  The type +
> opcode -> message mapping is static and can be extracted offline from
> the protocol XML files.
> 
> Since object IDs can be reused, it is important for the middleware to
> understand when an ID can be reused and when it cannot be to avoid
> errors in the ID -> type mapping.
> 
> Because the Wayland protocol is asynchronous, any message that implies
> destruction of an object should be acknowledged by the receiver before
> the destructed object's ID is reused.
> 
> Fortunately, certain events and requests have been tagged as
> destructors in the protocol descriptions!
> 
> Also fortunately, it appears (based on reading the wl_resource_destroy
> code in wayland-server.c) that for many object IDs, specifically for
> IDs of objects created by a client request (the ID appears as a new ID
> arg of a request, and is thus in the client side of the ID range) and
> for which the client makes a destructor request, the compositor will
> always send a wl_display::delete_id event (assuming the
> display_resource still exists for the client, which apparently would
> only not be the case after the client connection is severed) to
> acknowledge the destructor request. Any attempt to reuse that ID prior
> to the wl_display::delete_id event can lead to confusion, and should be
> avoided.  Reuse of the ID after the wl_display::delete_id event should
> not result in any confusion.
> 
> [BTW: for the purpose of this discussion, an object is "created" when
> it is introduced into a protocol message for the first time via a new_id
> argument.  It does not refer to the actual allocation of the object in
> memory or to its initialization.]

Your whole above analysis is completely correct!
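
As an illustration of why the ID -> type map matters for decoding, here is a small Python sketch. It assumes the usual wire header layout (two 32-bit words in host byte order: the object ID, then the message size in the upper 16 bits and the opcode in the lower 16 bits). The REQUESTS table is a hand-written subset for illustration; a real tool would generate it from the protocol XML. The function names are hypothetical.

```python
import struct


def parse_header(data):
    """Split the 8-byte wire header into (object_id, opcode, size_in_bytes)."""
    object_id, word = struct.unpack_from("=II", data, 0)
    return object_id, word & 0xFFFF, word >> 16


# Illustrative subset of the static (interface, opcode) -> request table.
REQUESTS = {
    ("wl_display", 0): "sync",
    ("wl_display", 1): "get_registry",
    ("wl_registry", 0): "bind",
}


def decode_request(id_to_interface, data):
    """Name a request; this only works if the ID -> interface map is accurate."""
    object_id, opcode, size = parse_header(data)
    interface = id_to_interface[object_id]
    return interface, REQUESTS[(interface, opcode)], size
```

If the map still holds a stale interface for a reused ID, the same opcode resolves to the wrong message and the argument decoding goes wrong from there.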

> However, the other cases are not as easy to identify.
> 
> The other cases are:
> 1. an object created by a client request that has destructor events
> 2. an object created by the compositor
> 
> It might be true that case 1 does not exist.  Is there a general rule
> that such cases will never be considered in future expansions
> of the Wayland protocol?

Destructor events do exist. Tagging them as such in the XML was not
done from the beginning, though. The tag was added later in a
backward-compatible manner, which makes it more informational than
something libwayland could automatically process. The foremost example
is the wl_callback.done event. This is only safe because it is guaranteed
that the client cannot be sending a request on wl_callback at the same
time the server is sending 'done' and destroying the object:
wl_callback has no requests defined at all.

It also requires that nothing passes an existing wl_callback object as
an argument in any request. We have been merely lucky that no-one has
done that. It's really hard to imagine a use case where you would want
to pass an existing wl_callback to anything.

Extensions may have similar objects that only deliver some one-off
events and then "self-destruct" with the final event. All this is simply
documented in prose and not marked in the XML.

The asymmetry in the fundamental protocol bookkeeping messages
(wl_display.delete_id) is unfortunate in hindsight. Client created and
client destroyed objects are best supported, anything else will require
very careful messaging design to avoid a race pitfall. We have a proper
tear-down sequence for a client destroying an object, but we do not
have a similar sequence for the server destroying an object, where the
client would ack the destruction before the object ID is re-used.

> For objects created by the compositor, there are 2 subcases:
> 
> 2a. objects with only destructor events
> 2b. objects with destructor requests
> 
> Again, it might be the case that 2b does not exist, as it is analogous
> to case 1 above.  But, is there a general rule against such
> future cases as well?  Combining 1 and 2b, is there a general rule that
> says that only the object creator can initiate an object's destruction
> (unprovoked by the other side of the protocol)?

There is no such rule, but nowadays the recommendation is to keep to
client created and client destroyed objects unless there are pressing
reasons to design something more delicate/fragile.

2b exists as wl_data_offer, created by wl_data_device.data_offer event.

I'm not aware of 2a existing in the wild, but it could, I guess.

Generally we try to avoid new_id in events, because it causes a race by
default: what if the server is sending a new_id event at the same time
the client destroys the originating object? The result is a mismatch in
object ID tracking between server and client, causing unexpected
failures perhaps much later.

When a client (or server) calls the respective API to destroy a
protocol object, libwayland guarantees that the object's
listeners/callbacks won't be called even if messages are still received
on that object until it is properly torn down. That makes the user code
easier to write, but it also throws the afterwards received messages
away. If that message was to create a new protocol object, that won't
happen, the ID won't be allocated. This causes the mismatch.
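
The mismatch can be shown with a toy simulation. This is a sketch in Python, not libwayland code: the Side class, the dict-shaped event, and the object ID 5 are all made up for illustration. It only models the two behaviours described above: messages for a locally destroyed object are dropped, and a dropped new_id is never registered.

```python
SERVER_ID_START = 0xFF000000  # start of the server-allocated ID range


class Side:
    """Per-side bookkeeping (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}       # object ID -> interface name
        self.destroyed = set()  # destroyed locally, teardown not finished


def receive_event(side, event):
    """Deliver an event carrying a new_id; return False if it was dropped."""
    if event["target"] in side.destroyed:
        return False  # no listener runs, so the new object is never created
    side.objects[event["new_id"]] = event["interface"]
    return True


server, client = Side(), Side()
for side in (server, client):
    side.objects[5] = "wl_data_device"

# The client destroys object 5 locally; its request is still in flight.
client.destroyed.add(5)

# At the same time the server creates a new object via an event on object 5.
server.objects[SERVER_ID_START] = "wl_data_offer"
event = {"target": 5, "new_id": SERVER_ID_START, "interface": "wl_data_offer"}

delivered = receive_event(client, event)
mismatch = set(server.objects) - set(client.objects)
```

After this runs, the server believes the new ID is in use while the client has never heard of it, which is the kind of tracking mismatch that surfaces much later.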

Libwayland cannot automatically destroy objects either, because it has
no information on how to do that properly and safely. The XML does not
say for sure.

> 
> For object IDs created by the compositor and with only destructor
> events (case 2a), it may be necessary to understand the details of each
> interface in question to decide when the ID can be reused, as there is
> no universal destructor acknowledgement request comparable to the
> wl_display::delete_id event.  A requirement to understand the details
> to that level would make middleware development more difficult.  Insert
> extreme sadness emoji here.

Indeed.

> Thankfully, it seems that destructor events are themselves
> acknowledgements of requests for destruction by the client (such as
> wp_drm_lease_device_v1::released event destructor vs.
> wp_drm_lease_device_v1::release request), or involve objects with a
> very limited lifetime and usage, such as callbacks
> (wp_presentation_feedback, zwp_linux_buffer_release, and
> zwp_fullscreen_shell_mode_feedback_v1).  These limited lifetime/usage
> objects are created with the knowledge that all messages for them are
> destructor events, and that they are not involved in any other messages
> (as targets or arguments).  Hence their destruction needs no further
> acknowledgement because the request for destruction was implied by
> their creation.  The destructor event is the acknowledgement of that
> request.
> 
> Is this a general rule: that a destructor event is always the
> acknowledgement of a (perhaps implied) destruction request?

Not specifically, but it seems to be the result of careful protocol
interface design, to avoid races.

Not all interfaces avoid those races. I think destroying a
wl_data_device is theoretically racy, but we get lucky every time.

Originally, wl_data_device did not even have a destructor request at
all - it was undestroyable until disconnection. There might still be
some interfaces like that left.

> So there may be two general simple rules that the middleware can follow
> to maintain a proper ID -> type mapping through ID reuse cycles:
> 
> 1. reuse of ID is allowed after wl_display::delete_id(ID)
>    "wl_display::delete_id events are the acknowledgements for all
>    destruction requests"

Correct.

> 2. reuse of ID is allowed after a destructor event targeted to ID
>    "destruction events do not need acknowledgements"
> 
> But this only works if the Wayland protocol has these general
> requirements:
> 
> A. Ownership vs. destruction: An object created by a client can only
> have destructor requests, and an object created by the compositor can
> only have destructor events.

False, violated in the wild. Examples: wl_callback created by client,
destroyed by server; wl_data_offer created by server, destroyed by
client.

> B. All destructor requests are followed by wl_display::delete_id events
> as acknowledgements

True, as this is built into libwayland.

> C. All destructor events are themselves acknowledgements of (implicit)
> destruction requests.

Close enough I suppose, assuming the events are correctly tagged as
destructors. As mentioned, the tag is heavily recommended but not
technically mandatory for the libwayland C bindings, because it was
introduced several years too late.

> D. No object can be the target or argument of a message issued by one
> side after that side has issued a destructor request (explicit or
> implicit).

Correct, I believe. An actual destructor request should be enforced
like that in libwayland. Allowing otherwise would defeat the point of a
destructor request or a non-destructor request intended to trigger a
destructor event. Nothing enforces the latter though.
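
For middleware, rules 1 and 2 together with B and D could be tracked roughly like this. It is a sketch in Python under stated assumptions: the IdTracker class and its method names are hypothetical, and the caller is assumed to already know from the XML which messages are destructors. It is not how libwayland itself does the bookkeeping.

```python
class IdTracker:
    """Track ID -> interface across reuse cycles (hypothetical sketch)."""

    def __init__(self):
        self.live = {1: "wl_display"}  # ID 1 is always wl_display
        self.zombie = {}               # destructor request seen, delete_id pending

    def on_new_id(self, obj_id, interface):
        if obj_id in self.live or obj_id in self.zombie:
            raise ValueError(f"ID {obj_id} reused before its release was seen")
        self.live[obj_id] = interface

    def on_destructor_request(self, obj_id):
        # Per D, no further requests should target this ID, but events
        # already in flight may still arrive, so remember its interface.
        self.zombie[obj_id] = self.live.pop(obj_id)

    def on_delete_id(self, obj_id):
        # Rule 1: delete_id acknowledges the destructor request.
        # A delete_id for an ID that is already gone is tolerated.
        self.zombie.pop(obj_id, None)
        self.live.pop(obj_id, None)

    def on_destructor_event(self, obj_id):
        # Rule 2: a destructor event needs no further acknowledgement.
        self.live.pop(obj_id, None)
        self.zombie.pop(obj_id, None)

    def interface_of(self, obj_id):
        if obj_id in self.live:
            return self.live[obj_id]
        return self.zombie.get(obj_id)
```

Since A does not hold in the wild, the tracker deliberately does not care which side created the object; it only reacts to the destructor message it actually sees.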

> Is this correct?  Are these actual requirements that are enforced
> for the current protocol and future expansions?
> 
> Can there be cases of objects created by the compositor, where the
> compositor proposes their destruction without any prior (implicit or
> explicit) request to do so from the client?  If so, how are these

I don't know of anything forbidding them.

> acknowledged?  Not allowing any such cases seems very strict.  But, if
> some universal acknowledgement analog to wl_display::delete_id were to
> be introduced to allow these cases, and if the events to be interpreted
> as proposing destruction were tagged in the protocol as "destructors",
> that would break rule 2 above.

Extending the wl_display interface (or even wl_registry) is very tricky:
the protocol has no initial version negotiation or even a handshake. It
would need to be something exposed through wl_registry, which poses
further complications in libwayland-client since it does not implement
wl_registry.

> 
> BTW: some of the middleware tools on
> https://wayland.freedesktop.org/extras.html do not follow these rules.
> 

Thanks,
pq
