Re: Questions about object ID lifetimes

2023-09-15 Thread Pekka Paalanen
On Thu, 14 Sep 2023 15:10:48 -0400
jleivent  wrote:

> On Thu, 14 Sep 2023 16:32:06 +0300
> Pekka Paalanen  wrote:
> 
> > 
> > As an aside, we collect unfixable issues under
> > https://gitlab.freedesktop.org/wayland/wayland/-/issues/?label_name%5B%5D=Protocol-next
> > These are issues that are either impossible or very difficult or
> > annoying to fix while keeping backward compatibility with both servers
> > and clients.  
> 
> Only 7 of them?

Some of them are bags of issues.

Feel free to collect your ideas in new issues though. You may find
interested people.

> > --
> > 
> > Object ID re-use is what I would call "aggressive": in the libwayland
> > C implementation, the object ID last freed is the first one to be
> > allocated next. There are two separate allocation ranges each with its
> > own free list: server and client allocated IDs.  
> 
> After I sent the initial post, I realized that the two separate
> ID ranges help in the following way:
> 
> For any object ID in the allocation range of side A, a destructor
> message from side B does not need acknowledgement.  This is because B
> can't introduce a new object bound to that ID, only A can.  Hence, any
> new_id arg for that ID is an acknowledgement of the destruction.

That's an interesting interpretation. Perhaps it works for your case,
but I would not use in regular clients and servers, because I'd like to
be able to catch ID re-use errors like libwayland does.

> However, B has to be careful to ignore messages containing that ID
> until it sees one with the ID as a new_id arg.  After the destructor
> message from B but before a subsequent new_id for that ID from A, B
> should not use the ID as arguments to other messages (and attempts to
> do so can be dropped).  And this can be automated provided the
> destructor tag can be relied on.
> 
> > 
> > The C implementation also poses an additional restriction: a new ID
> > can be at most the largest ever allocated ID + 1.
> > 
> > All this is to keep the ID map as compact as possible without a hash
> > table. These details are in the implementation of the private 'struct
> > wl_map' in libwayland.  
> 
> Obviouly, that helps middleware as well, for the same reasons.  It also
> makes more automatic error detection possible.
> 
> > ...
> > 
> > Your whole above analysis is completely correct!  
> 
> I was rather hoping things would turn out less complex than they
> seemed...
> 
> >   
> > > However, the other cases are not as easy to identify.
> > > 
> > > The other cases are:
> > > 1. an object created by a client request that has destructor events
> > > 2. an object created by the compositor
> > > 
> > > It might be true that case 1 does not exist.  Is there a general
> > > rule against that such cases would never be considered in future
> > > expansions of the Wayland protocol?
> > 
> > Destructor events do exist. Tagging them as such in the XML was not
> > done from the beginning though, it was added later in a
> > backward-compatible manner which makes the tag more informational than
> > something libwayland could automatically process. The foremost example
> > is wl_callback.done event. This is only safe because it is guaranteed
> > that the client cannot be sending a request on wl_callback at the same
> > time the server is sending 'done' and destroying the object:
> > wl_callback has no requests defined at all.  
> 
> Fortunately, my point above about the advantage of the separate ID
> ranges helps here.  If wl_callback is created by the client, then a
> wl_callback.done event tagged as a destructor does not need
> acknowledgement AND is always safe provided that messages involving the
> wl_callback ID (other than it's eventual reuse as a new_id arg) are
> ignored above libwayland.

I think there is another asymmetry here. libwayland-client definitely
ignores events on an object ID whose wl_proxy has been destroyed from
the API user point of view. libwayland-server though seems to be
throwing a protocol error immediately on any non-existing object ID
receiving a message.

I think there is a case that requires the delete_id event and cannot
work solely on the new_id ID re-use rule:
- client sends request to create wl_callback
- client destroys the wl_callback (no request, enters zombie state client side)
- client creates some other new object X

When the server destroys the wl_callback, it sends the done event, and
delete_id event, and cannot know if the client has already destroyed
the wl_callback or not.

Zombie IDs are not eligible for re-use. The zombie is destroyed and the
ID freed on delete_id event. If the client has not seen the delete_id
yet, it allocates a new high ID for object X. Otherwise, it re-uses the
wl_callback's old ID for X.

However, there is no delete_id in the opposite direction, meaning that
a similar situation in the opposite direction is racy and can lead to
confusing object IDs. I guess this is what you already found out.

Zombies are actually what allows lib

Re: wl_surface::attach(NULL) release previous buffer?

2023-09-15 Thread Pekka Paalanen
On Thu, 14 Sep 2023 12:24:42 +0100
John Cox  wrote:

> Hi
> 
> A, hopefully, simple question - should I expect a wl_buffer::release
> event from the buffer previously committed to a surface after I've
> attached (and invalidated & committed) a NULL buffer to the surface? it
> doesn't seem to happen for me (I have WAYLAND_DEBUG=1 logs showing it
> not happening).

Theoretically yes, because committing a NULL buffer should delete any
content of the surface, so there is no reason for the compositor to
keep the buffer.

It may not be immediate, though, so a simple roundtrip is not
guaranteed to be enough. An unfinished output frame may cause the
compositor to still need the previous buffer for a moment longer, and
the compositor might need to paint one more output refresh to ensure
the buffer is no longer in use.

Then there are also client related considerations like if the
wl_surface is actually a synchronized sub-surface, then committing a
NULL buffer won't take effect until the parent has been committed and
taken effect.

Another thing is that some wl_surface roles might forbid committing a
NULL buffer, but you'd notice that from a protocol error. I don't
remember what roles that might be.

> If I shouldn't expect a release - when is it safe to reuse/free the
> buffer storage?

To re-use the storage, you do need to receive wl_buffer.release, or an
equivalent if you are using an extension like explicit sync.


Thanks,
pq


pgpZRCMEthEVe.pgp
Description: OpenPGP digital signature


Re: wl_surface::attach(NULL) release previous buffer?

2023-09-15 Thread John Cox
Hi

>On Thu, 14 Sep 2023 12:24:42 +0100
>John Cox  wrote:
>
>> Hi
>> 
>> A, hopefully, simple question - should I expect a wl_buffer::release
>> event from the buffer previously committed to a surface after I've
>> attached (and invalidated & committed) a NULL buffer to the surface? it
>> doesn't seem to happen for me (I have WAYLAND_DEBUG=1 logs showing it
>> not happening).
>
>Theoretically yes, because committing a NULL buffer should delete any
>content of the surface, so there is no reason for the compositor to
>keep the buffer.
>
>It may not be immediate, though, so a simple roundtrip is not
>guaranteed to be enough. An unfinished output frame may cause the
>compositor to still need the previous buffer for a moment longer, and
>the compositor might need to paint one more output refresh to ensure
>the buffer is no longer in use.
>
>Then there are also client related considerations like if the
>wl_surface is actually a synchronized sub-surface, then committing a
>NULL buffer won't take effect until the parent has been committed and
>taken effect.
>
>Another thing is that some wl_surface roles might forbid committing a
>NULL buffer, but you'd notice that from a protocol error. I don't
>remember what roles that might be.
>
>> If I shouldn't expect a release - when is it safe to reuse/free the
>> buffer storage?
>
>To re-use the storage, you do need to receive wl_buffer.release, or an
>equivalent if you are using an extension like explicit sync.

Many thanks for your comprehensive answer. I believe that I've accounted
for all of the special cases you mention: I've waited long enough via
artificial debug delay, its on an async subplane, the window is visibly
black and it works as expected with an egl backend rether the the pixman
one. I know that our pixman backend has been patched but I wanted to
check what I was expecting before trying to raise an issue with them.

As an aside one of the more disapointing lines I read in the API docn
was "If a pending wl_buffer has been committed to more than one
wl_surface, the delivery of wl_buffer.release events becomes undefined."
Yes there may be ways of working round this but it feels wrong.

Thanks again

JC


Re: wl_surface::attach(NULL) release previous buffer?

2023-09-15 Thread Pekka Paalanen
On Fri, 15 Sep 2023 14:12:25 +0100
John Cox  wrote:

> Hi
> 
> >On Thu, 14 Sep 2023 12:24:42 +0100
> >John Cox  wrote:
> >  
> >> Hi
> >> 
> >> A, hopefully, simple question - should I expect a wl_buffer::release
> >> event from the buffer previously committed to a surface after I've
> >> attached (and invalidated & committed) a NULL buffer to the surface? it
> >> doesn't seem to happen for me (I have WAYLAND_DEBUG=1 logs showing it
> >> not happening).  
> >
> >Theoretically yes, because committing a NULL buffer should delete any
> >content of the surface, so there is no reason for the compositor to
> >keep the buffer.
> >
> >It may not be immediate, though, so a simple roundtrip is not
> >guaranteed to be enough. An unfinished output frame may cause the
> >compositor to still need the previous buffer for a moment longer, and
> >the compositor might need to paint one more output refresh to ensure
> >the buffer is no longer in use.
> >
> >Then there are also client related considerations like if the
> >wl_surface is actually a synchronized sub-surface, then committing a
> >NULL buffer won't take effect until the parent has been committed and
> >taken effect.
> >
> >Another thing is that some wl_surface roles might forbid committing a
> >NULL buffer, but you'd notice that from a protocol error. I don't
> >remember what roles that might be.
> >  
> >> If I shouldn't expect a release - when is it safe to reuse/free the
> >> buffer storage?  
> >
> >To re-use the storage, you do need to receive wl_buffer.release, or an
> >equivalent if you are using an extension like explicit sync.  
> 
> Many thanks for your comprehensive answer. I believe that I've accounted
> for all of the special cases you mention: I've waited long enough via
> artificial debug delay, its on an async subplane, the window is visibly
> black and it works as expected with an egl backend rether the the pixman
> one. I know that our pixman backend has been patched but I wanted to
> check what I was expecting before trying to raise an issue with them.

Right, sounds like a compositor bug. Also the wl_surface should not become
black, it should disappear altogether. Or maybe you mean the parent
wl_surface remains and is black, that's fine.

> As an aside one of the more disapointing lines I read in the API docn
> was "If a pending wl_buffer has been committed to more than one
> wl_surface, the delivery of wl_buffer.release events becomes undefined."
> Yes there may be ways of working round this but it feels wrong.

Right, the design simply didn't consider the multiple surfaces case.
Ideally the release event would have been a wl_callback just like
wl_surface.frame is. IIRC the explicit sync extension fixes this.


Thanks,
pq


pgpETTIQ1VveD.pgp
Description: OpenPGP digital signature


Re: wl_surface::attach(NULL) release previous buffer?

2023-09-15 Thread Simon Ser
> Ideally the release event would have been a wl_callback just like 
> wl_surface.frame is.

I sent [1] to fix this.

[1]: https://gitlab.freedesktop.org/wayland/wayland/-/merge_requests/137