On Mon, 18 Sep 2023 14:06:51 +0300 Pekka Paalanen <ppaala...@gmail.com> wrote:
> On Sat, 16 Sep 2023 12:18:35 -0400
> jleivent <jleiv...@comcast.net> wrote:
>
> > The easiest fix I can think of is to go full-on half duplex.
> > Meaning that each side doesn't send a single message until it has
> > fully processed all messages sent to it in the order they arrive
> > (thankfully, sockets preserve message order, else this would be
> > much harder). Have you considered half duplex?
>
> Never crossed my mind at least. I can't even imagine how it could be
> implemented through a socket, because both sides must be able to
> spontaneously send a message at any time.

By taking turns. Each side would, after queuing up a batch of
messages, add an "Over!" message (from the days of half-duplex radio
communications) to the end of that queue, and then send the whole
queue (retaining its sequence). Neither side would send a message
until it receives the other side's "Over!" message, and until the
higher levels above libwayland have had a chance to examine all
messages prior to "Over!" in order to avoid sending an inconsistent
message or even committing to a state incompatible with later
messages.

> > Certainly, it would mean a loss
> > of some concurrency, hence a potential performance hit. But
> > probably not that much in this case, as most of the message
> > back-and-forth in Wayland occurs at user-interaction speeds, while
> > the speed-needing stuff happens through fd sharing and similar
> > things outside the protocol. I
>
> That user interaction speed can be in the order of a kilohertz, for
> gaming mice, at least in one direction. In the other direction,
> surface update rate is also unlimited, games may want to push out
> frames even if only every tenth gets displayed to reduce latency.
> Also truly tearing screen updates are being developed.

But aren't those fast frame updates done through shared fds? Hence not
part of the wire protocol, and would not be impacted by increasing the
length of messages on the wire?
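To make the turn-taking idea above concrete, here is a minimal sketch
of it as a toy model in Python. Everything in it is an assumption made
for illustration: the Peer class, its method names and the OVER marker
are hypothetical and exist in neither libwayland nor the Wayland
protocol. It only shows the rule that a side queues freely, sends its
whole queue terminated by "Over!", and may not send again until it has
taken in the other side's complete batch.

```python
from collections import deque

OVER = "Over!"  # hypothetical turn marker, not an existing Wayland message


class Peer:
    """One side of a half-duplex link (toy model, not libwayland)."""

    def __init__(self, name, has_turn):
        self.name = name
        self.has_turn = has_turn  # True while this side may transmit
        self.outbox = deque()     # messages queued while waiting for the turn
        self.seen = []            # messages handed to the higher level

    def queue(self, msg):
        # Queuing is always allowed; nothing goes on the wire yet.
        self.outbox.append(msg)

    def flush(self):
        """Send the whole queue, in order, terminated by OVER."""
        if not self.has_turn:
            raise RuntimeError(f"{self.name} tried to send out of turn")
        batch = list(self.outbox) + [OVER]
        self.outbox.clear()
        self.has_turn = False  # the turn passes to the other side
        return batch

    def receive(self, batch):
        """Take in a complete batch; the turn is gained only at OVER."""
        if self.has_turn:
            raise RuntimeError(f"{self.name} received while holding the turn")
        if not batch or batch[-1] != OVER:
            raise ValueError("a batch must end with the turn marker")
        self.seen.extend(batch[:-1])
        self.has_turn = True


compositor = Peer("compositor", has_turn=True)
client = Peer("client", has_turn=False)

client.queue("request A")    # queued only; the client has no turn yet
compositor.queue("event 1")
compositor.queue("event 2")
client.receive(compositor.flush())

# The client has now seen every event sent before "Over!", so whatever
# it queues next cannot be based on a partial view of the compositor.
client.queue("request B")
compositor.receive(client.flush())
```

In this model the "examine all messages prior to Over!" step is simply
the point between receive() and the next queue()/flush() calls. A real
implementation would also have to decide who talks first and how an
idle side hands the turn back without busy-waiting, which the sketch
does not address.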
> > think it can be made mostly backward compatible. It would probably
> > require some "all done" interaction between libwayland and higher
> > levels on each side, but that's probably (hopefully) not too hard.
> > There may even be a way to automate the "all done" interaction to
> > make this fully backward compatible, because libwayland knows when
> > there are no more messages to be processed on the wire, and it can
> > queue-up the messages on each side before placing them on the wire.
> > It might need to do things like re-order ping/pong messages with
> > respect to the others to make sure the pinging side (compositor)
> > doesn't declare the client dead while waiting. But that seems
> > minor, as long as all such ping/pong pairs are opaque to the
> > remainder of the protocol, hence always commute with other
> > messages.
>
> If you mean adding new ping/pong stuff, that doesn't sound very nice,
> because Wayland also aims to be power efficient: if truly nothing is
> happening, let the processes sleep. Anyone could still wake up any
> time, and send a message.

Not adding. Dealing with the already existing (or if any new ones are
added) ping/pong pairs. Or any messages that really need to be timely,
hence can't wait for messages in front of them to be fully processed.
That could apply to any real-time requirements, like the gaming mice
messages you mentioned above. But doing this in general is hard unless
the messages are irrelevant to the rest of the protocol (hence commute
with everything else), like ping/pong are.

>
> On Sun, 17 Sep 2023 15:28:04 -0400
> jleivent <jleiv...@comcast.net> wrote:
>
> > Has altering the wire format to contain all the info needed for
> > unambiguous decoding of each message entirely within libwayland
> > without needing to know the object ID -> type mapping been
> > considered?
>
> Not that I can recall. The wire format is ABI, libwayland is not the
> only implementation of it, so that would be Wayland 2 material.

So no changes to the wire format are possible under any circumstances
in Wayland 1?

> > It would make the messages longer, but this seems like it wouldn't
> > be very bad for performance because wire message transfer is roughly
> > aligned with user interaction speeds.
>
> We need to be able to deal with at least a few thousand messages per
> second easily.
>
> The overhead seems a bit bad if every message would need to carry its
> signature.

Encoding more into the message is only needed if there are no
destructor request acks (the equivalent of wl_display::delete_id, but
in the opposite direction). But I was wondering why not do it for
robustness.

The signature isn't very big, but it's probably not needed even for
robustness. What's needed is the target object type/version
information, since from that both sides know the signature. The issue
is just how to add robustness to the object ID -> type/version
mapping, which is the source of many problems. The signatures are not
ambiguous after that's done.

The target type/version info could be encoded to be small if you
want - if both sides agree on an indexing scheme pairing numbers with
type names & versions, like they already do for opcodes. It's hard to
imagine it taking more than 2 bytes (leaving you room for 2^16
type/version combos), but 4 bytes is certainly plenty.

If you don't have destructor request acks, you may also need
"generation" numbers. There's a problem about object IDs we haven't
spoken about yet, related mostly to the hard case with server-side IDs
and server-side destructor events. It's possible (I don't know if
there are instances in the current protocol) for a series of object
creations (new_id args) and destructions, all originating from the
server and all involving the same object ID, to leave the server
receiving a request from the client that involves that object ID,
where the server agrees with the decoding about the type of the
object, but not about the generation of that object.

These steps:

server sends:
  event E1 with new_id N for type T (call the new object O1)
  ...
  destructor for id N
  ...
  event E2 with new_id N for type T (call the new object O2)

at some later point, the server receives:
  request R involving id N (either as target or object arg) which
  matches type T based on decoding the request

The server cannot tell, even if it has all of the decode info
available (whether encoded in the wire, or because the request R gets
decoded without issue otherwise), whether the object with id N in R
refers to O1 or O2. If it refers to O1, the message is what I called
"speculative" because it was sent prior to the client seeing the
destructor for id N above. That message could be a mistake if
processed, because the current state of the compositor has O2 with id
N, and that might be a completely distinct usage of object type T that
isn't at all related to request R.

A generation number would keep track of which particular instance
(generation) of the object with id N and type T a message is referring
to. A generation number would be associated with each id N, and would
get incremented each time N got re-used. It would be added to the wire
protocol for both the target objects and the object arguments. That's
more (probably 4-byte) fields.

Again, note that this ambiguity between O1 and O2 cannot occur if
there is some destructor ack request that the client can send and that
libwayland on the server side understands (the equivalent of
wl_display::delete_id, but going the other way). I suspect that's
better to add to the protocol instead of generation numbers. It also
cannot occur in the half-duplex scheme, which is a more complete
solution to many problems.

I read the Protocol-next issues - so I see you've thought about
delete_id requests already.

> > Also, for any compositor/client pair, as long as they both use the
> > same version of libwayland, the necessary wire format change would
> > not result in compatibility issues.
> > It would for static linked
> > cases, or similar mismatching cases (flatpak, appimage, snap, etc.
> > unless the host version is mapped in instead of the packaged one
> > somehow). There also seem to be unused bits in the existing wire
> > format so that one could detect an a compositor/client
> > incompatibility at least on one end.
>
> We've never had the requirement for compositor and clients to use the
> same minor version of libwayland. There are also completely
> independent Wayland implementations in other languages that expect to
> be interoperable. Breaking all that seems unacceptable.
>
> What unused bits did you find?

For instance, the length field in the header. It seems unlikely that
you need all 16 bits. And, currently, you have hard-coded 4K ring
buffers in libwayland, so you have a hard limit of 4K (12 bits) for
max message size. Unless I'm missing some code to handle that
somewhere. I think there is a check such that if a message on the wire
claims to have a length too big for your 4K ring buffer, that's an
error.

> > I'm not suggesting that unambiguous decoding of all messages is a
> > sufficient fix, but it is a necessary one. There are still
> > speculative computation issues that it wouldn't resolve alone.
>
> I didn't understand what is speculative. There is no roll-back of any
> kind on anything, what's computed is final.

Right - you have speculation without roll-back, meaning you have
broken speculation!

One case of speculation we discussed already is a request containing
new_id args, where the target object has already been deleted by the
server. That request was speculative, since it was sent by the client
prior to it seeing the already-processed-on-the-server destruction of
the target object. The client "speculated" that the target object
would still exist to receive the request, and did so incorrectly.
Although this type of speculation is addressed (partially) by the
addition of a delete_id request, at least with respect to
disambiguating message decoding, other cases are not. My reason for
suggesting going half-duplex was about trying to prevent speculation
AND message ambiguity together with one fix.

There may be cases of speculation unrelated to object destruction.
Consider:

Side A sends message M1 then message M2 (or at least did other
state-change things after sending M1).

Side B processes M1 (but doesn't yet see M2) and sends message M3,
which is not compatible with what side A did after sending M1.

But side A doesn't know, based only on its own state, that it should
ignore message M3. And even if it does ignore M3, side B may have done
processing after sending M3 that changed its state, such that its new
state is incompatible with side A ignoring message M3.

tl;dr: protocol asynchrony leads to speculation that can result in the
two sides disagreeing about the correct state of the world.

BTW - I don't recommend adding rollbacks! It's a huge can of worms
unless the speculation is very limited (as with machine instruction
speculation in modern CPUs).

You can prevent speculation for each individual case within the
protocol by having the protocol include acks, mutexes or transactional
boundary messages specific to these cases.

Or you could design the protocol so that such M3 messages (and other
sending-side state changes) must always be compatible with (commute
with) the receiving state - that's probably very hard.

Or, you could go half-duplex, and require that neither side "commits"
to any state changes, including sending messages, until it has
processed ALL messages prior to the "Over!" message that was sent by
the other side. That prevents speculation by taking all
client-vs.-server asynchrony out of the protocol. This half-duplex
idea, as draconian as it sounds, has one key point in its favor: it's
the easiest to get right.

Or, you can hope that cases of speculation don't arise. Or that when they do, the resulting mistakes aren't too severe.
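
Going back to the O1/O2 generation problem above, the following is a
minimal sketch of how a per-ID generation counter would separate the
two objects. It is a toy model in Python, not libwayland code: the
ServerIdTable class, its methods and the idea of a request carrying
(id, type, generation) are all assumptions made for illustration, and
the ID value is just an arbitrary example.

```python
class ServerIdTable:
    """Toy table for server-allocated object IDs (hypothetical)."""

    def __init__(self):
        self.live = {}        # id -> (type_name, generation) of the live object
        self.generation = {}  # id -> how many times the id has been allocated

    def create(self, obj_id, type_name):
        # Each (re)use of an id bumps its generation.
        gen = self.generation.get(obj_id, 0) + 1
        self.generation[obj_id] = gen
        self.live[obj_id] = (type_name, gen)
        return gen

    def destroy(self, obj_id):
        del self.live[obj_id]

    def accept(self, obj_id, type_name, gen):
        """True only if the request names the currently live instance."""
        return self.live.get(obj_id) == (type_name, gen)


table = ServerIdTable()
N, T = 0xFF000000, "T"   # example id and type name, chosen arbitrarily

gen_o1 = table.create(N, T)   # event E1: new_id N, object O1
table.destroy(N)              # destructor for id N
gen_o2 = table.create(N, T)   # event E2: new_id N, object O2

# Request R was sent by the client before it saw the destructor, so it
# still carries O1's generation.
type_matches = table.live[N][0] == T        # type alone cannot tell O1 from O2
stale_accepted = table.accept(N, T, gen_o1)
fresh_accepted = table.accept(N, T, gen_o2)
```

Here type_matches is true for a request aimed at either object, which
is the ambiguity described above, while the generation check rejects
the request meant for O1 and accepts one meant for O2. A destructor
ack from the client would make the counter unnecessary, since id N
could not be re-used until the client had confirmed the destruction.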