On Wed, Mar 25, 2026 at 12:26, Martin Thomson <[email protected]> wrote:
> On Wed, Mar 25, 2026, at 12:16, Kazuho Oku wrote:
> > On Wed, Mar 25, 2026 at 6:09, Martin Thomson <[email protected]> wrote:
> >> On Tue, Mar 24, 2026, at 21:06, Kazuho Oku wrote:
> >> > * it avoids requiring the decoder of every frame type to support
> >> > trial or incremental decoding to handle truncated input;
> >>
> >> As I said, while you might simplify, in doing so you introduce
> >> performance penalties that ultimately lead you back to having to
> >> handle incremental decoding. That this is an option is potentially
> >> valuable and an argument in favor, but I wouldn't weight it too much.
> >>
> >> However, frame decoding is such a small part of an implementation
> >> that I see this as less of a problem than you are making out.
> >>
> >> > * the overhead is identical when sending large data; and
> >>
> >> Not entirely correct, but it's very close. When the length spans
> >> more bytes (the "record" length always includes at least a STREAM
> >> frame header), there is a chance that the varint needs more bytes.
> >> So it will be ever so slightly higher overhead.
> >>
> >> > * it naturally reduces the risk of blocking caused by frames
> >> > crossing TLS record boundaries.
> >>
> >> I don't think this is right. If you are blocking on the "record"
> >> being complete, any problem will be *worse*, not better, if the
> >> "record" spans more bytes.
> >
> > Could you clarify why?
> >
> > My point is not that buffering disappears with records. Rather, my
> > point is that introducing records greatly reduces the likelihood of
> > STREAM frames spanning multiple TLS records, [...]
>
> I don't find that reasoning compelling. If you are only sending STREAM
> frames, the same applies whether the length field is before the frame
> type or after.
>
> My reasoning is somewhat simpler.
>
> If you send X bytes, that will be split into segments. That means that
> you will get either a complete TLS record or not, because some
> segments are missing or delayed.
> At the TLS layer, it's all or nothing: you can't release bytes until
> the entire record is present.
>
> In that case, if you have a 1:1:1:1 write:frame:QMux record:TLS record
> ratio, there is zero difference between the options.
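As an aside, the varint-overhead point above can be made concrete. A rough sketch, assuming QUIC-style variable-length integer encoding (1/2/4/8-byte sizes with boundaries at 2^6, 2^14, 2^30, 2^62); `varint_len` and the specific frame sizes are invented for illustration:

```python
def varint_len(v: int) -> int:
    """Encoded length, in bytes, of a QUIC-style variable-length integer."""
    if v < 2 ** 6:
        return 1
    if v < 2 ** 14:
        return 2
    if v < 2 ** 30:
        return 4
    if v < 2 ** 62:
        return 8
    raise ValueError("value too large for a varint")

# Hypothetical sizes: a STREAM frame payload of 16,380 bytes, with a
# frame header (type, stream ID, offset, length) occupying 6 bytes.
payload = 16_380
header = 6

# Frame-based design: the length varint covers only the payload.
frame_len_bytes = varint_len(payload)            # 16,380 < 2**14 -> 2 bytes

# Record-based design: the length varint covers header + payload,
# which here crosses the 2**14 boundary.
record_len_bytes = varint_len(header + payload)  # 16,386 >= 2**14 -> 4 bytes

print(frame_len_bytes, record_len_bytes)  # 2 4
```

Near a varint size boundary, including the header in the prefixed length costs extra bytes; away from the boundaries the two designs encode identically, which matches "ever so slightly higher overhead" above.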
I disagree, unless you are proposing to impose a size limit on the
entire encoded STREAM frame, including the frame header bytes, which
would be an exotic design compared to having a size limit on the STREAM
frame payload. And if you are indeed considering a design that bounds
the entire encoded frame image, I am not sure how that is materially
different from simply prefixing the length, i.e., using the record
approach, which also helps avoid TLS-record-boundary crossing when
multiple frames are sent.

Otherwise, I think we are largely on the same page: omitting the record
layer nudges implementations toward incremental or stateful processing,
whereas the record-based approach relies on the expectation that
QMux-record-sized writes and TLS-record-sized writes will often align
in practice. Where things break down is on the last pairing.

> If you have QMux records split across TLS records, then to realize
> the benefits you are touting you need to wait until the next TLS
> record is present. However, if you are willing to incrementally
> process, you can process the partial STREAM frame in either model.
> The record-based model, which encourages implementations to wait for
> entire records, remains blocked if they take that option.
>
> In the version without QMux records, you can also process the
> WINDOW_UPDATE frames or whatever control frames are sent ahead of the
> record split. You more or less have to without records. In the
> record-based version, implementations will block, because that's
> easier to implement, so there is no performance advantage. The bytes
> are there, but they won't bother reading them off because that
> requires incremental frame decoding.
>
> I acknowledge that this is based on an assumption that the easier
> implementation option will be taken. And the cost of that might not
> be realized. Maybe the happy path is what most people experience.
> But this benefit applies to the non-record-based version as well.
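To make the incremental-processing point above concrete, here is a toy sketch. The wire format (1-byte type, 1-byte length, payload) and the frame-type values are invented purely for illustration and are not QMux's actual encoding:

```python
# Hypothetical frame types for illustration only.
WINDOW_UPDATE, STREAM = 0x1, 0x2

def parse_complete_frames(buf: bytes):
    """Incremental parser: returns every frame whose bytes have fully
    arrived, plus the number of bytes consumed from the buffer."""
    frames, pos = [], 0
    while pos + 2 <= len(buf):
        ftype, flen = buf[pos], buf[pos + 1]
        if pos + 2 + flen > len(buf):
            break  # partial frame at the tail; wait for more bytes
        frames.append((ftype, buf[pos + 2 : pos + 2 + flen]))
        pos += 2 + flen
    return frames, pos

# A TLS record boundary splits the byte stream mid-STREAM-frame: the
# control frame is complete, but the STREAM frame is 5 bytes short.
wire = bytes([WINDOW_UPDATE, 2, 0xAB, 0xCD,   # complete control frame
              STREAM, 8, 1, 2, 3])            # truncated STREAM frame

frames, consumed = parse_complete_frames(wire)
print(frames)    # [(1, b'\xab\xcd')]
print(consumed)  # 4
```

The incremental parser delivers the WINDOW_UPDATE immediately; a receiver that waits for a complete QMux record spanning both frames would deliver nothing until the next TLS record arrives, which is the blocking behavior being debated.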
> Of course, one thing you *can* bet on is that performance bugs that
> can happen, will happen. Murphy has a lot of relevance in that domain.
>
> > This in turn means that whenever a QMux-over-TLS receiver decrypts
> > a TLS record, a complete QMux record becomes available.
>
> That's not a guarantee, it's just a likely outcome.
>
> > As a result, additional latency due to QMux-layer buffering is
> > avoided, even if the receiver does not implement trial or
> > incremental processing.
>
> My point is that the additional latency is all in the QMux record
> layer, because it encourages the easier implementation. Either model
> can take advantage of the approach you describe equally. So yes, that
> argument doesn't work for me.

--
Kazuho Oku
