On Wed, Mar 25, 2026, at 12:26, Martin Thomson <[email protected]> wrote:

> On Wed, Mar 25, 2026, at 12:16, Kazuho Oku wrote:
> > On Wed, Mar 25, 2026, at 6:09, Martin Thomson <[email protected]> wrote:
> >> On Tue, Mar 24, 2026, at 21:06, Kazuho Oku wrote:
> >> > * it avoids requiring the decoder of every frame type to support
> >> > trial or incremental decoding to handle truncated input;
> >>
> >> As I said, you might simplify, but in doing so you introduce
> >> performance penalties that ultimately lead you back to having to
> >> handle incremental decoding.  That this is an option is potentially
> >> valuable and an argument in favor, but I wouldn't weight it too much.
> >>
> >> However, frame decoding is such a small part of an implementation
> >> that I see this as less of a problem than you are making out.
> >>
> >> > * the overhead is identical when sending large data; and
> >>
> >> Not entirely correct, but it's very close.  When the length spans
> >> more (the "record" length always includes at least a STREAM frame
> >> header) there is a chance that the varint needs more bytes.  So it
> >> will be ever so slightly higher overhead.
> >>
> >> > * it naturally reduces the risk of blocking caused by frames crossing
> >> > TLS record boundaries.
> >>
> >> I don't think this is right.  If you are blocking on the "record"
> >> being complete, any problem will be *worse*, not better, if the
> >> "record" spans more bytes.
> >
> > Could you clarify why?
> >
> > My point is not that buffering disappears with records. Rather, my
> > point is that introducing records greatly reduces the likelihood of
> > STREAM frames spanning multiple TLS records, [...]
>
> I don't find that reasoning compelling.  If you are only sending STREAM
> frames, the same applies whether the length field is before the frame type
> or after.
>
> My reasoning is somewhat simpler.
>
> If you send X bytes, that will be split into segments.  That means that
> you will get either a complete TLS record or not, because some segments are
> missing or delayed.  At the TLS layer, it's all or nothing: you can't
> release bytes until the entire record is present.
>
> In that case, if you have a 1:1:1:1 write:frame:QMux record:TLS record
> ratio, there is zero difference between the options.


I disagree, unless you are proposing to impose a size limit on the entire
encoded STREAM frame, including the frame header bytes, which would be an
exotic design compared to having a size limit on the STREAM frame payload.

And if you are indeed considering a design that bounds the entire encoded
frame image, I am not sure how that is materially different from simply
prefixing the length, i.e. using the record approach, which also helps
avoid TLS-record-boundary crossing when multiple frames are sent.

Otherwise, I think we are largely on the same page: omitting the record
layer nudges implementations toward incremental or stateful processing,
whereas the record-based approach relies on the expectation that
QMux-record-sized writes and TLS-record-sized writes will often align in
practice.
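For concreteness, this is the kind of receive loop a record-based design nudges implementations toward: buffer until the length-prefixed record is complete, then dispatch. The 2-byte length prefix here is a hypothetical stand-in, not the QMux record framing:

```python
def read_records(buf: bytearray):
    """Record-based receive loop: the decoder acts only once a complete
    record is buffered (a sketch; the 2-byte big-endian length prefix is
    an assumption, not the QMux wire format)."""
    records = []
    off = 0
    while len(buf) - off >= 2:
        length = int.from_bytes(buf[off:off + 2], "big")
        if len(buf) - off - 2 < length:
            break                      # partial record: block until more bytes
        records.append(bytes(buf[off + 2:off + 2 + length]))
        off += 2 + length
    del buf[:off]                      # keep only the unconsumed tail
    return records

buf = bytearray()
buf += (3).to_bytes(2, "big") + b"abc"    # one complete record
buf += (5).to_bytes(2, "big") + b"de"     # a truncated one
assert read_records(buf) == [b"abc"]
assert bytes(buf) == b"\x00\x05de"        # truncated record stays buffered
```

When QMux-record-sized writes line up with TLS records, the `break` branch is never taken; the disagreement in this thread is about what happens when they don't.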

> Where things break down is on the last pairing.
>
> If QMux records are split across TLS records, then to realize the
> benefits you are touting, you need to wait until the next TLS record is
> present.  However, if you are willing to process incrementally, you can
> process the partial STREAM frame in either model.  The record-based
> model, which encourages implementations to wait for entire records,
> remains blocked if they take that option.
>
> In the version without QMux records, you can also process the
> WINDOW_UPDATE frames or whatever control frames are sent ahead of the
> record split.  You more or less have to without records.  In the
> record-based version, implementations will block, because that's easier to
> implement, so no performance advantage.  The bytes are there, but they
> won't bother reading them off because that requires incremental frame
> decoding.
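The incremental processing described above can be sketched as follows; the 1-byte type / 2-byte length frame layout and the type codes are hypothetical, not the QMux wire format:

```python
def decode_frames(buf: bytearray):
    """Incremental frame decoding: complete frames ahead of a truncated
    one are still delivered, instead of waiting for the rest of a record."""
    frames = []
    off = 0
    while len(buf) - off >= 3:
        ftype = buf[off]
        flen = int.from_bytes(buf[off + 1:off + 3], "big")
        if len(buf) - off - 3 < flen:
            break                      # truncated frame stays buffered
        frames.append((ftype, bytes(buf[off + 3:off + 3 + flen])))
        off += 3 + flen
    del buf[:off]
    return frames

WINDOW_UPDATE, STREAM = 0x08, 0x01     # hypothetical type codes
buf = bytearray([WINDOW_UPDATE]) + (4).to_bytes(2, "big") + b"\x00\x00\x10\x00"
buf += bytes([STREAM]) + (100).to_bytes(2, "big") + b"partial"
got = decode_frames(buf)
assert [t for t, _ in got] == [WINDOW_UPDATE]   # control frame handled early
assert len(buf) == 3 + 7                        # partial STREAM frame waits
```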
>
> I acknowledge that this is based on an assumption that the easier
> implementation option will be taken.  And the cost of that might not be
> realized.  Maybe the happy path is what most people experience.  But this
> benefit applies to the non-record-based version as well.
>
> Of course, one thing you *can* bet on is that performance bugs that can
> happen, will happen.  Murphy has a lot of relevance in that domain.
>
> > This in turn means that whenever a QMux-over-TLS receiver decrypts a
> > TLS record, a complete QMux record becomes available.
>
> That's not a guarantee, it's just a likely outcome.
>
> > As a result, additional latency due to QMux-layer buffering is avoided,
> > even if the receiver does not implement trial or incremental processing.
>
> My point is that the additional latency is all in the QMux record layer,
> because it encourages the easier implementation.  Either model can take
> advantage of the approach you describe equally.  So yes, that argument
> doesn't work for me.
>


-- 
Kazuho Oku
