Ralph Giles wrote:
Mark suggested 'an implementation (of this specification)' as a way to
disambiguate.
I've never been particularly happy with our usage of encoder/decoder
(without a definition, even, though to be clear I'm responsible for the
current state of this text, since I wrote it). Mark's suggestion sounds
like a solid improvement to me.
I don't think there's anything here which should update RFC 6716, given
the decision to make that document just about the audio compression
layer. Drafts describing encapsulation necessarily contain additional
constraints and techniques on how to use an RFC 6716 encoder/decoder
for their particular application.
I agree here. RFC 6716 describes a "one frame in, one frame out"
encoder/decoder. A complete encoder application has to deal with audio
at the sample level, and potentially with multiple streams, which is the
source of most of the additional requirements.
I think it's useful for the abstract to include some motivation for why
the draft is interesting. The overview of Ogg's features
is really just defining terms for the last paragraph. I'll see if I can
come up with something better.
And in particular the reason I added them was to motivate why someone
might need this draft over and above RFC 6716. But I've no objection to
cutting them if you don't manage to come up with something better.
- 4th paragraph: "The second packet in the logical Ogg bitstream
MUST contain the comment header" ... then later ... "It MAY span
one or more pages" ... so shouldn't this be MUST, not MAY, since zero
is not allowed, i.e. the comment header is mandatory not optional.
How about, "It MAY span multiple pages, beginning on the second page of
the logical stream." Avoids the math/English ambiguity.
That captures the intent (i.e., that the header *might* span more than
one page). Works for me.
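For readers less familiar with Ogg: a demuxer learns that a packet such as the comment header continues onto the next page from the page's lacing values, where a lacing value of 255 means the packet is not yet complete. A minimal Python sketch, assuming the segment table has already been read from the page header (`packet_lengths` is a hypothetical helper, not part of any library):

```python
def packet_lengths(segment_table: list[int]) -> tuple[list[int], int]:
    """Split one Ogg page's lacing values into packet lengths.

    Returns (complete, partial): the lengths of packets that end on this
    page, and the number of bytes of a packet that continues onto the
    next page (0 if the last packet completes on this page).
    """
    complete: list[int] = []
    current = 0
    for lacing in segment_table:
        current += lacing
        if lacing < 255:  # a lacing value below 255 terminates a packet
            complete.append(current)
            current = 0
    return complete, current
```

For example, `packet_lengths([255, 255])` returns `([], 510)`: the header is still going and continues on the next page. Whether the first segment on a page continues a packet from the previous page is signalled separately, by the continued-packet flag in the page header, which this sketch leaves to the caller.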
Saying "Muxers MUST do X. Demuxers MUST handle streams not compliant
with that MUST," is confusing without offering better precision.
I agree. Cf.
<https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00>. I don't
think we should add a muxer MUST unless we're prepared to have demuxers
actually fail on said streams.
Section 4.1. Repairing Gaps in Real-time Streams - 1st paragraph:
"a muxer SHOULD emit packets that explicitly request the use of
Packet Loss Concealment (PLC) in place of the missing packets." Why
SHOULD and not MUST?
IIRC Tim wanted to allow implementations to emit zero-length packets
instead, updating the granulepos without generating packet data to
request PLC from the decoder.
No, it's impossible to compute the duration of a zero-byte packet, since
it has no TOC sequence, and this breaks lots of things. The "SHOULD" was
because there are other alternatives which generate valid streams, but
less than optimal results, such as simply not incrementing the timestamp
for the lost packets. That leads to desync if there's video, but simply
(hopefully) small audio glitches and skips for audio-only applications,
which might be an acceptable code/complexity trade-off for some
applications. Another possibility might be to actually re-encode the
audio that was played out (so that decoding does not depend on having a
good PLC implementation, but matches what was heard during the call).
That's more complex to get right than just relying on decoder PLC, but I
don't see a reason to have a MUST that would disallow it.
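To make the zero-byte problem concrete: the duration of an Opus packet is derived entirely from its TOC byte (plus, for code 3 packets, the frame count byte that follows it), per RFC 6716, Section 3.1. A minimal Python sketch, with durations in 48 kHz samples; the function names are illustrative:

```python
def samples_per_frame(toc: int) -> int:
    """Frame size in 48 kHz samples, from the 5-bit configuration in the TOC byte."""
    config = toc >> 3
    if config >= 16:  # CELT-only: 2.5, 5, 10, or 20 ms
        return 120 << (config & 3)
    if config >= 12:  # Hybrid: 10 or 20 ms
        return 960 if config & 1 else 480
    return (480, 960, 1920, 2880)[config & 3]  # SILK-only: 10, 20, 40, or 60 ms


def packet_duration(packet: bytes) -> int:
    """Duration of an Opus packet in 48 kHz samples."""
    if not packet:
        raise ValueError("zero-byte packet: no TOC byte, so no duration")
    toc = packet[0]
    code = toc & 3
    if code == 0:
        frames = 1
    elif code in (1, 2):
        frames = 2
    else:  # code 3: frame count in the low 6 bits of the next byte
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        frames = packet[1] & 0x3F
        if frames == 0:
            raise ValueError("code 3 packet must contain at least one frame")
    duration = frames * samples_per_frame(toc)
    if duration > 5760:
        raise ValueError("packet exceeds 120 ms")
    return duration
```

Every packet of one byte or more yields a duration (or is rejected as malformed); a zero-byte packet gives the decoder nothing to read at all.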
A muxer following the SHOULD benefits naive players which feed packets
spanning the gap to the decoder blindly. More sophisticated players
would implement their own version of this muxer SHOULD to handle the gap
when the muxer did not.
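One way a muxer might follow this SHOULD, sketched in Python under two assumptions: it reuses the TOC byte (configuration and stereo flag) of the last packet before the gap, and it relies on RFC 6716's rule that a zero-length frame tells the decoder the frame was lost, so the decoder conceals it. Each packet produced is at least one byte long and has a computable duration. `plc_packets` is a hypothetical helper; the frame size is whatever the reused TOC configuration implies.

```python
def plc_packets(prev_toc: int, frame_samples: int, gap_samples: int) -> list[bytes]:
    """Build packets made only of zero-length frames, covering gap_samples.

    prev_toc: TOC byte of the last packet before the gap (assumed reused).
    frame_samples: frame size for that TOC, in 48 kHz samples.
    """
    if gap_samples % frame_samples:
        raise ValueError("gap is not a whole number of frames at this frame size")
    base = prev_toc & 0xFC  # keep configuration and stereo flag, clear the code bits
    max_frames = 5760 // frame_samples  # an Opus packet holds at most 120 ms
    frames_left = gap_samples // frame_samples
    packets: list[bytes] = []
    while frames_left:
        n = min(frames_left, max_frames)
        if n == 1:
            packets.append(bytes([base]))  # code 0: a single zero-length frame
        else:
            # code 3, CBR, no padding: n frames sharing the 0 remaining bytes
            packets.append(bytes([base | 3, n]))
        frames_left -= n
    return packets
```

A gap that is not a multiple of the previous frame size would need a different configuration; the sketch simply rejects it.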
That makes computing per-packet timestamps an ill-posed
problem, which is not something we should require demuxers to solve. I
think the document is clear elsewhere that there must be no gaps in the
timestamps and that packets must be at least one byte, and nothing here
was meant to relax those requirements.
The prior paragraph uses MUST and says "For this to work, there
cannot be any gaps." If you want to keep it as SHOULD strength,
perhaps state that muxers which fail to do this will cause
demuxers to compute incorrect granule positions when seeking
forward or backward.
Again, violating this SHOULD does not mean you are allowed to mux
packets with gaps in the timestamps. Perhaps we need to add a MUST NOT
to reinforce the constraints laid out elsewhere in the document, since
there seems to be some misunderstanding here.
- I assume this section does not apply to RFC 7587 (Opus RTP)
because in the RTP case, RTP timestamp signals the gap (if using
DTX), while Ogg uses the granule position. Correct? If not, should
any of this section apply to RFC 7587 and therefore update it?
Correct. Ogg does not allow discontinuous transmission (or rather, it
does, but traditionally only for things like subtitle streams, not
audio), and the encoded packet data needs to contain enough information
to unambiguously recover per-packet timestamps. RTP includes per-packet
sequence number and timestamp values in order to more easily recover
from losses, and pays a 6-byte per-packet overhead to do so. In Ogg the
unit of loss recovery is the page, and there is only one timestamp
(granule position) per page, which allows the per-packet overhead to be
very close to one byte. But that's what leads to the need to be explicit
about per-packet losses.
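A sketch of how a demuxer recovers per-packet timestamps from the single granule position on a page, which is why every packet needs a computable duration: start from the page's granule position (the end of the last packet completed on the page) and walk backwards, subtracting each packet's duration. Python, assuming the durations were already computed from each packet's TOC; `packet_end_granules` is a hypothetical name.

```python
def packet_end_granules(page_granulepos: int, durations: list[int]) -> list[int]:
    """Return the granule position at the end of each packet completed on a page.

    page_granulepos: the page's granule position, i.e. the end of the last
    packet that completes on the page.
    durations: durations (48 kHz samples) of those packets, in page order.
    """
    ends: list[int] = []
    pos = page_granulepos
    for duration in reversed(durations):
        ends.append(pos)
        pos -= duration  # the previous packet ends where this one starts
    ends.reverse()
    return ends
```

`packet_end_granules(2880, [960, 960, 960])` returns `[960, 1920, 2880]`. If any packet's duration is unknown (a zero-byte packet, say), every earlier packet on the page loses its timestamp too.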
Section 4.2. Pre-skip - Same as above, does it apply to RFC 7587?
RTP applications typically start playback with the first audio packet
received to minimize latency. I suppose the summary of encoder delay
could still be useful in that context, but there's no signalling
mechanism for the pre-skip field in RTP, and the primary use of
marking sample-accurate edit points isn't relevant there.
Right, I'm not aware of any other codec in RTP that defines
sample-accurate trimming. If we did decide it was a problem worth
solving in RTP, we should solve it generically, instead of doing
something Opus-specific. Sample accurate trimming *is* a feature
available to other codecs in Ogg (and was one of the original selling
points of Vorbis over MP3), but because granule position is
codec-specific in Ogg, it requires an Opus-specific definition in this
document.
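A rough Python sketch of the sample-accurate trimming described above, assuming the demuxer already knows each packet's start and end granule positions: samples inside the pre-skip at the start of the stream are dropped, and on the final page, samples past that page's granule position are dropped. The corresponding PCM sample position is `granulepos - pre_skip`, so playback time in seconds is `(granulepos - pre_skip) / 48000`. Names are illustrative.

```python
from typing import Optional


def samples_to_keep(start_gp: int, end_gp: int, pre_skip: int,
                    eos_gp: Optional[int] = None) -> tuple[int, int]:
    """Return the (first, last) slice of one packet's decoded 48 kHz output to play.

    start_gp/end_gp: granule positions at the start and end of the packet.
    pre_skip: the ID header's pre-skip value.
    eos_gp: granule position of the final page, if the packet is on it.
    """
    n = end_gp - start_gp
    first = min(n, max(0, pre_skip - start_gp))  # drop samples still inside pre-skip
    last = n
    if eos_gp is not None:
        last = max(first, min(n, eos_gp - start_gp))  # drop samples past the end
    return first, last
```

For a first packet spanning granule positions 0 to 960 with a pre-skip of 312, this keeps `decoded[312:960]`.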
Also in the next paragraph, last sentence: "However,
implementations MAY reject streams in which the ID header does not
complete on the first page." Seems like this should be MUST, not MAY,
based on section 3 which clearly requires this: "MUST complete on
that [first] page." Or perhaps you need to specify muxer MUST vs.
demuxer MAY.
This one is clearly "demuxer MAY".
I believe the intent of the MAY was to clarify that the preceding
"...MUST NOT reject it for containing additional data..." did not give
muxers carte blanche to add an unbounded amount of additional data. I
would be okay making the RFC 2119 keyword stronger, but it shouldn't be
stronger than the "...SHOULD reject ID headers which do not contain
enough data..." earlier in the paragraph (upgrading both to MUST
wouldn't be unreasonable).
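A minimal Python sketch of the demuxer-side behaviour under discussion: reject an ID header that is too short for its fields, but ignore trailing data. The field layout follows the draft's ID header (magic signature, version, channel count, pre-skip, input sample rate, output gain, mapping family, and for families other than 0 the stream count, coupled stream count, and channel mapping table). Whether the header completed on the first page is a separate, page-level check not shown here. `parse_opus_head` is a hypothetical helper.

```python
import struct


def parse_opus_head(data: bytes) -> dict:
    """Parse an Opus ID header, rejecting truncated headers but ignoring trailing data."""
    if len(data) < 19 or data[:8] != b"OpusHead":
        raise ValueError("not an Opus ID header, or too short for its fixed fields")
    # Little-endian: version, channels, pre-skip, input rate, output gain (signed), family.
    version, channels, pre_skip, input_rate, gain, family = struct.unpack_from(
        "<BBHIhB", data, 8)
    if version >> 4 != 0:
        raise ValueError("incompatible major version")
    if channels == 0:
        raise ValueError("channel count must be nonzero")
    if family == 0:
        if channels > 2:
            raise ValueError("mapping family 0 allows only mono or stereo")
        streams, coupled, mapping = 1, channels - 1, list(range(channels))
    else:
        if len(data) < 21 + channels:
            raise ValueError("too short for the channel mapping table")
        streams, coupled = data[19], data[20]
        if streams == 0 or coupled > streams or streams + coupled > 255:
            raise ValueError("invalid stream counts")
        mapping = list(data[21:21 + channels])
    # Any bytes beyond the fields above are ignored rather than rejected.
    return {"version": version, "channels": channels, "pre_skip": pre_skip,
            "input_rate": input_rate, "output_gain": gain, "family": family,
            "streams": streams, "coupled": coupled, "mapping": mapping}
```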
Coupled Stream Count: "...the first M Opus decoders are to be
initialized for stereo output..." Is this an intended restriction
that all stereo channels must appear first before any mono
channels?
Yes. That way we don't have to signal mono/stereo for each substream,
just how many of each we have.
In practice, the channel mapping can reorder the channels arbitrarily
after decoding, so it doesn't actually impose any restrictions on what
can be coupled with what.
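The decode-then-remap arrangement can be sketched as follows (Python; `map_channel` is an illustrative name). With N streams, of which the first M are coupled, a channel mapping entry below 2*M selects the left or right channel of a coupled stream, an entry below N + M selects a mono stream, and 255 means silence. The mapping table can list these in any output order.

```python
def map_channel(index: int, streams: int, coupled: int):
    """Return (stream, channel) for one channel mapping entry, or None for silence."""
    if index == 255:
        return None  # output silence on this channel
    if index < 2 * coupled:
        return index // 2, index % 2  # left (0) or right (1) of a stereo stream
    if index < streams + coupled:
        return index - coupled, 0  # a mono stream
    raise ValueError("mapping entry out of range")
```

With 3 streams, 1 of them coupled, the mapping `[2, 0, 1]` puts the first mono stream (stream 1) in output channel 0 and the coupled pair after it, even though the coupled stream comes first in the stream order.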
Can we cite both RFC 6381 (for the format) and RFC 5334 (for the
Ogg-specific details) here?
That sounds reasonable to me.
_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec