Re: [codec] Review of draft-ietf-codec-oggopus-08

Timothy B. Terriberry Tue, 17 Nov 2015 17:42:05 -0800

Ralph Giles wrote:

Mark suggested 'an implementation (of this specification)' as a way to
disambiguate.

I've never been particularly happy with our usage of encoder/decoder
(without a definition, even, though to be clear I'm responsible for the
current state of this text, since I wrote it). Mark's suggestion sounds
like a solid improvement to me.

I don't think there's anything here which should update RFC 6716, given
the decision to make that document just about the audio compression
layer. Drafts describing encapsulation necessarily contain additional
constraints and techniques on how to use an RFC 6716 encoder/decoder
for their particular application.

I agree here. RFC 6716 describes a "one frame in, one frame out"
encoder/decoder. A complete encoder application has to deal with audio
at the sample level, and potentially with multiple streams, which is the
source of most of the additional requirements.

I think it's useful to include some motivation for why a draft is
interesting in the draft in the abstract. The overview of Ogg's features
is really just defining terms for the last paragraph. I'll see if I can
come up with something better.

And in particular the reason I added them was to motivate why someone
might need this draft over and above RFC 6716. But I've no objection to
cutting them if you don't manage to come up with something better.

- 4th paragraph: "The second packet in the logical Ogg bitstream
MUST contain the comment header" ... then later ... "It MAY span
one or more pages" ... so shouldn't this be MUST not MAY since zero
is not allowed, i.e. the comment header is mandatory not optional.


How about, "It MAY span multiple pages, beginning on the second page of
the logical stream." Avoids the math/English ambiguity.

That captures the intent (i.e., that the header *might* span more than
one page). Works for me.

Saying "Muxers MUST do X. Demuxers MUST handle streams not compliant
with that MUST," is confusing without offering better precision.

I agree. Cf.
<https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00>, I don't
think we should add a muxer MUST unless we're prepared to have demuxers
actually fail on said streams.

Section 4.1. Repairing Gaps in Real-time Streams - 1st paragraph:
"a muxer SHOULD emit packets that explicitly request the use of
Packet Loss Concealment (PLC) in place of the missing packets." Why
SHOULD not MUST?


IIRC Tim wanted to allow implementations to emit zero-length packets
instead, updating the granulepos without generating packet data to
request PLC from the decoder.

No, it's impossible to compute the duration of a zero-byte packet, since
it has no TOC sequence, and this breaks lots of things. The "SHOULD" was
because there are other alternatives which generate valid streams, but
less than optimal results, such as simply not incrementing the timestamp
for the lost packets. That leads to desync if there's video, but simply
(hopefully) small audio glitches and skips for audio-only applications,
which might be an acceptable code/complexity trade-off for some
applications. Another possibility might be to actually re-encode the
audio that was played out (so that decoding does not depend on having a
good PLC implementation, but matches what was heard during the call).
That's more complex to get right than just relying on decoder PLC, but I
don't see a reason to have a MUST that would disallow it.
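
To make the TOC point concrete, here is a minimal Python sketch of how a
packet's duration falls out of its TOC byte, following the layout in RFC
6716, Section 3.1. The function name and error messages are mine, chosen
for illustration; this is not code from libopus or from the draft.

```python
def opus_packet_duration(packet: bytes) -> int:
    """Return the packet duration in samples at 48 kHz (hypothetical helper)."""
    if len(packet) < 1:
        # No TOC byte means no frame size and no frame count to read.
        raise ValueError("zero-byte packet has no TOC, duration is undefined")

    toc = packet[0]
    config = toc >> 3          # top five bits select mode, bandwidth, frame size
    code = toc & 0x03          # bottom two bits select the frame-count code

    if config < 12:            # SILK-only: 10, 20, 40, 60 ms
        frame = (480, 960, 1920, 2880)[config & 3]
    elif config < 16:          # Hybrid: 10, 20 ms
        frame = (480, 960)[config & 1]
    else:                      # CELT-only: 2.5, 5, 10, 20 ms
        frame = (120, 240, 480, 960)[config & 3]

    if code == 0:
        count = 1
    elif code in (1, 2):
        count = 2
    else:
        # Code 3: the frame count lives in the low six bits of the next byte.
        if len(packet) < 2:
            raise ValueError("code 3 packet is missing its frame count byte")
        count = packet[1] & 0x3F
        if count == 0:
            raise ValueError("code 3 packet declares zero frames")

    total = frame * count
    if total > 5760:           # a packet may not exceed 120 ms
        raise ValueError("packet duration exceeds 120 ms")
    return total


print(opus_packet_duration(bytes([0xF8])))  # one 20 ms CELT frame: 960
```

With an empty packet the function has nothing to read, which is the
whole problem with using zero-byte packets to signal loss.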

A muxer following the SHOULD benefits naive players which feed packets
spanning the gap to the decoder blindly. More sophisticated players
would implement their own version of this muxer SHOULD to handle the gap
when the muxer did not.

That makes computing per-packet timestamps an ill-posed problem, which
is not something we should require demuxers to solve. I think the
document is clear elsewhere that there must be no gaps in the timestamps
and that packets must be at least one byte, and nothing here was meant
to relax those requirements.

The prior paragraph uses MUST and says "For this to work, there
cannot be any gaps." If you want to keep it as SHOULD strength,
perhaps state that muxers which fail to do this will cause
demuxers to compute incorrect granule positions when seeking
forward or backward.

Again, violating this SHOULD does not mean you are allowed to mux
packets with gaps in the timestamps. Perhaps we need to add a MUST NOT
to reinforce the constraints laid out elsewhere in the document, since
there seems to be some misunderstanding here.

- I assume this section does not apply to RFC 7587 (Opus RTP)
because in the RTP case, RTP timestamp signals the gap (if using
DTX), while Ogg uses the granule position. Correct? If not, should
any of this section apply to RFC 7587 and therefore update it?


Correct. Ogg does not allow discontinuous transmission (or rather

It does, but traditionally only for things like subtitle streams, not
audio, and the encoded packet data needs to contain enough information
to unambiguously recover per-packet timestamps. RTP includes per-packet
sequence number and timestamp values in order to more easily recover
from losses, and pays a 6-byte per packet overhead to do so. In Ogg the
unit of loss recovery is the page, and there is only one timestamp
(granule position) per page, which allows the per-packet overhead to be
very close to one byte. But that's what leads to the need to be explicit
about per-packet losses.
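
A rough sketch of what "one timestamp per page" means for a demuxer,
assuming the page's granule position marks the end of the last packet
completed on that page and that each packet's duration has already been
read from its TOC. The function name and the example numbers are
illustrative only.

```python
def packet_end_positions(page_granulepos: int, durations: list[int]) -> list[int]:
    """Recover the end position of every packet completed on one page.

    durations holds each packet's length in 48 kHz samples, in stream
    order. Working backward from the page's single granule position only
    gives the right answer if no packet is missing from the list.
    """
    ends = []
    pos = page_granulepos
    for duration in reversed(durations):
        ends.append(pos)
        pos -= duration
    ends.reverse()
    return ends


# Three 20 ms packets on a page whose granule position is 2880.
print(packet_end_positions(2880, [960, 960, 960]))  # [960, 1920, 2880]
```

If a packet were silently dropped from the page, every earlier packet on
that page would be assigned the wrong position, and nothing in the page
itself would reveal the error.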

Section 4.2. Pre-skip - Same as above, does it apply to RFC 7587?


RTP applications typically start playback with the first audio packet
received to minimize latency. I suppose the summary of encoder delay
could still be useful in that context, but there's no signalling
mechanism for the pre-skip field in RTP, and the primary use of
marking sample-accurate edit points isn't relevant there.

Right, I'm not aware of any other codec in RTP that defines
sample-accurate trimming. If we did decide it was a problem worth
solving in RTP, we should solve it generically, instead of doing
something Opus-specific. Sample accurate trimming *is* a feature
available to other codecs in Ogg (and was one of the original selling
points of Vorbis over MP3), but because granule position is
codec-specific in Ogg, it requires an Opus-specific definition in this
document.
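
For reference, the Opus-specific definition boils down to a subtraction.
The sketch below assumes the draft's rule that the PCM sample position
is the granule position minus the pre-skip value from the ID header,
with everything counted at 48 kHz. The function names and the pre-skip
value of 312 are just examples.

```python
def pcm_sample_position(granulepos: int, pre_skip: int) -> int:
    """PCM sample position of a point in the stream, in 48 kHz samples."""
    return granulepos - pre_skip


def playback_time_seconds(granulepos: int, pre_skip: int) -> float:
    """Playback time corresponding to a granule position."""
    return pcm_sample_position(granulepos, pre_skip) / 48000.0


# With a pre-skip of 312 samples, granule position 48312 is one second in.
print(playback_time_seconds(48312, 312))  # 1.0
```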

Also in the next paragraph. - Last sentence: "However,
implementations MAY reject streams in which the ID header does not
complete on the first page." Seems like this should be MUST not MAY
based on section 3 which clearly requires this: "MUST complete on
that [first] page." Or perhaps you need to specify muxer MUST vs.
demuxer MAY.


This one is clearly "demuxer MAY".

I believe the intent of the MAY was to clarify that the preceding
"...MUST NOT reject it for containing additional data..." did not give
muxers carte blanche to add an unbounded amount of additional data. I
would be okay making the RFC 2119 keyword stronger, but it shouldn't be
stronger than the "..SHOULD reject ID headers which do not contain
enough data..." earlier in the paragraph (upgrading both to MUST
wouldn't be unreasonable).

Coupled Stream Count: "...the first M Opus decoders are to be
initialized for stereo output..." Is this an intended restriction
that all stereo channels must appear first before any mono
channels?


Yes. That way we don't have to signal mono/stereo for each substream,
just how many of each we have.

In practice, the channel mapping can reorder the channels arbitrarily
after decoding, so it doesn't actually impose any restrictions on what
can be coupled with what.
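
A small sketch of why the ordering is not a real restriction, assuming
the draft's mapping rule for N streams of which the first M are coupled:
an index below 2*M selects one side of a stereo decoder, an index from
2*M upward selects a mono decoder, and 255 means silence. The function
name and the six-channel table are illustrative, not taken from the
draft.

```python
def resolve_channel(index: int, stream_count: int, coupled_count: int):
    """Map one channel mapping table entry to (stream, channel), or None for silence."""
    if index == 255:
        return None                       # output channel is pure silence
    if index >= stream_count + coupled_count:
        raise ValueError("mapping index refers to a nonexistent decoded channel")
    if index < 2 * coupled_count:
        return (index // 2, index % 2)    # stereo stream: 0 = left, 1 = right
    return (index - coupled_count, 0)     # mono stream


# Hypothetical six-channel layout: 4 streams, the first 2 coupled.
mapping = [0, 4, 1, 2, 3, 5]
for out_channel, index in enumerate(mapping):
    print(out_channel, resolve_channel(index, stream_count=4, coupled_count=2))
```

The table is free to place a mono stream between the two halves of a
stereo pair in the output order, even though the stereo streams come
first in the bitstream.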

Can we cite both RFC 6381 (for the format) and RFC 5334 (for
ogg-specifics) here?


That sounds reasonable to me.

_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec
