Timothy B. Terriberry <[email protected]> wrote:
> I took a stab at drafting some text that answers these questions (XML diff
> attached):

Thanks; this is very helpful.  I have just a few comments:


> In order to support capturing a real-time stream that has lost
> packets, or that uses discontinuous transmission (DTX), a muxer
> SHOULD emit packets that explicitly request the use of Packet Loss
> Concealment (PLC) in place of the packets that were not transmitted.

s/not transmitted/lost or not transmitted/


> If
> there is no previous packet, reasonable decoders will not emit
> anything other than silence regardless of the mode.  Using the CELT-
> only mode for this case (with any audio bandwidth) allows maximum
> flexibility, since a single packet can represent any duration up to
> 120 ms that is a multiple of 2.5 ms using at most two bytes.

...plus one byte of Ogg lacing.

For initial zero-length frames, might it be better to prefer the
configuration of the first non-zero-length frame, to the extent
possible and when it is available?  That would help in any
situation where the configuration of the first packet is used to
report information (such as frame size) or to form an initial
estimate of bandwidth, required buffer sizes, etc.

Or perhaps the last sentence should just be omitted, since it
already effectively says that the mode, bandwidth, and channel
count are unlikely to matter to a decoder in this case.
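
For reference, here is a minimal sketch of the "at most two bytes"
claim, assuming the TOC layout and configuration numbers from RFC 6716
(CELT-only narrowband configs 16..19 for 2.5, 5, 10, and 20 ms).  The
function name and the choice of narrowband are illustrative only, and
the Ogg lacing byte is not included:

```python
# Frame size in units of 2.5 ms -> CELT-only narrowband config number
# (RFC 6716, Table 2). Narrowband is an arbitrary choice for this sketch.
CELT_NB_CONFIG = {1: 16, 2: 17, 4: 18, 8: 19}


def celt_plc_packet(units, stereo=False):
    """Build a packet of zero-length CELT frames lasting units * 2.5 ms.

    Hypothetical helper: returns the raw Opus packet bytes.
    """
    if not 1 <= units <= 48:  # 48 * 2.5 ms = 120 ms, the packet limit
        raise ValueError("duration must be 2.5 ms to 120 ms")
    s = 1 if stereo else 0
    if units in CELT_NB_CONFIG:
        # Code 0: one frame, and the TOC byte is the whole packet.
        return bytes([(CELT_NB_CONFIG[units] << 3) | (s << 2) | 0])
    # Code 3, CBR, no padding: TOC byte plus a frame count byte.
    # Use the largest frame size that divides the duration evenly.
    for size in (8, 4, 2, 1):
        if units % size == 0:
            toc = (CELT_NB_CONFIG[size] << 3) | (s << 2) | 3
            return bytes([toc, units // size])


# A 95 ms gap is 38 units, i.e. 19 frames of 5 ms in one code 3 packet.
print(celt_plc_packet(38).hex())
```

Every duration from 2.5 ms to 120 ms in 2.5 ms steps comes out as one
or two bytes with this construction.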


> Delaying such
> changes as long as possible to simplifies things for PLC
> implementations.

s/to //


> A 95 ms gap could be encoded as 19 5 ms frames in
> two bytes with a single CBR code 3 packet.  If the previous frame
> size was 20 ms, using four 80 ms frames, followed by three 5 ms

s/80/20/


> frames requires 4 bytes (plus an extra byte of Ogg lacing overhead),
> but allows the PLC to use its well-tested steady state behavior for
> as long as possible.

To clarify, if the previous frame was 20 ms SILK, is this
suggesting a 4 x 20 ms SILK packet followed by a 3 x 5 ms CELT
packet?  The next paragraph suggests keeping the mode as long as
possible, implying that it may be better to use 4 x 20 ms SILK +
10 ms SILK + 5 ms CELT.  Or is minimizing the number of frame size
changes more important than keeping the mode as long as possible?
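
To make the comparison concrete, here is a rough byte count for the
three ways of covering the 95 ms gap discussed above.  It assumes every
frame is zero-length, that one or two frames fit in the TOC byte alone
(code 0 or code 1), that three or more frames need a code 3 packet of
two bytes, and that each packet costs one Ogg lacing byte.  The helper
names are hypothetical:

```python
def packet_bytes(frame_count):
    """Size of an Opus packet holding only zero-length frames."""
    # Code 0 or code 1: TOC byte only. Code 3: TOC plus frame count byte.
    return 1 if frame_count <= 2 else 2


def plan_cost(packets):
    """packets is a list of (frame_count, frame_ms) tuples, one per packet.

    Returns (duration_ms, packet_bytes_total, lacing_bytes).
    """
    duration = sum(n * ms for n, ms in packets)
    payload = sum(packet_bytes(n) for n, _ in packets)
    lacing = len(packets)  # one lacing value per packet under 255 bytes
    return duration, payload, lacing


plans = {
    "19 x 5 ms": [(19, 5)],
    "4 x 20 ms + 3 x 5 ms": [(4, 20), (3, 5)],
    "4 x 20 ms + 10 ms + 5 ms": [(4, 20), (1, 10), (1, 5)],
}
for name, plan in plans.items():
    print(name, plan_cost(plan))
```

Under these assumptions the second and third options both need 4 bytes
of packet data; the third costs one more lacing byte than the second.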


Thanks.
_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec