Re: [codec] draft-ietf-codec-oggopus-07 packet sizes

Mark Harris Wed, 13 May 2015 00:45:13 -0700

Timothy B. Terriberry wrote:
> Mark Harris wrote:
>>
>>     An Ogg Opus player MUST play any valid Ogg Opus stream with a
>>     channel mapping family of 0 or 1 that contains no packet larger
>>     than 61,440 octets, even if the number of channels does not match
>>     the physically connected audio hardware.
>
>
> This should probably forward-reference Section 6, i.e., simply by adding a
> "(See Section 6)" after "61,440 octets."


Sounds good.

> Also, I assume you meant this
> restriction to only apply to audio data packets (in keeping with your new
> text in Section 6)?

Actually my proposed text in Section 6 permits rejecting or partially
processing any packet larger than 61,440 octets (including a
comment packet), although I'm certainly open to other suggestions.  It
was unclear whether the previous text applied to a comment packet, but
if the intent is to allow implementations with a fixed-size packet
buffer then it seems reasonable that the same buffer would be used for
all packets and that the same limit should apply.
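
As a rough illustration of the fixed-size buffer case, here is a minimal
Python sketch.  The names MIN_SUPPORTED_OCTETS and packet_action are
hypothetical, and the policy shown is only one reading of the proposed
Section 6 text, not wording from the draft:

```python
# Hypothetical sketch: one packet buffer shared by header, comment, and
# audio data packets, so the same size check applies to all of them.

# Packet size that the proposed text expects any player to handle.
MIN_SUPPORTED_OCTETS = 61_440


def packet_action(packet_len: int, buffer_octets: int = MIN_SUPPORTED_OCTETS) -> str:
    """Decide what a decoder with a fixed-size buffer does with a packet.

    Returns "process" when the packet fits in the buffer, otherwise
    "reject_or_partial" (the proposed text permits either for oversized packets).
    """
    if buffer_octets < MIN_SUPPORTED_OCTETS:
        # Assumption: a conforming player never uses a smaller buffer.
        raise ValueError("buffer is smaller than the required minimum")
    return "process" if packet_len <= buffer_octets else "reject_or_partial"
```

An implementation is free to pass a larger buffer_octets; the 61,440
figure only describes what every player is expected to accept.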

>
>> In Section 6 replace:
>>
>>     Technically, valid Opus packets can be arbitrarily large due to the
>>     padding format, although the amount of non-padding data they can
>>     contain is bounded.  These packets might be spread over a similarly
>>     enormous number of Ogg pages.  Encoders SHOULD use no more padding
>>     than is necessary to make a variable bitrate (VBR) stream constant
>>     bitrate (CBR).  Decoders SHOULD avoid attempting to allocate
>>     excessive amounts of memory when presented with a very large packet.
>>     Decoders SHOULD reject packets larger than 60 kB per channel, and
>>     display a warning message, and MAY reject packets larger than 7.5 kB
>>     per channel.  The presence of an extremely large packet in the stream
>>     could indicate a memory exhaustion attack or stream corruption.
>>
>> with:
>>
>>     Technically, valid Opus packets can be arbitrarily large due to
>>     padding, although the amount of non-padding data that an audio data
>>     packet can contain is bounded.  These packets might be spread over
>>     a similarly enormous number of Ogg pages.  In an audio data packet,
>>     encoders SHOULD use no more padding than is necessary to make a
>>     variable bitrate (VBR) stream constant bitrate (CBR).  Decoders
>
>
> Talking about doing something in a single audio data packet that affects an
> entire stream seems a little tortured. How about, "Encoders SHOULD pad audio
> data packets no more than necessary to make a variable bitrate (VBR) stream
> constant bitrate (CBR)"?

I'm not really happy with that wording, which seems as though it could
be misinterpreted as saying that encoders SHOULD pad audio data
packets to make a VBR stream CBR, but not use any more padding than
needed for that.  The intent was to just add a clause to the previous
text to make it clear that it is not restricting the use of other
kinds of padding, such as padding at the end of the comment packet in
a stream that is already CBR.  How about: "Encoders SHOULD limit the
use of padding in audio data packets to no more than is necessary to
make a variable bitrate (VBR) stream constant bitrate (CBR)"?

>
>>     SHOULD reject audio data packets larger than 61,440 octets per
>>     stream; such packets necessarily contain more padding than needed
>>     for this purpose.  Decoders SHOULD avoid attempting to allocate
>>     excessive amounts of memory when presented with a very large
>>     packet, and MAY reject or partially process any packet larger
>>     than 61,440 octets.  The presence of an extremely large packet
>
>
> This limit is reasonable for channel mapping families 0 and 1 (7.5 kB per
> stream for each of a maximum of 8 non-useless streams), but not for channel
> mapping family 255, which may carry many more channels (and is currently the
> only way to do so). Suggest, "...and MAY reject or partially process any
> packet larger than 61,440 octets when used with channel mapping family 1, or
> a packet larger than 7,680 octets per stream when used with channel mapping
> family 255."

Does the limit of 7,680 octets per stream apply to the comment packet?

It is my impression that no decoders are currently required to process
any packets at all from a stream with channel mapping family 255.  If that is
not the intention, then perhaps this is what should be clarified.  Is
it the act of starting to process a stream with mapping family 255
that obligates a decoder to process the remaining packets?  That is,
is it your intention that a decoder implementation that can handle
packets up to 1 MB be required to immediately reject and not even
attempt to process a stream with channel mapping family 255 and 137 Opus
streams, because it is possible (although unlikely) that it may
contain a packet larger than 1 MB that it would be required to
process?  The text that I proposed was intended to allow such an
implementation to successfully process a stream that does not actually
exceed the implementation's packet size limit.
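
To make the 137-stream example concrete, here is a small Python sketch of
the arithmetic.  It assumes "1 MB" means 1,048,576 octets and uses the
suggested 7,680 octets per stream for mapping family 255; the function
names are hypothetical:

```python
# Hypothetical sketch of the per-stream limit suggested for family 255.

PER_STREAM_OCTETS = 7_680          # suggested per-stream figure
DECODER_LIMIT = 1024 * 1024        # assumption: "1 MB" read as 2**20 octets


def family255_limit(streams: int) -> int:
    """Packet size a decoder would have to accept for this many streams."""
    return PER_STREAM_OCTETS * streams


def smallest_count_exceeding(limit: int) -> int:
    """Smallest stream count whose required packet size is larger than limit."""
    return limit // PER_STREAM_OCTETS + 1


# 136 streams still fit under the decoder's limit; 137 do not.
print(family255_limit(136), family255_limit(137), smallest_count_exceeding(DECODER_LIMIT))
```

Under that reading, 137 is the first stream count for which a per-stream
rule would demand more than such a decoder can buffer, even if no packet
in the actual stream comes anywhere near that size.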

>
> We could also put in a smaller MAY limit for channel mapping family 0, but I
> don't think that's that useful, as I suspect most players will a) only
> support mapping families 0 and 1, and b) use a fixed-size buffer independent
> of the file they're decoding.

Also, even in mapping family 0 (or mapping family 255 with 1 stream)
it is reasonable to have a comment packet with album cover art.

>
>
>>     in the stream could indicate a memory exhaustion attack or stream
>>     corruption.
>>
>> Replace:
>>
>>     In an Ogg Opus stream, the largest possible valid packet that does
>>     not use padding has a size of (61,298*N - 2) octets, or about 60 kB
>>     per Opus stream.  With 255 streams, this is 15,630,988 octets
>>     (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which
>>     will have a granule position of -1.  This is of course a very extreme
>>     packet, consisting of 255 streams, each containing 120 ms of audio
>>     encoded as 2.5 ms frames, each frame using the maximum possible
>>     number of octets (1275) and stored in the least efficient manner
>>     allowed (a VBR code 3 Opus packet).  Even in such a packet, most of
>>     the data will be zeros as 2.5 ms frames cannot actually use all
>>     1275 octets.  The largest packet consisting of entirely useful data
>>     is (15,326*N - 2) octets, or about 15 kB per stream.  This
>>     corresponds to 120 ms of audio encoded as 10 ms frames in either SILK
>>     or Hybrid mode, but at a data rate of over 1 Mbps, which makes little
>>     sense for the quality achieved.  A more reasonable limit is
>>     (7,664*N - 2) octets, or about 7.5 kB per stream.  This corresponds
>>     to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with a
>>     total bitrate just under 511 kbps (not counting the Ogg encapsulation
>>     overhead).  With N=8, the maximum number of channels currently
>>     defined by mapping family 1, this gives a maximum packet size of
>>     61,310 octets, or just under 60 kB.  This is still quite
>>     conservative, as it assumes each output channel is taken from one
>>     decoded channel of a stereo packet.  An implementation could
>>     reasonably choose any of these numbers for its internal limits.
>>
>> with:
>>
>>     In an Ogg Opus stream, the largest possible valid audio data packet
>>     that does not use padding has a size of (61,298*N - 2) octets,
>>     although a valid comment header packet may be larger.  With 255 Opus
>>     streams, this is 15,630,988 octets and can span up to 61,298 Ogg
>>     pages, all but one of which will have a granule position of -1.
>>     This is of course a very extreme packet, consisting of 255 Opus
>>     streams, each containing 120 ms of audio encoded as 2.5 ms frames,
>>     each frame using the maximum possible number of octets (1275) and
>>     stored in the least efficient manner allowed (a VBR code 3 Opus
>>     packet).  Even in such a packet, most of the data will be zeros as
>>     2.5 ms frames cannot actually use all 1275 octets.
>>
>>     The largest audio data packet consisting of entirely useful data is
>>     (15,326*N - 2) octets.  This corresponds to 120 ms of audio encoded
>>     as 10 ms frames in either SILK or Hybrid mode, but at a data rate
>>     of over 1 Mbps, which makes little sense for the quality achieved.
>>
>>     A more reasonable limit is (7,664*N - 2) octets.  This corresponds
>>     to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with
>>     a total bitrate just under 511 kbps (not counting the Ogg
>>     encapsulation overhead).  For mapping family 1, N=8 provides a
>>     reasonable upper bound, as it allows for each of the 8 possible
>>     output channels to be encoded in a separate stereo Opus stream.
>
>
> Suggest "decoded from" instead of "encoded in" to make this sound more
> reasonable.

Ok.

>
>>     This gives a packet size of 61,310 octets, which is rounded up to a
>>     multiple of 1024 octets to give the packet size of 61,440 octets
>>     that is expected to be successfully processed by any implementation.
>
>
> Just wordsmithing, how about: "This gives a limit of 61,310 octets, rounded
> up to a multiple of 1024 to yield the maximum audio data packet size of
> 61,440 octets that any implementation is expected to be able to process
> successfully."

This is showing the close connection between a reasonable maximum
packet size for an audio data packet in a stream with mapping family 1
(61,310 octets using the above definition of reasonable) and the
packet size given earlier (for any kind of packet) that any conforming
decoder is expected to be able to process (61,440 octets).  At least
from the point of view of decoder requirements, the 61,440 octet
number is more of a minimum (minimum packet buffer size) than a
maximum.  The purpose of making the connection is to help a decoder
implementor make a better informed decision on whether to support more
than the minimum requirement, and help an encoder implementor
determine how much concern they should have about producing a stream
that may not play as intended on all conforming decoders.  Because
the right term depends on the point of view, I avoided using words like
limit, minimum, and maximum here, but perhaps that made the wording awkward.
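
For reference, the numbers in the proposed text can be checked with a
short Python sketch.  The formulas are the ones quoted above; the function
names are hypothetical, and the bitrate helper assumes the per-stream
figure covers 120 ms of audio:

```python
# Hypothetical sketch: reproduce the packet size figures from the text.
# N is the number of Opus streams in the Ogg Opus packet.

def max_unpadded_octets(n: int) -> int:
    """Largest valid audio data packet that uses no padding."""
    return 61_298 * n - 2


def max_useful_octets(n: int) -> int:
    """Largest audio data packet consisting entirely of useful data."""
    return 15_326 * n - 2


def reasonable_octets(n: int) -> int:
    """The 'more reasonable' size (20 ms stereo CELT frames)."""
    return 7_664 * n - 2


def round_up(value: int, multiple: int = 1024) -> int:
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def bitrate_bps(octets_per_stream: int, duration_ms: int = 120) -> float:
    """Bitrate of one stream, ignoring Ogg encapsulation overhead."""
    return octets_per_stream * 8 * 1000 / duration_ms


print(max_unpadded_octets(255))        # 255 streams, no padding
print(reasonable_octets(8))            # mapping family 1, N=8
print(round_up(reasonable_octets(8)))  # rounded up to a multiple of 1024
```

With N=8 this gives 61,310 octets, which rounds up to 61,440 (60 * 1024),
the size that any conforming decoder is expected to process.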

 - Mark

_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec
