[codec] draft-ietf-codec-oggopus-07 packet sizes

Mark Harris Mon, 04 May 2015 19:46:18 -0700

draft-ietf-codec-oggopus-07 adds some concrete limits to the packet
sizes that must be accepted, which I agree is a good idea, however
there are some issues with the specific text.


Section 6 states:
"Decoders SHOULD reject packets larger than 60 kB per channel, and
display a warning message, and MAY reject packets larger than 7.5 kB
per channel."

It is unclear whether "channel" here refers to output channels,
decoded channels, or physical channels (although output channels makes
the most sense).  Also, while the next paragraph appears to be a
justification for these numbers, the numbers in that paragraph are per
stream and not per channel.  The number of output channels may be
higher or lower than the number of streams so the connection is
unclear.  Additionally it is unclear whether this section applies only
to audio data packets or also to comment packets, which do not have
channels or streams but could be larger than audio data packets.

It is also unclear how this interacts with the Section 5.1.1.5
requirement that an Ogg Opus player play *any* Ogg Opus stream with a
channel mapping family of 0 or 1 (apparently including those with
large packets).  In order to comply with that, does that mean that
large packets may only be rejected when the channel mapping family is
not 0 or 1, or by applications that are not an "Ogg Opus player"?

It appears that the intent is to require Ogg Opus players to play any
valid Ogg Opus stream with channel mapping family 0 or 1 (maximum 8
channels) within limits that should be reasonable for any such stream,
allowing for resource-constrained implementations with fixed buffers
and a reasonably reliable test for ensuring that a particular stream
is likely to play even on resource constrained players as intended,
while still allowing streams that are limited only by what should be
considered malicious which may not be playable on all players.

To address these issues, I propose that the packet size required to be
processed by any implementation be stated as a simple 60 kB size and
be connected to the Section 5.1.1.5 requirement.  This is the size
suggested at the end of Section 6 and simplifies the requirement
without increasing the packet size that could be interpreted to
already be required to be played by a minimal player.  I further
suggest that the larger size at which audio data packets SHOULD be
rejected be set at 60 kB per stream rather than per "channel",
aligning with the largest possible valid packet without padding as
described at the beginning of the second paragraph of Section 6.
Because packets exceeding 60 kB per stream necessarily use padding in
excess of what would be required to make a VBR stream CBR, this also
aligns with Section 6 encoder recommendations and avoids the situation
where decoders SHOULD reject certain packets that an encoder not
violating any SHOULD or MUST might produce.

The new text about decoders displaying a warning message is also
worrying.  Typically a decoder does not have access to the
application's preferred display or localization facilities so this
seems inappropriate.  Even considering the higher layers it seems
unlikely that typical usage, such as background audio decoding in a
portable music player, game engine, or the like would have a facility
for displaying this kind of warning message during decoding.  Also if
a warning should be displayed when packets larger than 60 kB per
channel are rejected, why not when smaller packets are rejected?  For
these reasons, and because applications are already free to indicate
any issues that are important to their users in the manner most
appropriate for their application, I suggest omitting this text.

On another note, it appears that in the packet sizes in Section 6, kB
refers to 1024 bytes and MB refers to 1024*1024 bytes, whereas
everywhere else in the document (including kbps and Mbps in the same
paragraph) the standard SI decimal multiples are intended.  Because
the usage is limited I suggest just sticking with bytes/octets to
avoid confusion, rather than adding a sentence to clarify the mixed
usage.  If a 1024-byte unit is desired I believe that the traditional
designation is KB (or the newer KiB).

Below is a precise proposal for comment:

In Section 5.1.1.5 replace:

   An Ogg Opus player MUST play any Ogg Opus stream with a channel
   mapping family of 0 or 1, even if the number of channels does not
   match the physically connected audio hardware.

with:

   An Ogg Opus player MUST play any valid Ogg Opus stream with a
   channel mapping family of 0 or 1 that contains no packet larger
   than 61,440 octets, even if the number of channels does not match
   the physically connected audio hardware.

In Section 6 replace:

   Technically, valid Opus packets can be arbitrarily large due to the
   padding format, although the amount of non-padding data they can
   contain is bounded.  These packets might be spread over a similarly
   enormous number of Ogg pages.  Encoders SHOULD use no more padding
   than is necessary to make a variable bitrate (VBR) stream constant
   bitrate (CBR).  Decoders SHOULD avoid attempting to allocate
   excessive amounts of memory when presented with a very large packet.
   Decoders SHOULD reject packets larger than 60 kB per channel, and
   display a warning message, and MAY reject packets larger than 7.5 kB
   per channel.  The presence of an extremely large packet in the stream
   could indicate a memory exhaustion attack or stream corruption.

with:

   Technically, valid Opus packets can be arbitrarily large due to
   padding, although the amount of non-padding data that an audio data
   packet can contain is bounded.  These packets might be spread over
   a similarly enormous number of Ogg pages.  In an audio data packet,
   encoders SHOULD use no more padding than is necessary to make a
   variable bitrate (VBR) stream constant bitrate (CBR).  Decoders
   SHOULD reject audio data packets larger than 61,440 octets per
   stream; such packets necessarily contain more padding than needed
   for this purpose.  Decoders SHOULD avoid attempting to allocate
   excessive amounts of memory when presented with a very large
   packet, and MAY reject or partially process any packet larger
   than 61,440 octets.  The presence of an extremely large packet
   in the stream could indicate a memory exhaustion attack or stream
   corruption.

Replace:

   In an Ogg Opus stream, the largest possible valid packet that does
   not use padding has a size of (61,298*N - 2) octets, or about 60 kB
   per Opus stream.  With 255 streams, this is 15,630,988 octets
   (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which
   will have a granule position of -1.  This is of course a very extreme
   packet, consisting of 255 streams, each containing 120 ms of audio
   encoded as 2.5 ms frames, each frame using the maximum possible
   number of octets (1275) and stored in the least efficient manner
   allowed (a VBR code 3 Opus packet).  Even in such a packet, most of
   the data will be zeros as 2.5 ms frames cannot actually use all
   1275 octets.  The largest packet consisting of entirely useful data
   is (15,326*N - 2) octets, or about 15 kB per stream.  This
   corresponds to 120 ms of audio encoded as 10 ms frames in either SILK
   or Hybrid mode, but at a data rate of over 1 Mbps, which makes little
   sense for the quality achieved.  A more reasonable limit is
   (7,664*N - 2) octets, or about 7.5 kB per stream.  This corresponds
   to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with a
   total bitrate just under 511 kbps (not counting the Ogg encapsulation
   overhead).  With N=8, the maximum number of channels currently
   defined by mapping family 1, this gives a maximum packet size of
   61,310 octets, or just under 60 kB.  This is still quite
   conservative, as it assumes each output channel is taken from one
   decoded channel of a stereo packet.  An implementation could
   reasonably choose any of these numbers for its internal limits.

with:

   In an Ogg Opus stream, the largest possible valid audio data packet
   that does not use padding has a size of (61,298*N - 2) octets,
   although a valid comment header packet may be larger.  With 255 Opus
   streams, this is 15,630,988 octets and can span up to 61,298 Ogg
   pages, all but one of which will have a granule position of -1.
   This is of course a very extreme packet, consisting of 255 Opus
   streams, each containing 120 ms of audio encoded as 2.5 ms frames,
   each frame using the maximum possible number of octets (1275) and
   stored in the least efficient manner allowed (a VBR code 3 Opus
   packet).  Even in such a packet, most of the data will be zeros as
   2.5 ms frames cannot actually use all 1275 octets.

   The largest audio data packet consisting of entirely useful data is
   (15,326*N - 2) octets.  This corresponds to 120 ms of audio encoded
   as 10 ms frames in either SILK or Hybrid mode, but at a data rate
   of over 1 Mbps, which makes little sense for the quality achieved.

   A more reasonable limit is (7,664*N - 2) octets.  This corresponds
   to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with
   a total bitrate just under 511 kbps (not counting the Ogg
   encapsulation overhead).  For mapping family 1, N=8 provides a
   reasonable upper bound, as it allows for each of the 8 possible
   output channels to be encoded in a separate stereo Opus stream.
   This gives a packet size of 61,310 octets, which is rounded up to a
   multiple of 1024 octets to give the packet size of 61,440 octets
   that is expected to be successfully processed by any implementation.

I also found that the fragment identifiers are no longer valid in the
informative references to the Vorbis specification.  This fixes those
and makes the host name consistent.

In Section 14.2 replace:

   https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9

with:

   https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9

Replace:

   https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-130000A.2

with:

   https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-132000A.2

 - Mark

_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec

[codec] draft-ietf-codec-oggopus-07 packet sizes

Reply via email to