Mark Harris wrote:
An Ogg Opus player MUST play any valid Ogg Opus stream with a
channel mapping family of 0 or 1 that contains no packet larger
than 61,440 octets, even if the number of channels does not match
the physically connected audio hardware.
This should probably forward-reference Section 6, i.e., simply by adding
a "(See Section 6)" after "61,440 octets." Also, I assume you meant this
restriction to only apply to audio data packets (in keeping with your
new text in Section 6)?
In Section 6 replace:
Technically, valid Opus packets can be arbitrarily large due to the
padding format, although the amount of non-padding data they can
contain is bounded. These packets might be spread over a similarly
enormous number of Ogg pages. Encoders SHOULD use no more padding
than is necessary to make a variable bitrate (VBR) stream constant
bitrate (CBR). Decoders SHOULD avoid attempting to allocate
excessive amounts of memory when presented with a very large packet.
Decoders SHOULD reject packets larger than 60 kB per channel, and
display a warning message, and MAY reject packets larger than 7.5 kB
per channel. The presence of an extremely large packet in the stream
could indicate a memory exhaustion attack or stream corruption.
with:
Technically, valid Opus packets can be arbitrarily large due to
padding, although the amount of non-padding data that an audio data
packet can contain is bounded. These packets might be spread over
a similarly enormous number of Ogg pages. In an audio data packet,
encoders SHOULD use no more padding than is necessary to make a
variable bitrate (VBR) stream constant bitrate (CBR). Decoders
Talking about doing something in a single audio data packet that affects
an entire stream seems a little tortured. How about, "Encoders SHOULD
pad audio data packets no more than necessary to make a variable bitrate
(VBR) stream constant bitrate (CBR)"?
SHOULD reject audio data packets larger than 61,440 octets per
stream; such packets necessarily contain more padding than needed
for this purpose. Decoders SHOULD avoid attempting to allocate
excessive amounts of memory when presented with a very large
packet, and MAY reject or partially process any packet larger
than 61,440 octets. The presence of an extremely large packet
This limit is reasonable for channel mapping families 0 and 1 (7.5 kB
per stream for each of a maximum of 8 non-useless streams), but not for
channel mapping family 255, which may carry many more channels (and is
currently the only way to do so). Suggest, "...and MAY reject or
partially process any packet larger than 61,440 octets when used with
channel mapping family 1, or a packet larger than 7,680 octets per
stream when used with channel mapping family 255."
We could also put in a smaller MAY limit for channel mapping family 0,
but I don't think that's that useful, as I suspect most players will a)
only support mapping families 0 and 1, and b) use a fixed-size buffer
independent of the file they're decoding.
in the stream could indicate a memory exhaustion attack or stream
corruption.
Replace:
In an Ogg Opus stream, the largest possible valid packet that does
not use padding has a size of (61,298*N - 2) octets, or about 60 kB
per Opus stream. With 255 streams, this is 15,630,988 octets
(14.9 MB) and can span up to 61,298 Ogg pages, all but one of which
will have a granule position of -1. This is of course a very extreme
packet, consisting of 255 streams, each containing 120 ms of audio
encoded as 2.5 ms frames, each frame using the maximum possible
number of octets (1275) and stored in the least efficient manner
allowed (a VBR code 3 Opus packet). Even in such a packet, most of
the data will be zeros as 2.5 ms frames cannot actually use all
1275 octets. The largest packet consisting of entirely useful data
is (15,326*N - 2) octets, or about 15 kB per stream. This
corresponds to 120 ms of audio encoded as 10 ms frames in either SILK
or Hybrid mode, but at a data rate of over 1 Mbps, which makes little
sense for the quality achieved. A more reasonable limit is
(7,664*N - 2) octets, or about 7.5 kB per stream. This corresponds
to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with a
total bitrate just under 511 kbps (not counting the Ogg encapsulation
overhead). With N=8, the maximum number of channels currently
defined by mapping family 1, this gives a maximum packet size of
61,310 octets, or just under 60 kB. This is still quite
conservative, as it assumes each output channel is taken from one
decoded channel of a stereo packet. An implementation could
reasonably choose any of these numbers for its internal limits.
with:
In an Ogg Opus stream, the largest possible valid audio data packet
that does not use padding has a size of (61,298*N - 2) octets,
although a valid comment header packet may be larger. With 255 Opus
streams, this is 15,630,988 octets and can span up to 61,298 Ogg
pages, all but one of which will have a granule position of -1.
This is of course a very extreme packet, consisting of 255 Opus
streams, each containing 120 ms of audio encoded as 2.5 ms frames,
each frame using the maximum possible number of octets (1275) and
stored in the least efficient manner allowed (a VBR code 3 Opus
packet). Even in such a packet, most of the data will be zeros as
2.5 ms frames cannot actually use all 1275 octets.
The largest audio data packet consisting of entirely useful data is
(15,326*N - 2) octets. This corresponds to 120 ms of audio encoded
as 10 ms frames in either SILK or Hybrid mode, but at a data rate
of over 1 Mbps, which makes little sense for the quality achieved.
A more reasonable limit is (7,664*N - 2) octets. This corresponds
to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with
a total bitrate just under 511 kbps (not counting the Ogg
encapsulation overhead). For mapping family 1, N=8 provides a
reasonable upper bound, as it allows for each of the 8 possible
output channels to be encoded in a separate stereo Opus stream.
Suggest "decoded from" instead of "encoded in" to make this sound more
reasonable.
This gives a packet size of 61,310 octets, which is rounded up to a
multiple of 1024 octets to give the packet size of 61,440 octets
that is expected to be successfully processed by any implementation.
Just wordsmithing, how about: "This gives a limit of 61,310 octets,
rounded up to a multiple of 1024 to yield the maximum audio data packet
size of 61,440 octets that any implementation is expected to be able to
process successfully."
I also found that the fragment identifiers are no longer valid in the
informative references to the Vorbis specification. This fixes those
and makes the host name consistent.
In Section 14.2 replace:
https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9
with:
https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9
Replace:
https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-130000A.2
with:
https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-132000A.2
Good catch, and thanks for all the review. If what I wrote above sounds
fine, and there are no further comments/objections, I will make the
changes by the end of the week.
_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec