Mark Harris wrote:
    An Ogg Opus player MUST play any valid Ogg Opus stream with a
    channel mapping family of 0 or 1 that contains no packet larger
    than 61,440 octets, even if the number of channels does not match
    the physically connected audio hardware.

This should probably forward-reference Section 6, i.e., simply by adding a "(See Section 6)" after "61,440 octets." Also, I assume you meant this restriction to only apply to audio data packets (in keeping with your new text in Section 6)?

In Section 6 replace:

    Technically, valid Opus packets can be arbitrarily large due to the
    padding format, although the amount of non-padding data they can
    contain is bounded.  These packets might be spread over a similarly
    enormous number of Ogg pages.  Encoders SHOULD use no more padding
    than is necessary to make a variable bitrate (VBR) stream constant
    bitrate (CBR).  Decoders SHOULD avoid attempting to allocate
    excessive amounts of memory when presented with a very large packet.
    Decoders SHOULD reject packets larger than 60 kB per channel, and
    display a warning message, and MAY reject packets larger than 7.5 kB
    per channel.  The presence of an extremely large packet in the stream
    could indicate a memory exhaustion attack or stream corruption.

with:

    Technically, valid Opus packets can be arbitrarily large due to
    padding, although the amount of non-padding data that an audio data
    packet can contain is bounded.  These packets might be spread over
    a similarly enormous number of Ogg pages.  In an audio data packet,
    encoders SHOULD use no more padding than is necessary to make a
    variable bitrate (VBR) stream constant bitrate (CBR).  Decoders

Talking about doing something in a single audio data packet that affects an entire stream seems a little tortured. How about, "Encoders SHOULD pad audio data packets no more than necessary to make a variable bitrate (VBR) stream constant bitrate (CBR)"?

    SHOULD reject audio data packets larger than 61,440 octets per
    stream; such packets necessarily contain more padding than needed
    for this purpose.  Decoders SHOULD avoid attempting to allocate
    excessive amounts of memory when presented with a very large
    packet, and MAY reject or partially process any packet larger
    than 61,440 octets.  The presence of an extremely large packet

This limit is reasonable for channel mapping families 0 and 1 (7.5 kB per stream for each of a maximum of 8 non-useless streams), but not for channel mapping family 255, which may carry many more channels (and is currently the only way to do so). Suggest, "...and MAY reject or partially process any packet larger than 61,440 octets when used with channel mapping family 1, or a packet larger than 7,680 octets per stream when used with channel mapping family 255."

We could also put in a smaller MAY limit for channel mapping family 0, but I don't think that's that useful, as I suspect most players will a) only support mapping families 0 and 1, and b) use a fixed-size buffer independent of the file they're decoding.

    in the stream could indicate a memory exhaustion attack or stream
    corruption.

Replace:

    In an Ogg Opus stream, the largest possible valid packet that does
    not use padding has a size of (61,298*N - 2) octets, or about 60 kB
    per Opus stream.  With 255 streams, this is 15,630,988 octets
    (14.9 MB) and can span up to 61,298 Ogg pages, all but one of which
    will have a granule position of -1.  This is of course a very extreme
    packet, consisting of 255 streams, each containing 120 ms of audio
    encoded as 2.5 ms frames, each frame using the maximum possible
    number of octets (1275) and stored in the least efficient manner
    allowed (a VBR code 3 Opus packet).  Even in such a packet, most of
    the data will be zeros as 2.5 ms frames cannot actually use all
    1275 octets.  The largest packet consisting of entirely useful data
    is (15,326*N - 2) octets, or about 15 kB per stream.  This
    corresponds to 120 ms of audio encoded as 10 ms frames in either SILK
    or Hybrid mode, but at a data rate of over 1 Mbps, which makes little
    sense for the quality achieved.  A more reasonable limit is
    (7,664*N - 2) octets, or about 7.5 kB per stream.  This corresponds
    to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with a
    total bitrate just under 511 kbps (not counting the Ogg encapsulation
    overhead).  With N=8, the maximum number of channels currently
    defined by mapping family 1, this gives a maximum packet size of
    61,310 octets, or just under 60 kB.  This is still quite
    conservative, as it assumes each output channel is taken from one
    decoded channel of a stereo packet.  An implementation could
    reasonably choose any of these numbers for its internal limits.

with:

    In an Ogg Opus stream, the largest possible valid audio data packet
    that does not use padding has a size of (61,298*N - 2) octets,
    although a valid comment header packet may be larger.  With 255 Opus
    streams, this is 15,630,988 octets and can span up to 61,298 Ogg
    pages, all but one of which will have a granule position of -1.
    This is of course a very extreme packet, consisting of 255 Opus
    streams, each containing 120 ms of audio encoded as 2.5 ms frames,
    each frame using the maximum possible number of octets (1275) and
    stored in the least efficient manner allowed (a VBR code 3 Opus
    packet).  Even in such a packet, most of the data will be zeros as
    2.5 ms frames cannot actually use all 1275 octets.

    The largest audio data packet consisting of entirely useful data is
    (15,326*N - 2) octets.  This corresponds to 120 ms of audio encoded
    as 10 ms frames in either SILK or Hybrid mode, but at a data rate
    of over 1 Mbps, which makes little sense for the quality achieved.

    A more reasonable limit is (7,664*N - 2) octets.  This corresponds
    to 120 ms of audio encoded as 20 ms stereo CELT mode frames, with
    a total bitrate just under 511 kbps (not counting the Ogg
    encapsulation overhead).  For mapping family 1, N=8 provides a
    reasonable upper bound, as it allows for each of the 8 possible
    output channels to be encoded in a separate stereo Opus stream.

Suggest "decoded from" instead of "encoded in" to make this sound more reasonable.

    This gives a packet size of 61,310 octets, which is rounded up to a
    multiple of 1024 octets to give the packet size of 61,440 octets
    that is expected to be successfully processed by any implementation.

Just wordsmithing, how about: "This gives a limit of 61,310 octets, rounded up to a multiple of 1024 to yield the maximum audio data packet size of 61,440 octets that any implementation is expected to be able to process successfully."

I also found that the fragment identifiers are no longer valid in the
informative references to the Vorbis specification.  This fixes those
and makes the host name consistent.

In Section 14.2 replace:

    https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-800004.3.9

with:

    https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-810004.3.9

Replace:

    https://xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-130000A.2

with:

    https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#x1-132000A.2

Good catch, and thanks for all the review. If what I wrote above sounds fine, and there are no further comments/objections, I will make the changes by the end of the week.

_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec

Reply via email to