Ben Campbell wrote:
Here is my AD evaluation of draft-ietf-codec-oggopus-09. Codec
encapsulation is not my strong suit, so I suspect many of my comments
can be answered with "this is the way we normally do this thing". But
I'd still like to discuss these before going to IETF last call.
Thank you for the very thorough review. Comments in-line below (as an
individual).
- section 2, last paragraph:
Is this paragraph necessary or useful? (And is it really impossible to
create a non-compliant implementation that meets all the MUSTs?) I
suggest removing it.
This was copied from RFC 5334. I don't think anyone has any particular
attachment to it, so let's just remove it.
- section 3:
This section seems to use 2119 keywords to describe what I _think_ are
existing Ogg requirements. If so, they should use descriptive text,
unless they are in the form of attributed quotes.
I think I can attribute this mostly to paranoia over RFC 3533 being
informational, and using descriptive text of its own for what should be
normative requirements. E.g.,
"Ogg also allows but does not require secondary header packets after the
bos page for logical bitstreams and these must also precede any data
packets in any logical bitstream. These subsequent header packets are
framed into an integral number of pages, which will not contain any data
packets." (from RFC 3533)
vs.
"The second packet in the logical Ogg bitstream MUST contain the
comment header ... It MAY span multiple pages, beginning on the second
page of the logical stream. However many pages it spans, the comment
header packet MUST finish the page on which it completes." (from this draft)
Unless someone objects, though, I'll try to rewrite the latter without
the 2119 keywords.
- 3, last paragraph:
-- "implementations need to be prepared":
maybe that should be a MUST or SHOULD?
I think the intent was just to point out that someone might violate the
preceding SHOULD. It's not clear what actionable thing we'd be saying
they MUST do (it would depend on the particular implementation).
-- "last page SHOULD NOT be a continued packet":
Why not MUST NOT? Can you describe reasons someone might reasonably
violate this?
The intent was to allow but discourage page-level editing of files
(i.e., cutting and splicing files by copying complete pages without
examining the packets on them). Even just saving a live stream that
drops at some point, you might reasonably expect to just be able to copy
the data you received to a file. A MUST here would mean you would have
to parse the pages, identify the continued packets from the lacing
values, remove the extraneous data from the last page if you found some,
repack the page buffer, and recompute the page checksum. That's a lot
more machinery, and people would likely just ignore the requirement in
practice. See also the discussion at
<https://www.ietf.org/mail-archive/web/codec/current/msg03145.html>.
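To make the cost of a MUST concrete, here is a minimal Python sketch of just the detection step: checking whether the last packet on a page is left unfinished. It assumes the page header layout from RFC 3533 (a 27-octet fixed header whose final octet is the segment count, followed by that many lacing values). The function name is made up for illustration and is not part of any library.

```python
def last_packet_is_continued(page):
    """Return True if the final packet on this Ogg page does not complete on it.

    Assumed layout (RFC 3533): octets 0..3 are the capture pattern "OggS",
    octet 26 is the number of segments, and the lacing values follow.
    """
    if len(page) < 27 or page[:4] != b"OggS":
        raise ValueError("not an Ogg page")
    n_segments = page[26]
    lacing = page[27:27 + n_segments]
    if len(lacing) != n_segments:
        raise ValueError("truncated segment table")
    # A lacing value of 255 means "more data follows for this packet", so a
    # page whose last lacing value is 255 ends in the middle of a packet.
    return n_segments > 0 and lacing[-1] == 255
```

A page-level splicer that wanted to satisfy a MUST would still have to drop the trailing segments, repack the page, and recompute the checksum after this check.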
- 4.2, 2nd paragraph:
-- "samples which SHOULD be skipped": Why not MUST? (also, s/which/that)
Some specific applications might have a reason for looking at that data
(for example, when you have multiple files that you know were produced
by chopping up a single existing file without re-encoding). In practice
we haven't needed anything stronger to convince people to implement this
correctly (especially when we point them to
<https://people.xiph.org/~greg/opus_testvectors/correctness_trimming_nobeeps.opus>).
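For readers who have not implemented this, a rough Python sketch of what "skipped (decoded but discarded)" amounts to on the player side. It assumes the decoder hands back blocks of samples at 48 kHz and that pre_skip is the value read from the ID header; the function name and the block representation are illustrative only.

```python
def trim_pre_skip(decoded_blocks, pre_skip):
    """Yield decoded blocks with the first `pre_skip` samples discarded.

    Every block is still decoded by the caller (the decoder needs the
    history to converge); only the output is dropped here.
    """
    remaining = pre_skip
    for block in decoded_blocks:
        if remaining >= len(block):
            # The whole block falls inside the pre-skip region.
            remaining -= len(block)
            continue
        yield block[remaining:]
        remaining = 0
```

Because the count is tracked across blocks, the pre-skip can be smaller than one packet or span several, as the draft allows.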
-- "SHOULD NOT be played": Is this redundant with SHOULD be skipped?
Yes. Deleted.
- 4.3: Should the reference to [vorbis-trim] be normative? Can the
feature be implemented without understanding the referenced doc?
The answer to the second question is "yes", which I think makes the
answer to the first question "no". The reason this is here is to tell
people familiar with how Vorbis works what mechanism they should use to
achieve the same outcome in Opus. But if you're not familiar with
Vorbis, then you can safely ignore the whole paragraph.
- 4.6: Should the reference to [seeking] be normative? Can the section
be implemented without understanding it?
It is possible to implement seeking without knowing anything other than
what is described in RFC 3533, but the linked documentation does help
considerably.
I would argue that the section itself is not normative, in that you
could implement seeking with some other method (for example, linearly
scanning the whole file on open to build an index, possibly caching such
indexes for files that are opened frequently: this would not be my
recommended approach, but people have done it in practice). You could
also choose to implement only approximate seeking by just guessing a
file position and decoding from there (also done in practice by some
players, and a common method for other formats like MP3). Or you could
choose not to implement seeking at all.
- 5.1, list entry 5: What is the unit for "Input Sample Rate"?
Hz. It was specified in Figure 1, but you're right, it should be in the
text, too. Added the sentence, "This is the sample rate of the original
input (before encoding), in Hz."
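As an illustration of where this field sits, here is a small Python sketch that reads the fixed part of the ID header. It assumes the field order shown in Figure 1 of the draft (magic signature, version, channel count, pre-skip, input sample rate, output gain, channel mapping family), all little endian; the function and key names are my own and not taken from any implementation.

```python
import struct

def parse_opus_head(packet):
    """Parse the first 19 octets of an OpusHead packet into a dict."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an Opus ID header")
    # B: version, B: channel count, H: pre-skip, I: input sample rate,
    # h: output gain (signed Q7.8 dB), B: channel mapping family.
    version, channels, pre_skip, rate_hz, gain_q8, family = struct.unpack_from(
        "<BBHIhB", packet, 8)
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate_hz": rate_hz,  # informational; not the playback rate
        "output_gain_q8": gain_q8,
        "mapping_family": family,
    }
```

The channel mapping table, when present, follows these 19 octets and is not handled here.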
- 5.1, entry 6, sentence starting with "Virtually all players..."
Is this a normative statement, or a statement of fact? If the former,
consider dropping “Virtually all”
Dropped "virtually all".
- 5.1, entry 6, last paragraph:
-- "which MUST be omitted when the channel mapping family is 0": That
seems redundant to the previous paragraph.
Yes. Deleted the sentence in the previous paragraph.
-- "Implementations SHOULD reject ID headers"
What does it mean, in context, to "reject" headers?
The same thing as rejecting a stream (i.e., don't try to play/decode
it). Changed to say that instead.
- 5.1.1.4: Why not MUST?
I think the idea was to allow assigning reserved values without
explicitly updating this specification. Even though I expect we _would_
want to update this specification, given how long it's taken to get this
draft published, I'm wary of adding more pressure to people not to
deploy new mappings (because this RFC says they MUST NOT), even when it
looks like they'll be relatively stable and interoperable. At least not
for the sole reason that they haven't gotten an RFC number yet.
- 5.2, last two paragraphs: Why are the two SHOULDs not MUSTs?
For the first SHOULD, this is the same as above: some other
specification may eventually define the format of the data here and how
to modify it, and I didn't think we'd want to make people supporting
that specification in violation of this specification. I'm also leaning
towards keeping this one a SHOULD since a) the eventual format of this
data may apply to all codecs which share the vorbiscomment style of
metadata, and b) we would probably try to do this informally first to
gain implementation experience, and only bring it to the IETF if there
is sufficient interest. I didn't want to make updating this document a
prerequisite for that. See also the discussion in this thread:
<https://www.ietf.org/mail-archive/web/codec/current/msg03061.html>
(skip to
<https://www.ietf.org/mail-archive/web/codec/current/msg03107.html> for
the resolution of the debate).
For the second SHOULD, it felt like the right strength for a judgment
call like "excessive". I have no objection to changing it to MUST.
- 5.2.1, 2nd to last paragraph, last sentence:
I don’t think muxers assume things—what is the required concrete behavior?
I replaced this with "A muxer SHOULD place the gain it wants other tools
to use by default into the 'output gain' field, and not the comment tag."
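To show how the two gains interact on the player side, here is a hedged Python sketch. It relies only on what the draft states: both values are Q7.8 dB, the tag value is a base-10 integer from -32768 to 32767 in at most 6 characters with an optional sign, and a player that uses the tag applies it in addition to the 'output gain'. The function names are hypothetical.

```python
import re

_R128_VALUE = re.compile(r"[+-]?[0-9]+")

def parse_r128_gain(value):
    """Return the tag value as an int, or None if it is not well formed."""
    if len(value) > 6 or not _R128_VALUE.fullmatch(value):
        return None
    gain = int(value)
    if not -32768 <= gain <= 32767:
        return None
    return gain

def playback_scale(output_gain, r128_tag=None):
    """Linear factor to multiply samples by: header gain plus optional tag."""
    total_q8 = output_gain
    if r128_tag is not None:
        parsed = parse_r128_gain(r128_tag)
        if parsed is not None:
            total_q8 += parsed  # applied in addition to the output gain
    # Same formula as the draft: 10^(gain / (20 * 256)).
    return 10.0 ** (total_q8 / (20.0 * 256))
```

A tool that ignores comment tags entirely still gets the muxer's intended default, since that lives in the header field.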
- 5.2.1, last paragraph: Why is the SHOULD NOT not MUST NOT?
In some contexts, there may be no confusion with multiple normalization
schemes. The language in this section was debated a good deal on the
list (see threads
<https://www.ietf.org/mail-archive/web/codec/current/msg02930.html>,
<https://www.ietf.org/mail-archive/web/codec/current/msg03012.html>, and
<https://www.ietf.org/mail-archive/web/codec/current/msg03053.html>). I
believe everyone is satisfied with the current compromise, and would
prefer not to change the strength of any normative statements in it to
avoid re-igniting the debate.
- 6, first paragraph:
Why are the two SHOULDs not MUSTs? What does it mean to reject a packet?
For the first SHOULD, the fact that most decoders are likely to reject
such packets is enough to discourage encoders from emitting them, and I
was wary of imposing a MUST restriction when RFC 6716 chose not to do so.
For the second, this is the same as the previous comment about excessive
allocations. I have no objection to changing this one, either.
"Reject a packet" is just shorthand for "treat it as if it were a
malformed Opus packet with an invalid TOC sequence". I've added explicit
text saying the latter.
- 7: "reference implementation" could use a citation.
Added a reference to RFC 6716.
Should [linear-prediction] be normative? Can this be implemented without
understanding that? If it does need to be normative, can you find a more
stable reference?
Linear prediction is something that would be understood by "a person
having ordinary skill in the art" (to borrow a phrase). We just wanted
to include a reference to help those who might not be DSP engineers but
wanted to learn more.
- 9, first paragraph:
This seems to defer at least some security considerations to 4732. If
so, this should be a normative reference. If, on the other hand, 4732
just provides additional information that is not necessary to understand
this section, then the wording should be adjusted.
Upgraded the reference to normative.
- 13:
Is this needed? Are the IETF BCP 78 and 79 provisions incorrect for this?
This text was specifically requested by the Debian project (and I
support it personally). The same issue came up for RFC 6716 (for the
same reasons). See the discussion at
<https://www.ietf.org/mail-archive/web/codec/current/msg02845.html> and
<https://www.ietf.org/mail-archive/web/ietf/current/msg73116.html>.
Reviewing those conversations, I do notice that the language of that
section was changed after last call for RFC 6716, but the corresponding
language in our source repository was not updated (because it was added
to the document boilerplate manually by the RFC Editor, and not included
in a separate section).
Editorial
=========
- Section 1: Consider spelling out multiplexer (muxer) and demultiplexer
(demuxer) on first mention.
Done.
- 3: Please expand TOC, SILK, and CELT on first mention.
Doing this directly in the current text was a little bit awkward. I
reworded it to
"...An implementation of this specification SHOULD treat any Opus packet
whose duration is different from that of the first Opus packet in an Ogg
packet as if it were a malformed Opus packet with an invalid Table Of
Contents (TOC) sequence.
"The TOC sequence at the beginning of each Opus packet indicates the
coding mode, audio bandwidth, channel count, duration (frame size), and
number of frames per packet, as described in Section 3.1 of [RFC6716].
The coding mode is one of SILK, Hybrid, or Constrained Energy Lapped
Transform (CELT). The combination of coding mode, audio bandwidth, and
frame size is referred to as the configuration of an Opus packet."
I did not expand SILK because it is not an acronym.
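For anyone who wants to see what the TOC sequence carries, here is a short Python sketch of decoding the first octet. It assumes the layout from Section 3.1 of RFC 6716 (a five-bit configuration number, one stereo bit, and a two-bit frame count code) and that section's configuration table; the function name is illustrative.

```python
def parse_toc(toc_byte):
    """Decode the first octet of an Opus packet (the TOC byte)."""
    config = toc_byte >> 3          # top five bits: configuration number
    stereo = bool(toc_byte & 0x04)  # next bit: 0 = mono, 1 = stereo
    code = toc_byte & 0x03          # low two bits: frame count code
    if config < 12:
        mode = "SILK"
        frame_ms = (10, 20, 40, 60)[config & 3]
    elif config < 16:
        mode = "Hybrid"
        frame_ms = (10, 20)[config & 1]
    else:
        mode = "CELT"
        frame_ms = (2.5, 5, 10, 20)[config & 3]
    return {"config": config, "mode": mode, "frame_ms": frame_ms,
            "stereo": stereo, "frame_count_code": code}
```

The total packet duration also depends on the number of frames, which for code 3 packets is stored in the octet after the TOC byte and is not decoded here.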
- 5, steps 2 and 3:
These steps in combination seem to say “decode at the highest available
supported rate that the hardware can handle.” If so, the wording could
be simplified.
It is a little bit more complicated. There are two rates here: one used
for decoding, and one used for playback. The difference between steps 2
and 3 is that in step 3, the hardware's highest available sample rate is
not one of the rates supported by Opus decoding. For example, if the
highest available sample rate the hardware can handle is 32 kHz, then
this is not a "supported rate" (i.e., not supported by Opus decoding),
so step 2 does not apply and step 3 does. Step 3 says to decode at the
next highest "supported rate", which would be 48 kHz, and then resample
to a rate the hardware can handle (in this case 32 kHz).
I don't think that's equivalent to your summary of the two steps.
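A small Python sketch of the distinction between steps 2 and 3, using the decoder rates the draft lists (8, 12, 16, 24, and 48 kHz). The function name is made up, and the fallback for hardware whose highest rate exceeds 48 kHz is my own assumption for the sketch, since steps 2 and 3 do not cover that case.

```python
SUPPORTED_RATES_HZ = (8000, 12000, 16000, 24000, 48000)

def choose_decode_rate(hardware_max_hz):
    """Return (decode_rate_hz, needs_resampling) for a given hardware rate."""
    # Step 2: the hardware's highest rate is itself a supported decode rate.
    if hardware_max_hz in SUPPORTED_RATES_HZ:
        return hardware_max_hz, False
    # Step 3: decode at the next higher supported rate, then resample down.
    for rate in SUPPORTED_RATES_HZ:
        if rate > hardware_max_hz:
            return rate, True
    # Assumption: above 48 kHz, decode at 48 kHz and resample.
    return 48000, True
```

With a 32 kHz device this returns 48 kHz plus resampling, which matches the example above.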
- 7, 2nd paragraph:
I don’t think that’s literally true, you’ve defined 0, 1, and 255. The
rest are reserved. (I guess one could argue the guidance to treat
reserved values as 255 supports this statement, but I think that’s a
stretch.)
Yes, this is really only true for the currently defined mapping families.
Changed to "Each currently specified value of this octet indicates..."
- 5.1.1.2: Do you expect the reader to know what Vorbis channel order is?
Also, please expand FLAC on first mention.
It's a fairly common term among those who work with audio formats, but I
added a "(see below)" for those who don't, since the table below it
lists what that order is.
- 5.2.1, paragraph starting with " An Ogg Opus stream MUST NOT have more
than one of each tag, and if present their values MUST be an integer"
each of these two new tags, or all tags? (I assume the former from the
numeric values) Also, singular/plural mismatch in "their values MUST be
an integer"
"each of these tags" seems to resolve both issues.
- 10, last paragraph:
Updates, or extends? If the former, the draft needs an “Updates RFC
5334” field in the initial heading.
Added the field.
A complete diff of the changes since -09 so far is attached (and, as
always, can be found in the Opus git repository:
<https://git.xiph.org/?p=opus.git>). I'll post again when I finish
reviewing the changes for normative language for generic Ogg
requirements and the extra copyright grant.
diff --git a/doc/draft-ietf-codec-oggopus.xml b/doc/draft-ietf-codec-oggopus.xml
index 124d488..b95e5bc 100644
--- a/doc/draft-ietf-codec-oggopus.xml
+++ b/doc/draft-ietf-codec-oggopus.xml
@@ -7,17 +7,18 @@
<!ENTITY rfc5334 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml'>
<!ENTITY rfc6381 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml'>
<!ENTITY rfc6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml'>
<!ENTITY rfc6982 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6982.xml'>
<!ENTITY rfc7587 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7587.xml'>
]>
<?rfc toc="yes" symrefs="yes" ?>
-<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-09">
+<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-09"
+ updates="5334">
<front>
<title abbrev="Ogg Opus">Ogg Encapsulation for the Opus Audio Codec</title>
<author initials="T.B." surname="Terriberry" fullname="Timothy B. Terriberry">
<organization>Mozilla Corporation</organization>
<address>
<postal>
<street>650 Castro Street</street>
@@ -100,18 +101,18 @@ Each page is associated with a particular logical stream and contains a capture
pattern and checksum, flags to mark the beginning and end of the logical
stream, and a 'granule position' that represents an absolute position in the
stream, to aid seeking.
A single page can contain up to 65,025 octets of packet data from up to 255
different packets.
Packets can be split arbitrarily across pages, and continued from one page to
the next (allowing packets much larger than would fit on a single page).
Each page contains 'lacing values' that indicate how the data is partitioned
- into packets, allowing a demuxer to recover the packet boundaries without
- examining the encoded data.
+ into packets, allowing a demultiplexer (demuxer) to recover the packet
+ boundaries without examining the encoded data.
A packet is said to 'complete' on a page when the page contains the final
lacing value corresponding to that packet.
</t>
<t>
This encapsulation defines the contents of the packet data, including
the necessary headers, the organization of those packets into a logical
stream, and the interpretation of the codec-specific granule position field.
It does not attempt to describe or specify the existing Ogg container format.
@@ -123,24 +124,16 @@ Readers unfamiliar with the basic concepts mentioned above are encouraged to
<section anchor="terminology" title="Terminology">
<t>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in <xref target="RFC2119"/>.
</t>
-<t>
-Implementations that fail to satisfy one or more "MUST" requirements are
- considered non-compliant.
-Implementations that satisfy all "MUST" requirements, but fail to satisfy one
- or more "SHOULD" requirements are said to be "conditionally compliant".
-All other implementations are "unconditionally compliant".
-</t>
-
</section>
<section anchor="packet_organization" title="Packet Organization">
<t>
An Ogg Opus stream is organized as follows.
</t>
<t>
There are two mandatory header packets.
@@ -175,25 +168,28 @@ The first (N - 1) Opus packets, if any, are packed one after another
into the Ogg packet, using the self-delimiting framing from Appendix B of
<xref target="RFC6716"/>.
The remaining Opus packet is packed at the end of the Ogg packet using the
regular, undelimited framing from Section 3 of <xref target="RFC6716"/>.
All of the Opus packets in a single Ogg packet MUST be constrained to have the
same duration.
An implementation of this specification SHOULD treat any Opus packet whose
duration is different from that of the first Opus packet in an Ogg packet as
- if it were a malformed Opus packet with an invalid TOC sequence.
+ if it were a malformed Opus packet with an invalid Table Of Contents (TOC)
+ sequence.
</t>
<t>
-The coding mode (SILK, Hybrid, or CELT), audio bandwidth, channel count,
- duration (frame size), and number of frames per packet, are indicated in the
- TOC (table of contents) sequence at the beginning of each Opus packet, as
- described in Section 3.1 of <xref target="RFC6716"/>.
-The combination of mode, audio bandwidth, and frame size is referred to as
- the configuration of an Opus packet.
+The TOC sequence at the beginning of each Opus packet indicates the coding
+ mode, audio bandwidth, channel count, duration (frame size), and number of
+ frames per packet, as described in Section 3.1
+ of <xref target="RFC6716"/>.
+The coding mode is one of SILK, Hybrid, or Constrained Energy Lapped Transform
+ (CELT).
+The combination of coding mode, audio bandwidth, and frame size is referred to
+ as the configuration of an Opus packet.
</t>
<t>
The first audio data page SHOULD NOT have the 'continued packet' flag set
(which would indicate the first audio data packet is continued from a previous
page).
Packets MUST be placed into Ogg pages in order until the end of stream.
Audio packets MAY span page boundaries.
An implementation MUST treat a zero-octet audio data packet as if it were a
@@ -264,18 +260,19 @@ All other pages with completed packets after the first MUST have a granule
This guarantees that a demuxer can assign individual packets the same granule
position when working forwards as when working backwards.
For this to work, there cannot be any gaps.
</t>
<section anchor="gap-repair" title="Repairing Gaps in Real-time Streams">
<t>
In order to support capturing a real-time stream that has lost or not
- transmitted packets, a muxer SHOULD emit packets that explicitly request the
- use of Packet Loss Concealment (PLC) in place of the missing packets.
+ transmitted packets, a multiplexer (muxer) SHOULD emit packets that explicitly
+ request the use of Packet Loss Concealment (PLC) in place of the missing
+ packets.
Implementations that fail to do so still MUST NOT increment the granule
position for a page by anything other than the number of samples contained in
packets that actually complete on that page.
</t>
<t>
Only gaps that are a multiple of 2.5 ms are repairable, as these are the
only durations that can be created by packet loss or discontinuous
transmission.
@@ -374,21 +371,21 @@ Therefore, the first few samples produced by the decoder do not correspond to
These samples need to be stored and decoded, as Opus is an asymptotically
convergent predictive codec, meaning the decoded contents of each frame depend
on the recent history of decoder inputs.
However, a player will want to skip these samples after decoding them.
</t>
<t>
A 'pre-skip' field in the ID header (see <xref target="id_header"/>) signals
- the number of samples which SHOULD be skipped (decoded but discarded) at the
+ the number of samples that SHOULD be skipped (decoded but discarded) at the
beginning of the stream.
This amount need not be a multiple of 2.5 ms, MAY be smaller than a single
packet, or MAY span the contents of several packets.
-These samples are not valid audio, and SHOULD NOT be played.
+These samples are not valid audio.
</t>
<t>
For example, if the first Opus frame uses the CELT mode, it will always
produce 120 samples of windowed overlap-add data.
However, the overlap data is initially all zeros (since there is no prior
frame), meaning this cannot, in general, accurately represent the original
audio.
@@ -639,16 +636,17 @@ This is the number of samples (at 48 kHz) to discard from the decoder
When cropping the beginning of existing Ogg Opus streams, a pre-skip of at
least 3,840 samples (80 ms) is RECOMMENDED to ensure complete
convergence in the decoder.
<vspace blankLines="1"/>
</t>
<t>Input Sample Rate (32 bits, unsigned, little
endian):
<vspace blankLines="1"/>
+This is the sample rate of the original input (before encoding), in Hz.
This field is <spanx style="emph">not</spanx> the sample rate to use for
playback of the encoded data.
<vspace blankLines="1"/>
Opus can switch between internal audio bandwidths of 4, 6, 8, 12, and
20 kHz.
Each packet in the stream can have a different audio bandwidth.
Regardless of the audio bandwidth, the reference decoder supports decoding any
stream at a sample rate of 8, 12, 16, 24, or 48 kHz.
@@ -696,17 +694,17 @@ It is 20*log10 of the factor by which to scale the decoder output to achieve
To apply the gain, an implementation could use
<figure align="center">
<artwork align="center"><![CDATA[
sample *= pow(10, output_gain/(20.0*256)) ,
]]></artwork>
</figure>
where output_gain is the raw 16-bit value from the header.
<vspace blankLines="1"/>
-Virtually all players and media frameworks SHOULD apply it by default.
+Players and media frameworks SHOULD apply it by default.
If a player chooses to apply any volume adjustment or gain modification, such
as the R128_TRACK_GAIN (see <xref target="comment_header"/>), the adjustment
MUST be applied in addition to this output gain in order to achieve playback
at the normalized volume.
<vspace blankLines="1"/>
A muxer SHOULD set this field to zero, and instead apply any gain prior to
encoding, when this is possible and does not conflict with the user's wishes.
A nonzero output gain indicates the gain was adjusted after encoding, or that
@@ -720,36 +718,34 @@ The large range serves in part to ensure that gain can always be losslessly
transferred between OpusHead and R128 gain tags (see below) without
saturating.
<vspace blankLines="1"/>
</t>
<t>Channel Mapping Family (8 bits, unsigned):
<vspace blankLines="1"/>
This octet indicates the order and semantic meaning of the output channels.
<vspace blankLines="1"/>
-Each possible value of this octet indicates a mapping family, which defines a
- set of allowed channel counts, and the ordered set of channel names for each
- allowed channel count.
+Each currently specified value of this octet indicates a mapping family, which
+ defines a set of allowed channel counts, and the ordered set of channel names
+ for each allowed channel count.
The details are described in <xref target="channel_mapping"/>.
</t>
<t>Channel Mapping Table:
This table defines the mapping from encoded streams to output channels.
-It MUST be omitted when the channel mapping family is 0, but is
- REQUIRED otherwise.
Its contents are specified in <xref target="channel_mapping"/>.
</t>
</list>
</t>
<t>
All fields in the ID headers are REQUIRED, except for the channel mapping
table, which MUST be omitted when the channel mapping family is 0, but
is REQUIRED otherwise.
-Implementations SHOULD reject ID headers which do not contain enough data for
- these fields, even if they contain a valid Magic Signature.
+Implementations SHOULD reject streams with ID headers that do not contain
+ enough data for these fields, even if they contain a valid Magic Signature.
Future versions of this specification, even backwards-compatible versions,
might include additional fields in the ID header.
If an ID header has a compatible major version, but a larger minor version,
an implementation MUST NOT reject it for containing additional data not
specified here, provided it still completes on the first page.
</t>
<section anchor="channel_mapping" title="Channel Mapping">
@@ -869,17 +865,17 @@ Special mapping: This channel mapping value also
When the 'channel mapping family' octet has this value, the channel mapping
table MUST be omitted from the ID header packet.
</t>
</section>
<section anchor="channel_mapping_1" title="Channel Mapping Family 1">
<t>
Allowed numbers of channels: 1...8.
-Vorbis channel order.
+Vorbis channel order (see below).
</t>
<t>
Each channel is assigned to a speaker location in a conventional surround
arrangement.
Specific locations depend on the number of channels, and are given below
in order of the corresponding channel indices.
<list style="symbols">
<t>1 channel: monophonic (mono).</t>
@@ -892,17 +888,17 @@ Specific locations depend on the number of channels, and are given below
<t>8 channels: 7.1 surround (front left, front center, front right, side left, side right, rear left, rear right, LFE)</t>
</list>
</t>
<t>
This set of surround options and speaker location orderings is the same
as those used by the Vorbis codec <xref target="vorbis-mapping"/>.
The ordering is different from the one used by the
WAVE <xref target="wave-multichannel"/> and
- FLAC <xref target="flac"/> formats,
+ Free Lossless Audio Codec (FLAC) <xref target="flac"/> formats,
so correct ordering requires permutation of the output channels when decoding
to or encoding from those formats.
'LFE' here refers to a Low Frequency Effects channel, often mapped to a
subwoofer with no particular spatial position.
Implementations SHOULD identify 'side' or 'rear' speaker locations with
'surround' and 'back' as appropriate when interfacing with audio formats
or systems which prefer that terminology.
</t>
@@ -924,18 +920,18 @@ Implementations SHOULD NOT produce output for channels mapped to stream index
non-silent channels.
</t>
</section>
<section anchor="channel_mapping_undefined"
title="Undefined Channel Mappings">
<t>
The remaining channel mapping families (2...254) are reserved.
-An implementation encountering a reserved channel mapping family value SHOULD
- act as though the value is 255.
+A demuxer implementation encountering a reserved channel mapping family value
+ SHOULD act as though the value is 255.
</t>
</section>
<section anchor="downmix" title="Downmixing">
<t>
An Ogg Opus player MUST support any valid channel mapping with a channel
mapping family of 0 or 1, even if the number of channels does not match the
physically connected audio hardware.
@@ -1188,17 +1184,17 @@ If the least-significant bit of the first byte of this data is 1, then editors
SHOULD preserve the contents of this data when updating the tags, but if this
bit is 0, all such data MAY be treated as padding, and truncated or discarded
as desired.
</t>
<t>
The comment header can be arbitrarily large and might be spread over a large
number of Ogg pages.
-Implementations SHOULD avoid attempting to allocate excessive amounts of memory
+Implementations MUST avoid attempting to allocate excessive amounts of memory
when presented with a very large comment header.
To accomplish this, implementations MAY reject a comment header larger than
125,829,120 octets, and MAY ignore individual comments that are not fully
contained within the first 61,440 octets of the comment header.
</t>
<section anchor="comment_format" title="Tag Definitions">
<t>
@@ -1233,35 +1229,35 @@ R128_ALBUM_GAIN=111
</figure>
<t>
representing the volume shift needed to normalize the overall volume when
played as part of a particular collection of tracks.
The gain is also a Q7.8 fixed point number in dB, as in the ID header's
'output gain' field.
</t>
<t>
-An Ogg Opus stream MUST NOT have more than one of each tag, and if present
- their values MUST be an integer from -32768 to 32767, inclusive,
+An Ogg Opus stream MUST NOT have more than one of each of these tags, and if
+ present their values MUST be an integer from -32768 to 32767, inclusive,
represented in ASCII as a base 10 number with no whitespace.
A leading '+' or '-' character is valid.
Leading zeros are also permitted, but the value MUST be represented by
no more than 6 characters.
Other non-digit characters MUST NOT be present.
</t>
<t>
If present, R128_TRACK_GAIN and R128_ALBUM_GAIN MUST correctly represent
the R128 normalization gain relative to the 'output gain' field specified
in the ID header.
If a player chooses to make use of the R128_TRACK_GAIN tag or the
R128_ALBUM_GAIN tag, it MUST apply those gains
<spanx style="emph">in addition</spanx> to the 'output gain' value.
If a tool modifies the ID header's 'output gain' field, it MUST also update or
remove the R128_TRACK_GAIN and R128_ALBUM_GAIN comment tags if present.
-A muxer SHOULD assume that by default tools will respect the 'output gain'
- field, and not the comment tag.
+A muxer SHOULD place the gain it wants other tools to use by default into the
+ 'output gain' field, and not the comment tag.
</t>
<t>
To avoid confusion with multiple normalization schemes, an Opus comment header
SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK,
REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK tags.
<xref target="EBU-R128"/> normalization is preferred to the earlier
REPLAYGAIN schemes because of its clear definition and adoption by industry.
Peak normalizations are difficult to calculate reliably for lossy codecs
@@ -1277,20 +1273,21 @@ In the authors' investigations they were not applied consistently or broadly
<section anchor="packet_size_limits" title="Packet Size Limits">
<t>
Technically, valid Opus packets can be arbitrarily large due to the padding
format, although the amount of non-padding data they can contain is bounded.
These packets might be spread over a similarly enormous number of Ogg pages.
When encoding, implementations SHOULD limit the use of padding in audio data
packets to no more than is necessary to make a variable bitrate (VBR) stream
constant bitrate (CBR).
-Demuxers SHOULD reject audio data packets larger than 61,440 octets per
+Demuxers SHOULD reject audio data packets (treat them as if they were malformed
+ Opus packets with an invalid TOC sequence) larger than 61,440 octets per
Opus stream.
Such packets necessarily contain more padding than needed for this purpose.
-Demuxers SHOULD avoid attempting to allocate excessive amounts of memory when
+Demuxers MUST avoid attempting to allocate excessive amounts of memory when
presented with a very large packet.
Demuxers MAY reject or partially process audio data packets larger than
61,440 octets in an Ogg Opus stream with channel mapping families 0
or 1.
Demuxers MAY reject or partially process audio data packets in any Ogg Opus
stream if the packet is larger than 61,440 octets and also larger than
7,680 octets per Opus stream.
The presence of an extremely large packet in the stream could indicate a
@@ -1331,18 +1328,19 @@ This gives a size of 61,310 octets, which is rounded up to a multiple of
</section>
<section anchor="encoder" title="Encoder Guidelines">
<t>
When encoding Opus streams, Ogg muxers SHOULD take into account the
algorithmic delay of the Opus encoder.
</t>
<t>
-In encoders derived from the reference implementation, the number of
- samples can be queried with:
+In encoders derived from the reference
+ implementation <xref target="RFC6716"/>, the number of samples can be
+ queried with:
</t>
<figure align="center">
<artwork align="center"><![CDATA[
opus_encoder_ctl(encoder_state, OPUS_GET_LOOKAHEAD(&delay_samples));
]]></artwork>
</figure>
<t>
To achieve good quality in the very first samples of a stream, implementations
@@ -1545,16 +1543,17 @@ The authors agree to grant third parties the irrevocable right to copy, use,
</section>
</middle>
<back>
<references title="Normative References">
&rfc2119;
&rfc3533;
&rfc3629;
+ &rfc4732;
&rfc5334;
&rfc6381;
&rfc6716;
<reference anchor="EBU-R128" target="https://tech.ebu.ch/loudness">
<front>
<title>Loudness Recommendation EBU R128</title>
<author>
@@ -1575,17 +1574,16 @@ The authors agree to grant third parties the irrevocable right to copy, use,
</front>
</reference>
</references>
<references title="Informative References">
<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->
- &rfc4732;
&rfc6982;
&rfc7587;
<reference anchor="flac"
target="https://xiph.org/flac/format.html">
<front>
<title>FLAC - Free Lossless Audio Codec Format Description</title>
<author initials="J." surname="Coalson" fullname="Josh Coalson"/>