Re: [codec] AD Evaluation of draft-ietf-codec-oggopus-09

Timothy B. Terriberry Fri, 11 Dec 2015 14:17:43 -0800

Ben Campbell wrote:

Here is my AD evaluation of draft-ietf-codec-oggopus-09. Codec
encapsulation is not my strong suit, so I suspect many of my comments
can be answered with "this is the way we normally do this thing". But
I'd still like to discuss these before going to IETF last call.

Thank you for the very thorough review. Comments in-line below (as an individual).

- section 2, last paragraph:

Is this paragraph necessary or useful? (And is it really impossible to
create a non-compliant implementation that meets all the MUSTs?) I
suggest removing it.

This was copied from RFC 5334. I don't think anyone has any particular attachment to it, so let's just remove it.

- section 3:
This section seems to use 2119 keywords to describe what I _think_ are
existing Ogg requirements. If so, they should use descriptive text,
unless they are in the form of attributed quotes.

I think I can attribute this mostly to paranoia over RFC 3533 being informational, and using descriptive text of its own for what should be normative requirements. E.g.,

"Ogg also allows but does not require secondary header packets after the bos page for logical bitstreams and these must also precede any data packets in any logical bitstream. These subsequent header packets are framed into an integral number of pages, which will not contain any data packets." (from RFC 3533)

vs.

"The second packet in the logical Ogg bitstream MUST contain the comment header ... It MAY span multiple pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it completes." (from this draft)

Unless someone objects, though, I'll try to rewrite the latter without the 2119 keywords.

- 3, last paragraph:
-- "implementations need to be prepared":
  maybe that should be a MUST or SHOULD?

I think the intent was just to point out that someone might violate the preceding SHOULD. It's not clear what actionable thing we'd be saying they MUST do (it would depend on the particular implementation).

-- "last page SHOULD NOT be a continued packet":
Why not MUST NOT? Can you describe reasons someone might reasonably
violate this?

The intent was to allow but discourage page-level editing of files (i.e., cutting and splicing files by copying complete pages without examining the packets on them). Even when just saving a live stream that drops at some point, you might reasonably expect to be able to copy the data you received to a file. A MUST here would mean you would have to parse the pages, identify the continued packets from the lacing values, remove the extraneous data from the last page if you found some, repack the page buffer, and recompute the page checksum. That's a lot more machinery, and people would likely just ignore the requirement in practice. See also the discussion at <https://www.ietf.org/mail-archive/web/codec/current/msg03145.html>.
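
For a sense of the first step in that machinery, here is a minimal C sketch that checks whether the last packet on a page is continued. The function name is hypothetical and is not from any library. It assumes the buffer already holds one complete Ogg page, with its capture pattern and CRC already checked, laid out as in RFC 3533: the segment count at octet 26 and the lacing values immediately after it. A lacing value of 255 means the packet carries on, so a page whose final lacing value is 255 ends mid-packet.

```c
#include <stddef.h>

/* Returns 1 if the last packet on this Ogg page is incomplete (continued on
 * a following page), 0 if every packet on the page completes, and -1 if the
 * buffer is too short for the page header plus segment table.
 * Assumes capture pattern and CRC were already verified by the caller. */
int ogg_page_ends_with_continued_packet(const unsigned char *page, size_t len)
{
    size_t nsegs;
    if (len < 27) return -1;            /* fixed page header is 27 octets */
    nsegs = page[26];                   /* number of lacing values */
    if (len < 27 + nsegs) return -1;    /* segment table is truncated */
    if (nsegs == 0) return 0;           /* no packet data on this page */
    /* A lacing value of 255 means more data follows for this packet. */
    return page[27 + nsegs - 1] == 255;
}
```

Repairing such a page would still mean dropping those trailing segments, rewriting the segment table, and recomputing the CRC, which is the extra work described above.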

- 4.2, 2nd paragraph:
-- "samples which SHOULD be skipped": Why not MUST?  (also, s/which/that)

Some specific applications might have a reason for looking at that data (for example, when you have multiple files that you know were produced by chopping up a single existing file without re-encoding). In practice we haven't needed anything stronger to convince people to implement this correctly (especially when we point them to <https://people.xiph.org/~greg/opus_testvectors/correctness_trimming_nobeeps.opus>).

-- "SHOULD NOT be played": Is this redundant with SHOULD be skipped?


Yes. Deleted.
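
As an illustration of the decode-then-discard behavior, a minimal sketch of pre-skip handling. It assumes decoding at 48 kHz (pre-skip is counted in 48 kHz samples, so other output rates would need the count scaled) and a hypothetical per-stream counter initialized from the ID header's pre-skip field. Because pre-skip can span several packets, the counter carries over between calls. The function name is hypothetical.

```c
#include <stddef.h>

/* Given nframes sample frames just produced by the decoder, returns how many
 * leading frames to discard and decrements *preskip_left accordingly.
 * The caller plays frames [returned value, nframes). *preskip_left starts at
 * the ID header's pre-skip value (assumes 48 kHz decoding). */
size_t preskip_frames_to_drop(size_t nframes, unsigned *preskip_left)
{
    size_t skip = *preskip_left < nframes ? *preskip_left : nframes;
    *preskip_left -= (unsigned)skip;   /* carries over into later packets */
    return skip;
}
```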

- 4.3: Should the reference to [vorbis-trim] be normative? Can the
feature be implemented without understanding the referenced doc?

The answer to the second question is "yes", which I think makes the answer to the first question "no". The reason this is here is to tell people familiar with how Vorbis works what mechanism they should use to achieve the same outcome in Opus. But if you're not familiar with Vorbis, then you can safely ignore the whole paragraph.

- 4.6: Should the reference to [seeking] be normative? Can the section
be implemented without understanding it?

It is possible to implement seeking without knowing anything other than what is described in RFC 3533, but the linked documentation does help considerably.

I would argue that the section itself is not normative, in that you could implement seeking with some other method (for example, linearly scanning the whole file on open to build an index, possibly caching such indexes for files that are opened frequently: this would not be my recommended approach, but people have done it in practice). You could also choose to implement only approximate seeking by just guessing a file position and decoding from there (also done in practice by some players, and a common method for other formats like MP3). Or you could choose not to implement seeking at all.
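
To make the approximate option concrete, a rough C sketch under stated assumptions. Time is derived from the granule position with the Ogg Opus rule that the PCM sample position is the granule position minus pre-skip, on a 48 kHz clock. The byte-offset guess is plain linear interpolation over the audio data. Both helper names are hypothetical. A real player would then resynchronize on the next page at the guessed offset and, for exact seeking, bisect using the granule positions it finds there.

```c
#include <stdint.h>

/* Playback time in seconds for a granule position: PCM sample position is
 * granule position minus pre-skip, always counted at 48 kHz. */
double granule_to_seconds(int64_t granulepos, unsigned preskip)
{
    return (double)(granulepos - (int64_t)preskip) / 48000.0;
}

/* Hypothetical approximate-seek helper: linearly interpolates a byte offset
 * for target_s within the byte range [data_start, data_end] holding audio
 * pages, given the stream duration. The result is clamped to that range. */
int64_t guess_seek_offset(double target_s, double duration_s,
                          int64_t data_start, int64_t data_end)
{
    double frac;
    if (duration_s <= 0.0) return data_start;
    frac = target_s / duration_s;
    if (frac < 0.0) frac = 0.0;
    if (frac > 1.0) frac = 1.0;
    return data_start + (int64_t)(frac * (double)(data_end - data_start));
}
```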

- 5.1, list entry 5: What is the unit for "Input Sample Rate"?

Hz. It was specified in Figure 1, but you're right, it should be in the text, too. Added the sentence, "This is the sample rate of the original input (before encoding), in Hz."

- 5.1, entry 6, sentence starting with "Virtually all players..."
Is this a normative statement, or a statement of fact? If the former,
consider dropping “Virtually all”


Dropped "virtually all".

- 5.1, entry 6, last paragraph:
-- "which MUST be omitted when the channel mapping family is 0": That
seems redundant to the previous paragraph.


Yes. Deleted the sentence in the previous paragraph.

-- "Implementations SHOULD reject ID headers"
What does it mean, in context, to "reject" headers?

The same thing as rejecting a stream (i.e., don't try to play/decode it). Changed to say that instead.
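
A minimal sketch of the length check that decides whether to reject the stream. It assumes the ID header layout from the draft: an 8-octet "OpusHead" magic signature, then version, channel count, pre-skip, input sample rate, output gain, and channel mapping family, for 19 octets in total. A nonzero family adds a stream count, a coupled count, and one octet per channel. The function name is hypothetical. Trailing data beyond these fields is deliberately not treated as an error, since later minor versions may add fields.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if an ID header packet holds every field this draft defines,
 * 0 if the stream should be rejected. Extra trailing data is accepted. */
int opus_id_header_has_required_fields(const unsigned char *p, size_t len)
{
    if (len < 19 || memcmp(p, "OpusHead", 8) != 0)
        return 0;                  /* magic plus fixed fields: 19 octets */
    if (p[18] == 0)
        return 1;                  /* family 0: no channel mapping table */
    /* Other families: stream count, coupled count, and one mapping octet
     * per output channel (channel count is at offset 9). */
    return len >= 21 + (size_t)p[9];
}
```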

- 5.1.1.4: Why not MUST?

I think the idea was to allow assigning reserved values without explicitly updating this specification. Even though I expect we _would_ want to update this specification, given how long it's taken to get this draft published, I'm wary of adding more pressure on people not to deploy new mappings (because this RFC says they MUST not), even when it looks like they'll be relatively stable and interoperable. At least not for the sole reason that they haven't gotten an RFC number yet.

- 5.2, last two paragraph: Why are the two SHOULDs not MUSTs?

For the first SHOULD, this is the same as above: some other specification may eventually define the format of the data here and how to modify it, and I didn't think we'd want to put people supporting that specification in violation of this specification. I'm also leaning towards keeping this one a SHOULD since a) the eventual format of this data may apply to all codecs that share the vorbiscomment style of metadata, and b) we would probably try to do this informally first to gain implementation experience, and only bring it to the IETF if there is sufficient interest. I didn't want to make updating this document a prerequisite for that. See also the discussion in this thread: <https://www.ietf.org/mail-archive/web/codec/current/msg03061.html> (skip to <https://www.ietf.org/mail-archive/web/codec/current/msg03107.html> for the resolution of the debate).

For the second SHOULD, it felt like the right strength for a judgment call like "excessive". I have no objection to changing it to MUST.
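
One way to honor a "MUST avoid excessive allocation" is to grow the comment header buffer incrementally and stop at the size the draft lets implementations reject (125,829,120 octets). A minimal C sketch; the struct and function names are hypothetical, and the initial 4 KiB capacity is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

/* Size above which the draft allows rejecting a comment header. */
#define MAX_COMMENT_HEADER 125829120u

typedef struct { unsigned char *data; size_t len, cap; } comment_buf;

/* Appends one page's worth of comment header data, growing the buffer
 * geometrically but never beyond MAX_COMMENT_HEADER. Returns 0 on success,
 * -1 if the header would exceed the limit or memory runs out. */
int comment_buf_append(comment_buf *b, const unsigned char *src, size_t n)
{
    size_t need;
    if (n > MAX_COMMENT_HEADER - b->len) return -1;  /* refuse before alloc */
    need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 4096;
        unsigned char *p;
        while (cap < need) cap *= 2;
        if (cap > MAX_COMMENT_HEADER) cap = MAX_COMMENT_HEADER;
        p = realloc(b->data, cap);
        if (!p) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}
```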

- 5.2.1, 2nd to last paragraph, last sentence:
I don’t think muxers assume things—what is the required concrete behavior?

I replaced this with "A muxer SHOULD place the gain it wants other tools to use by default into the 'output gain' field, and not the comment tag."
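
A small sketch of what applying the tag "in addition" to the header gain means numerically. It assumes the player has already parsed R128_TRACK_GAIN, or uses 0 when it ignores the tag. Both values are Q7.8 dB, so they can be summed before converting. The linear factor then follows from the draft's `pow(10, gain/(20.0*256))` formula. Function names are hypothetical.

```c
/* Total playback gain in Q7.8 dB (1/256 dB units). output_gain is the raw
 * signed 16-bit 'output gain' from the ID header; track_gain is the parsed
 * R128_TRACK_GAIN value, or 0 when the player ignores the tag. The tag is
 * relative to the header field, so the two are added, never substituted. */
int opus_total_gain_q8(int output_gain, int track_gain)
{
    return output_gain + track_gain;
}

/* Converts a Q7.8 gain to dB. */
double q78_to_db(int gain_q8)
{
    return gain_q8 / 256.0;
}
```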

- 5.2.1, last paragraph: Why is the SHOULD NOT not MUST NOT?

In some contexts, there may be no confusion with multiple normalization schemes. The language in this section was debated a good deal on the list (see threads <https://www.ietf.org/mail-archive/web/codec/current/msg02930.html>, <https://www.ietf.org/mail-archive/web/codec/current/msg03012.html>, and <https://www.ietf.org/mail-archive/web/codec/current/msg03053.html>). I believe everyone is satisfied with the current compromise, and would prefer not to change the strength of any normative statements in it to avoid re-igniting the debate.

-6, first paragraph:
Why are the two SHOULDs not MUSTs? What does it mean to reject a packet?

For the first SHOULD, the fact that most decoders are likely to reject such packets is enough to discourage encoders from emitting them, and I was wary of imposing a MUST restriction when RFC 6716 chose not to do so.

For the second, this is the same as the previous comment about excessive allocations. I have no objection to changing this one, either.

"Reject a packet" is just shorthand for "treat them as if they were malformed Opus packets with an invalid TOC sequence". I've added explicit text saying the latter.

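A minimal sketch of the size check. It assumes the demuxer knows the stream count from the ID header, which is 1 for channel mapping family 0. Running this test on the packet length before buffering the packet's pages is one way to avoid the large allocations discussed above. The function name is hypothetical.

```c
#include <stddef.h>

/* Returns 1 if a demuxer following the draft's SHOULD would reject this
 * audio data packet (treat it as a malformed Opus packet with an invalid
 * TOC sequence) because it exceeds 61,440 octets per Opus stream.
 * nstreams is the ID header's stream count (1 for mapping family 0). */
int oggopus_packet_too_large(size_t packet_len, unsigned nstreams)
{
    return packet_len > (size_t)61440 * nstreams;
}
```
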
-7: "reference implementation" could use a citation. Should


Added a reference to RFC 6716.

[linear-predection] be normative? Can this be implemented without
understanding that? If it does need to be normative, can you find a more
stable reference?

Linear prediction is something that would be understood by "a person having ordinary skill in the art" (to borrow a phrase). We just wanted to include a reference to help those who might not be DSP engineers but wanted to learn more.

- 9, first paragraph:
This seems to defer at least some security considerations to 4732. If
so, this should be a normative reference. If, on the other hand, 4732
just provides additional information that is not necessary to understand
this section, then the wording should be adjusted.


Upgraded the reference to normative.

- 13:
Is this needed? Are the IETF BCP 78 and 79 provisions incorrect for this?

This text was specifically requested by the Debian project (and I support it personally). The same issue came up for RFC 6716 (for the same reasons). See the discussion at <https://www.ietf.org/mail-archive/web/codec/current/msg02845.html> and <https://www.ietf.org/mail-archive/web/ietf/current/msg73116.html>.

Reviewing those conversations, I do notice that the language of that section was changed after last call for RFC 6716, but the corresponding language in our source repository was not updated (because it was added to the document boilerplate manually by the RFC Editor, and not included in a separate section).

Editorial
=========
- Section 1: Consider spelling out multiplexer (muxer) and demultiplexer
(demuxer) on first mention.


Done.

- 3: Please expand TOC, SILK, and CELT on first mention.

Doing this directly in the current text was a little bit awkward. I reworded it to

"...An implementation of this specification SHOULD treat any Opus packet whose duration is different from that of the first Opus packet in an Ogg packet as if it were a malformed Opus packet with an invalid Table Of Contents (TOC) sequence.

"The TOC sequence at the beginning of each Opus packet indicates the coding mode, audio bandwidth, channel count, duration (frame size), and number of frames per packet, as described in Section 3.1 of [RFC6716]. The coding mode is one of SILK, Hybrid, or Constrained Energy Lapped Transform (CELT). The combination of coding mode, audio bandwidth, and frame size is referred to as the configuration of an Opus packet."


I did not expand SILK because it is not an acronym.
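
For readers checking the duration rule, a sketch of computing an Opus packet's duration from its TOC byte, following Section 3.1 of RFC 6716. The config occupies the top five bits and the frame count code the bottom two. When that code is 3, the frame count is in the low six bits of the second octet. Function names are hypothetical; a decoder built on libopus would more likely call `opus_packet_get_nb_samples()`. Splitting an Ogg packet into its self-delimited Opus packets is not shown.

```c
#include <stddef.h>

/* Frame size in samples at 48 kHz for the TOC byte's config (RFC 6716 3.1). */
static int opus_samples_per_frame(unsigned char toc)
{
    static const int silk[4] = {480, 960, 1920, 2880}; /* 10/20/40/60 ms */
    int config = toc >> 3;
    if (config < 12)                   /* SILK-only */
        return silk[config & 3];
    if (config < 16)                   /* Hybrid: 10 or 20 ms */
        return (config & 1) ? 960 : 480;
    return 120 << (config & 3);        /* CELT-only: 2.5/5/10/20 ms */
}

/* Duration of one Opus packet in 48 kHz samples, or -1 if it is too short
 * to hold the TOC byte (and frame count octet, when one is required). */
int opus_packet_duration(const unsigned char *pkt, size_t len)
{
    int frames;
    if (len < 1) return -1;
    switch (pkt[0] & 3) {              /* frame count code c */
    case 0:  frames = 1; break;
    case 1:
    case 2:  frames = 2; break;
    default:                           /* c = 3: explicit frame count */
        if (len < 2) return -1;
        frames = pkt[1] & 0x3F;
        break;
    }
    return frames * opus_samples_per_frame(pkt[0]);
}
```

Comparing the result for each Opus packet against the first one in the same Ogg packet is then a simple equality test.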

- 5, steps 2 and 3:
These steps in combination seem to say “ decode at the highest available
supported rate that the hardware can handle.” If so, the wording could
be simplified.

It is a little bit more complicated. There are two rates here: one used for decoding, and one used for playback. The difference between steps 2 and 3 is that in step 3, the hardware's highest available sample rate is not one of the rates supported by Opus decoding. For example, if the highest available sample rate the hardware can handle is 32 kHz, then this is not a "supported rate" (i.e., not supported by Opus decoding), so step 2 does not apply and step 3 does. Step 3 says to decode at the next highest "supported rate", which would be 48 kHz, and then resample to a rate the hardware can handle (in this case 32 kHz).


I don't think that's equivalent to your summary of the two steps.
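
A sketch of how steps 2 and 3 combine, as described in this reply. If Opus supports the hardware's highest rate, decode at that rate. Otherwise, decode at the next higher supported rate and resample. Treating hardware rates above 48 kHz as "decode at 48 kHz" is an assumption of this sketch, not something the reply covers. The function name is hypothetical.

```c
#include <stddef.h>

/* Output rates supported by the reference Opus decoder. */
static const int opus_rates[] = {8000, 12000, 16000, 24000, 48000};

/* Returns the rate to decode at, given the highest rate the playback
 * hardware supports. If the result differs from hw_max_rate, the caller
 * resamples down to the hardware rate. Rates above 48 kHz fall back to
 * 48 kHz (an assumption of this sketch). */
int choose_decode_rate(int hw_max_rate)
{
    size_t i;
    for (i = 0; i < sizeof opus_rates / sizeof opus_rates[0]; i++)
        if (opus_rates[i] >= hw_max_rate)
            return opus_rates[i];
    return 48000;
}
```

For the 32 kHz example, `choose_decode_rate(32000)` returns 48000, and the caller then resamples from 48 kHz to 32 kHz.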

- 7, 2nd paragraph:
I don’t think that’s literally true, you’ve defined 0, 1, and 255. The
rest are reserved. (I guess one could argue the guidance to treat
reserved values as 255 supports this statement, but I think that’s a
stretch.

Yes, this is really true for the currently defined mapping families. Changed to "Each currently specified value of this octet indicates..."
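
A sketch of what a demuxer might do with the family octet under the revised text. It keeps families 0 and 1 and treats the reserved values 2 through 254 as 255. The allowed channel counts are as this draft defines them: 1 or 2 for family 0, 1 to 8 for family 1, and 1 to 255 for family 255. Function names are hypothetical.

```c
/* Family a demuxer should act on: 0 and 1 are defined, 255 is the
 * catch-all, and reserved values 2..254 are treated as 255. */
int effective_mapping_family(unsigned char family)
{
    return (family == 0 || family == 1) ? family : 255;
}

/* Returns 1 if the channel count is allowed for the (effective) family. */
int mapping_channels_ok(unsigned char family, int channels)
{
    switch (effective_mapping_family(family)) {
    case 0:  return channels == 1 || channels == 2;
    case 1:  return channels >= 1 && channels <= 8;
    default: return channels >= 1 && channels <= 255;
    }
}
```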

-5.1.1.2: Do you expect the reader to know what Vorbis channel order is?
Also, please expand FLAC on first mention.

It's a fairly common term among those who work with audio formats, but I added a "(see below)" for those who don't, since the table below it lists what that order is.
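
For readers who have not encountered Vorbis channel order before, here is a sketch of the permutation the draft mentions, for 5.1 output. It assumes the WAVE side uses the default WAVEFORMATEXTENSIBLE order (front left, front right, front center, LFE, back left, back right) and interleaved float samples. The names are hypothetical, and other channel counts need their own tables.

```c
#include <stddef.h>

/* wave_from_vorbis_51[i] is the Vorbis channel feeding WAVE channel i.
 * Vorbis 5.1: FL, FC, FR, RL, RR, LFE
 * WAVE 5.1:   FL, FR, FC, LFE, BL, BR  (assumed default WAVE order) */
static const int wave_from_vorbis_51[6] = {0, 2, 1, 5, 3, 4};

/* Reorders nframes of interleaved 5.1 audio from Vorbis to WAVE order.
 * in and out must not overlap. */
void vorbis_to_wave_51(const float *in, float *out, size_t nframes)
{
    size_t f;
    int ch;
    for (f = 0; f < nframes; f++)
        for (ch = 0; ch < 6; ch++)
            out[f * 6 + ch] = in[f * 6 + wave_from_vorbis_51[ch]];
}
```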

- 5.2.1, paragraph starting with " An Ogg Opus stream MUST NOT have more
than one of each tag, and if present their values MUST be an integer"

each of these two new tags, or all tags? (I assume the former from the
numeric values) Also, singular/plural mismatch in "their values MUST be
an integer"


"each of these tags" seems to resolve both issues.

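A sketch of a validator for the value format in that paragraph. The value may have an optional sign and consists of base-10 digits with leading zeros allowed. It is at most 6 characters long and must fall in the range -32768 to 32767. The function name is hypothetical; rejecting a malformed value, rather than guessing at it, is a design choice of this sketch.

```c
#include <string.h>

/* Parses an R128_TRACK_GAIN / R128_ALBUM_GAIN value. Returns 1 and stores
 * the Q7.8 gain in *out if the string is valid, 0 otherwise. */
int parse_r128_gain(const char *s, int *out)
{
    size_t len = strlen(s), i = 0;
    long v = 0;
    int neg = 0;
    if (len == 0 || len > 6) return 0;       /* at most 6 characters */
    if (s[0] == '+' || s[0] == '-') {        /* optional leading sign */
        neg = s[0] == '-';
        i = 1;
    }
    if (i == len) return 0;                  /* sign with no digits */
    for (; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return 0;  /* no spaces or others */
        v = v * 10 + (s[i] - '0');
    }
    if (neg) v = -v;
    if (v < -32768 || v > 32767) return 0;
    *out = (int)v;
    return 1;
}
```
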
-10, last paragraph:

Updates, or extends? If the former, the draft needs an “Updates RFC
5334” field in the initial heading.


Added the field.

A complete diff of the changes since -09 so far is attached (and, as always, can be found in the Opus git repository: <https://git.xiph.org/?p=opus.git>). I'll post again when I finish reviewing the changes for normative language for generic Ogg requirements and the extra copyright grant.

diff --git a/doc/draft-ietf-codec-oggopus.xml b/doc/draft-ietf-codec-oggopus.xml
index 124d488..b95e5bc 100644
--- a/doc/draft-ietf-codec-oggopus.xml
+++ b/doc/draft-ietf-codec-oggopus.xml
@@ -7,17 +7,18 @@
 <!ENTITY rfc5334 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5334.xml'>
 <!ENTITY rfc6381 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6381.xml'>
 <!ENTITY rfc6716 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6716.xml'>
 <!ENTITY rfc6982 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.6982.xml'>
 <!ENTITY rfc7587 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.7587.xml'>
 ]>
 <?rfc toc="yes" symrefs="yes" ?>
 
-<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-09">
+<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-oggopus-09"
+ updates="5334">
 
 <front>
 <title abbrev="Ogg Opus">Ogg Encapsulation for the Opus Audio Codec</title>
 <author initials="T.B." surname="Terriberry" fullname="Timothy B. Terriberry">
 <organization>Mozilla Corporation</organization>
 <address>
 <postal>
 <street>650 Castro Street</street>
@@ -100,18 +101,18 @@ Each page is associated with a particular logical stream and contains a capture
  pattern and checksum, flags to mark the beginning and end of the logical
  stream, and a 'granule position' that represents an absolute position in the
  stream, to aid seeking.
 A single page can contain up to 65,025 octets of packet data from up to 255
  different packets.
 Packets can be split arbitrarily across pages, and continued from one page to
  the next (allowing packets much larger than would fit on a single page).
 Each page contains 'lacing values' that indicate how the data is partitioned
- into packets, allowing a demuxer to recover the packet boundaries without
- examining the encoded data.
+ into packets, allowing a demultiplexer (demuxer) to recover the packet
+ boundaries without examining the encoded data.
 A packet is said to 'complete' on a page when the page contains the final
  lacing value corresponding to that packet.
 </t>
 <t>
 This encapsulation defines the contents of the packet data, including
  the necessary headers, the organization of those packets into a logical
  stream, and the interpretation of the codec-specific granule position field.
 It does not attempt to describe or specify the existing Ogg container format.
@@ -123,24 +124,16 @@ Readers unfamiliar with the basic concepts mentioned above are encouraged to
 
 <section anchor="terminology" title="Terminology">
 <t>
 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
  "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
  document are to be interpreted as described in <xref target="RFC2119"/>.
 </t>
 
-<t>
-Implementations that fail to satisfy one or more "MUST" requirements are
- considered non-compliant.
-Implementations that satisfy all "MUST" requirements, but fail to satisfy one
- or more "SHOULD" requirements are said to be "conditionally compliant".
-All other implementations are "unconditionally compliant".
-</t>
-
 </section>
 
 <section anchor="packet_organization" title="Packet Organization">
 <t>
 An Ogg Opus stream is organized as follows.
 </t>
 <t>
 There are two mandatory header packets.
@@ -175,25 +168,28 @@ The first (N&nbsp;-&nbsp;1) Opus packets, if any, are packed one after another
  into the Ogg packet, using the self-delimiting framing from Appendix&nbsp;B of
  <xref target="RFC6716"/>.
 The remaining Opus packet is packed at the end of the Ogg packet using the
  regular, undelimited framing from Section&nbsp;3 of <xref target="RFC6716"/>.
 All of the Opus packets in a single Ogg packet MUST be constrained to have the
  same duration.
 An implementation of this specification SHOULD treat any Opus packet whose
  duration is different from that of the first Opus packet in an Ogg packet as
- if it were a malformed Opus packet with an invalid TOC sequence.
+ if it were a malformed Opus packet with an invalid Table Of Contents (TOC)
+ sequence.
 </t>
 <t>
-The coding mode (SILK, Hybrid, or CELT), audio bandwidth, channel count,
- duration (frame size), and number of frames per packet, are indicated in the
- TOC (table of contents) sequence at the beginning of each Opus packet, as
- described in Section&nbsp;3.1 of&nbsp;<xref target="RFC6716"/>.
-The combination of mode, audio bandwidth, and frame size is referred to as
- the configuration of an Opus packet.
+The TOC sequence at the beginning of each Opus packet indicates the coding
+ mode, audio bandwidth, channel count, duration (frame size), and number of
+ frames per packet, as described in Section&nbsp;3.1
+ of&nbsp;<xref target="RFC6716"/>.
+The coding mode is one of SILK, Hybrid, or Constrained Energy Lapped Transform
+ (CELT).
+The combination of coding mode, audio bandwidth, and frame size is referred to
+ as the configuration of an Opus packet.
 </t>
 <t>
 The first audio data page SHOULD NOT have the 'continued packet' flag set
  (which would indicate the first audio data packet is continued from a previous
  page).
 Packets MUST be placed into Ogg pages in order until the end of stream.
 Audio packets MAY span page boundaries.
 An implementation MUST treat a zero-octet audio data packet as if it were a
@@ -264,18 +260,19 @@ All other pages with completed packets after the first MUST have a granule
 This guarantees that a demuxer can assign individual packets the same granule
  position when working forwards as when working backwards.
 For this to work, there cannot be any gaps.
 </t>
 
 <section anchor="gap-repair" title="Repairing Gaps in Real-time Streams">
 <t>
 In order to support capturing a real-time stream that has lost or not
- transmitted packets, a muxer SHOULD emit packets that explicitly request the
- use of Packet Loss Concealment (PLC) in place of the missing packets.
+ transmitted packets, a multiplexer (muxer) SHOULD emit packets that explicitly
+ request the use of Packet Loss Concealment (PLC) in place of the missing
+ packets.
 Implementations that fail to do so still MUST NOT increment the granule
  position for a page by anything other than the number of samples contained in
  packets that actually complete on that page.
 </t>
 <t>
 Only gaps that are a multiple of 2.5&nbsp;ms are repairable, as these are the
  only durations that can be created by packet loss or discontinuous
  transmission.
@@ -374,21 +371,21 @@ Therefore, the first few samples produced by the decoder do not correspond to
 These samples need to be stored and decoded, as Opus is an asymptotically
  convergent predictive codec, meaning the decoded contents of each frame depend
  on the recent history of decoder inputs.
 However, a player will want to skip these samples after decoding them.
 </t>
 
 <t>
 A 'pre-skip' field in the ID header (see <xref target="id_header"/>) signals
- the number of samples which SHOULD be skipped (decoded but discarded) at the
+ the number of samples that SHOULD be skipped (decoded but discarded) at the
  beginning of the stream.
 This amount need not be a multiple of 2.5&nbsp;ms, MAY be smaller than a single
  packet, or MAY span the contents of several packets.
-These samples are not valid audio, and SHOULD NOT be played.
+These samples are not valid audio.
 </t>
 
 <t>
 For example, if the first Opus frame uses the CELT mode, it will always
  produce 120 samples of windowed overlap-add data.
 However, the overlap data is initially all zeros (since there is no prior
  frame), meaning this cannot, in general, accurately represent the original
  audio.
@@ -639,16 +636,17 @@ This is the number of samples (at 48&nbsp;kHz) to discard from the decoder
 When cropping the beginning of existing Ogg Opus streams, a pre-skip of at
  least 3,840&nbsp;samples (80&nbsp;ms) is RECOMMENDED to ensure complete
  convergence in the decoder.
 <vspace blankLines="1"/>
 </t>
 <t>Input Sample Rate (32 bits, unsigned, little
  endian):
 <vspace blankLines="1"/>
+This is the sample rate of the original input (before encoding), in Hz.
 This field is <spanx style="emph">not</spanx> the sample rate to use for
  playback of the encoded data.
 <vspace blankLines="1"/>
 Opus can switch between internal audio bandwidths of 4, 6, 8, 12, and
  20&nbsp;kHz.
 Each packet in the stream can have a different audio bandwidth.
 Regardless of the audio bandwidth, the reference decoder supports decoding any
  stream at a sample rate of 8, 12, 16, 24, or 48&nbsp;kHz.
@@ -696,17 +694,17 @@ It is 20*log10 of the factor by which to scale the decoder output to achieve
 To apply the gain, an implementation could use
 <figure align="center">
 <artwork align="center"><![CDATA[
 sample *= pow(10, output_gain/(20.0*256)) ,
 ]]></artwork>
 </figure>
  where output_gain is the raw 16-bit value from the header.
 <vspace blankLines="1"/>
-Virtually all players and media frameworks SHOULD apply it by default.
+Players and media frameworks SHOULD apply it by default.
 If a player chooses to apply any volume adjustment or gain modification, such
  as the R128_TRACK_GAIN (see <xref target="comment_header"/>), the adjustment
  MUST be applied in addition to this output gain in order to achieve playback
  at the normalized volume.
 <vspace blankLines="1"/>
 A muxer SHOULD set this field to zero, and instead apply any gain prior to
  encoding, when this is possible and does not conflict with the user's wishes.
 A nonzero output gain indicates the gain was adjusted after encoding, or that
@@ -720,36 +718,34 @@ The large range serves in part to ensure that gain can always be losslessly
  transferred between OpusHead and R128 gain tags (see below) without
  saturating.
 <vspace blankLines="1"/>
 </t>
 <t>Channel Mapping Family (8 bits, unsigned):
 <vspace blankLines="1"/>
 This octet indicates the order and semantic meaning of the output channels.
 <vspace blankLines="1"/>
-Each possible value of this octet indicates a mapping family, which defines a
- set of allowed channel counts, and the ordered set of channel names for each
- allowed channel count.
+Each currently specified value of this octet indicates a mapping family, which
+ defines a set of allowed channel counts, and the ordered set of channel names
+ for each allowed channel count.
 The details are described in <xref target="channel_mapping"/>.
 </t>
 <t>Channel Mapping Table:
 This table defines the mapping from encoded streams to output channels.
-It MUST be omitted when the channel mapping family is 0, but is
- REQUIRED otherwise.
 Its contents are specified in <xref target="channel_mapping"/>.
 </t>
 </list>
 </t>
 
 <t>
 All fields in the ID headers are REQUIRED, except for the channel mapping
  table, which MUST be omitted when the channel mapping family is 0, but
  is REQUIRED otherwise.
-Implementations SHOULD reject ID headers which do not contain enough data for
- these fields, even if they contain a valid Magic Signature.
+Implementations SHOULD reject streams with ID headers that do not contain
+ enough data for these fields, even if they contain a valid Magic Signature.
 Future versions of this specification, even backwards-compatible versions,
  might include additional fields in the ID header.
 If an ID header has a compatible major version, but a larger minor version,
  an implementation MUST NOT reject it for containing additional data not
  specified here, provided it still completes on the first page.
 </t>
 
 <section anchor="channel_mapping" title="Channel Mapping">
@@ -869,17 +865,17 @@ Special mapping: This channel mapping value also
 When the 'channel mapping family' octet has this value, the channel mapping
  table MUST be omitted from the ID header packet.
 </t>
 </section>
 
 <section anchor="channel_mapping_1" title="Channel Mapping Family 1">
 <t>
 Allowed numbers of channels: 1...8.
-Vorbis channel order.
+Vorbis channel order (see below).
 </t>
 <t>
 Each channel is assigned to a speaker location in a conventional surround
  arrangement.
 Specific locations depend on the number of channels, and are given below
  in order of the corresponding channel indices.
 <list style="symbols">
   <t>1 channel: monophonic (mono).</t>
@@ -892,17 +888,17 @@ Specific locations depend on the number of channels, and are given below
   <t>8 channels: 7.1 surround (front&nbsp;left, front&nbsp;center, front&nbsp;right, side&nbsp;left, side&nbsp;right, rear&nbsp;left, rear&nbsp;right, LFE)</t>
 </list>
 </t>
 <t>
 This set of surround options and speaker location orderings is the same
  as those used by the Vorbis codec <xref target="vorbis-mapping"/>.
 The ordering is different from the one used by the
  WAVE <xref target="wave-multichannel"/> and
- FLAC <xref target="flac"/> formats,
+ Free Lossless Audio Codec (FLAC) <xref target="flac"/> formats,
  so correct ordering requires permutation of the output channels when decoding
  to or encoding from those formats.
 'LFE' here refers to a Low Frequency Effects channel, often mapped to a
   subwoofer with no particular spatial position.
 Implementations SHOULD identify 'side' or 'rear' speaker locations with
  'surround' and 'back' as appropriate when interfacing with audio formats
  or systems which prefer that terminology.
 </t>
@@ -924,18 +920,18 @@ Implementations SHOULD NOT produce output for channels mapped to stream index
  non-silent channels.
 </t>
 </section>
 
 <section anchor="channel_mapping_undefined"
  title="Undefined Channel Mappings">
 <t>
 The remaining channel mapping families (2...254) are reserved.
-An implementation encountering a reserved channel mapping family value SHOULD
- act as though the value is 255.
+A demuxer implementation encountering a reserved channel mapping family value
+ SHOULD act as though the value is 255.
 </t>
 </section>
 
 <section anchor="downmix" title="Downmixing">
 <t>
 An Ogg Opus player MUST support any valid channel mapping with a channel
  mapping family of 0 or 1, even if the number of channels does not match the
  physically connected audio hardware.
@@ -1188,17 +1184,17 @@ If the least-significant bit of the first byte of this data is 1, then editors
  SHOULD preserve the contents of this data when updating the tags, but if this
  bit is 0, all such data MAY be treated as padding, and truncated or discarded
  as desired.
 </t>
 
 <t>
 The comment header can be arbitrarily large and might be spread over a large
  number of Ogg pages.
-Implementations SHOULD avoid attempting to allocate excessive amounts of memory
+Implementations MUST avoid attempting to allocate excessive amounts of memory
  when presented with a very large comment header.
 To accomplish this, implementations MAY reject a comment header larger than
  125,829,120&nbsp;octets, and MAY ignore individual comments that are not fully
  contained within the first 61,440 octets of the comment header.
 </t>
 
 <section anchor="comment_format" title="Tag Definitions">
 <t>
@@ -1233,35 +1229,35 @@ R128_ALBUM_GAIN=111
 </figure>
 <t>
  representing the volume shift needed to normalize the overall volume when
  played as part of a particular collection of tracks.
 The gain is also a Q7.8 fixed point number in dB, as in the ID header's
  'output gain' field.
 </t>
 <t>
-An Ogg Opus stream MUST NOT have more than one of each tag, and if present
- their values MUST be an integer from -32768 to 32767, inclusive,
+An Ogg Opus stream MUST NOT have more than one of each of these tags, and if
+ present their values MUST be an integer from -32768 to 32767, inclusive,
  represented in ASCII as a base 10 number with no whitespace.
 A leading '+' or '-' character is valid.
 Leading zeros are also permitted, but the value MUST be represented by
  no more than 6 characters.
 Other non-digit characters MUST NOT be present.
 </t>
 <t>
 If present, R128_TRACK_GAIN and R128_ALBUM_GAIN MUST correctly represent
  the R128 normalization gain relative to the 'output gain' field specified
  in the ID header.
 If a player chooses to make use of the R128_TRACK_GAIN tag or the
  R128_ALBUM_GAIN tag, it MUST apply those gains
  <spanx style="emph">in addition</spanx> to the 'output gain' value.
 If a tool modifies the ID header's 'output gain' field, it MUST also update or
  remove the R128_TRACK_GAIN and R128_ALBUM_GAIN comment tags if present.
-A muxer SHOULD assume that by default tools will respect the 'output gain'
- field, and not the comment tag.
+A muxer SHOULD place the gain it wants other tools to use by default into the
+ 'output gain' field, and not the comment tag.
 </t>
 <t>
 To avoid confusion with multiple normalization schemes, an Opus comment header
  SHOULD NOT contain any of the REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_TRACK_PEAK,
  REPLAYGAIN_ALBUM_GAIN, or REPLAYGAIN_ALBUM_PEAK tags.
 <xref target="EBU-R128"/> normalization is preferred to the earlier
  REPLAYGAIN schemes because of its clear definition and adoption by industry.
 Peak normalizations are difficult to calculate reliably for lossy codecs
@@ -1277,20 +1273,21 @@ In the authors' investigations they were not applied consistently or broadly
 <section anchor="packet_size_limits" title="Packet Size Limits">
 <t>
 Technically, valid Opus packets can be arbitrarily large due to the padding
  format, although the amount of non-padding data they can contain is bounded.
 These packets might be spread over a similarly enormous number of Ogg pages.
 When encoding, implementations SHOULD limit the use of padding in audio data
  packets to no more than is necessary to make a variable bitrate (VBR) stream
  constant bitrate (CBR).
-Demuxers SHOULD reject audio data packets larger than 61,440 octets per
+Demuxers SHOULD reject audio data packets (treat them as if they were malformed
+ Opus packets with an invalid TOC sequence) larger than 61,440 octets per
  Opus stream.
 Such packets necessarily contain more padding than needed for this purpose.
-Demuxers SHOULD avoid attempting to allocate excessive amounts of memory when
+Demuxers MUST avoid attempting to allocate excessive amounts of memory when
  presented with a very large packet.
 Demuxers MAY reject or partially process audio data packets larger than
  61,440&nbsp;octets in an Ogg Opus stream with channel mapping families&nbsp;0
  or&nbsp;1.
 Demuxers MAY reject or partially process audio data packets in any Ogg Opus
  stream if the packet is larger than 61,440&nbsp;octets and also larger than
  7,680&nbsp;octets per Opus stream.
 The presence of an extremely large packet in the stream could indicate a
@@ -1331,18 +1328,19 @@ This gives a size of 61,310&nbsp;octets, which is rounded up to a multiple of
 </section>
 
 <section anchor="encoder" title="Encoder Guidelines">
 <t>
 When encoding Opus streams, Ogg muxers SHOULD take into account the
  algorithmic delay of the Opus encoder.
 </t>
 <t>
-In encoders derived from the reference implementation, the number of
- samples can be queried with:
+In encoders derived from the reference
+ implementation&nbsp;<xref target="RFC6716"/>, the number of samples can be
+ queried with:
 </t>
 <figure align="center">
 <artwork align="center"><![CDATA[
  opus_encoder_ctl(encoder_state, OPUS_GET_LOOKAHEAD(&delay_samples));
 ]]></artwork>
 </figure>
 <t>
 To achieve good quality in the very first samples of a stream, implementations
@@ -1545,16 +1543,17 @@ The authors agree to grant third parties the irrevocable right to copy, use,
 </section>
 
 </middle>
 <back>
 <references title="Normative References">
  &rfc2119;
  &rfc3533;
  &rfc3629;
+ &rfc4732;
  &rfc5334;
  &rfc6381;
  &rfc6716;
 
 <reference anchor="EBU-R128" target="https://tech.ebu.ch/loudness">
 <front>
   <title>Loudness Recommendation EBU R128</title>
   <author>
@@ -1575,17 +1574,16 @@ The authors agree to grant third parties the irrevocable right to copy, use,
 </front>
 </reference>
 
 </references>
 
 <references title="Informative References">
 
 <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3550.xml"?-->
- &rfc4732;
  &rfc6982;
  &rfc7587;
 
 <reference anchor="flac"
 target="https://xiph.org/flac/format.html">
   <front>
     <title>FLAC - Free Lossless Audio Codec Format Description</title>
     <author initials="J." surname="Coalson" fullname="Josh Coalson"/>

_______________________________________________
codec mailing list
codec@ietf.org
https://www.ietf.org/mailman/listinfo/codec
