Thanks Jean-Baptiste, Tim, and Jean-Marc for your questions and
comments. Also thanks for all the support! I've replied inline.

On Fri, May 27, 2016 at 2:02 AM, Jean-Baptiste Kempf <[email protected]> wrote:
> How is this mapping related to the mp4 one from google spatial-media:
> https://github.com/google/spatial-media/blob/master/docs/spatial-audio-rfc.md
The RFC that you linked to describes a metadata box for spatial audio
(SA3D). This metadata is specific to mp4. We do not currently plan to
encapsulate Opus in mp4, but may be interested in the future if the
encapsulation described here becomes more widely supported:
wiki.xiph.org/Mp4Opus.

Since we have no plans to encapsulate Opus in mp4, I have been
operating under the assumption that all ambisonic metadata should be
contained in the Ogg stream itself (in this case implicitly by
defining only one channel ordering and normalization).



On Fri, May 27, 2016 at 10:45 AM, Timothy B. Terriberry
<[email protected]> wrote:
> Ogg is a general purpose container, supporting audio, video, subtitles, etc. 
> (though its most common use in current applications is for audio).

I reworded this to say
"Ogg is a general purpose container, supporting audio, video, and other media.
It can be used to encapsulate audio streams coded using the Opus codec"

> I think it may be useful to say that this channel mapping number will also 
> likely impact other formats which use the same channel mapping families, 
> e.g., Matroska, MP4, MPEG TS. Just adding a sentence to the introduction 
> along the lines of, "This mapping can also be used in other contexts which 
> make use of the channel mappings defined by the Opus Channel Mapping Families 
> registry."

Great, added.
BTW, when either Matroska or WebM is used to encapsulate Opus, the
entire Ogg container is wrapped, including the OpusHead header.

> For n = 0, (1 + n)^2 == 1, but that's not on your explicit list. Was the 
> intention here ...

I would like to include mono to minimize the number of special cases
that decoders have to deal with. For example, if a virtual reality
playback environment expects to always receive ambisonics with mapping
family 2, the encoder could simply send single-channel, n=0 ambisonics
when spatial audio is unavailable.
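To make the channel-count rule concrete, here is a small illustrative
sketch (not part of the draft; the function name is my own) of the
(1 + n)^2 channel count for a full ambisonic stream of order n, with
n=0 being the mono case discussed above:

```python
def ambisonic_channel_count(order):
    """Number of channels in a full (periphonic) ambisonic
    stream of the given order: (1 + order)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (1 + order) ** 2

# Order 0 is plain mono (1 channel), the special case discussed
# above; orders 1 through 3 give 4, 9, and 16 channels.
counts = [ambisonic_channel_count(n) for n in range(4)]
print(counts)  # [1, 4, 9, 16]
```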

I corrected the explicit list: "Explicitly 1, 4...".

> I've read the [ambix] reference before, and it still took me a while to 
> figure out what this normalization coefficient is actually referring to. ... 
> The coefficient in general seems highly dependent on how you express the 
> spherical harmonic basis functions...

Yes, unfortunately the interpretation of "normalization" depends on
how you define the spherical harmonic functions. I expect that Opus
encoders and decoders would not need to apply this normalization, but
the normalization is part of the semantic meaning of the audio stream.
For example, if the decoder is passing its decoded stream along to an
ambisonic player which expects PCM with SN3D normalization, then the
decoder does not need to deal with the exact definition of "SN3D". On
the other hand, if the ambisonic player does not support SN3D
playback, an intermediate scaling step would be necessary. I do not
want to require the Opus decoder to perform this scaling, but I want
the Ogg stream to carry enough metadata (implicitly in this case) to
make it clear when and what scaling must be done.
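As an illustration of such an intermediate scaling step (this is my own
sketch, not something the draft specifies), converting SN3D-normalized
channels to N3D amounts to a per-channel gain of sqrt(2l + 1), where l
is the spherical-harmonic degree of the channel; assuming ACN channel
ordering, l for channel index acn is floor(sqrt(acn)):

```python
import math

def sn3d_to_n3d_gain(acn):
    """Per-channel gain converting an SN3D-normalized ambisonic
    channel (ACN ordering assumed) to N3D normalization:
    N3D = SN3D * sqrt(2*l + 1), with degree l = floor(sqrt(acn))."""
    degree = math.isqrt(acn)
    return math.sqrt(2 * degree + 1)

# First-order example: channel 0 (W, degree 0) is unchanged;
# channels 1..3 (degree 1) are each scaled by sqrt(3).
gains = [sn3d_to_n3d_gain(acn) for acn in range(4)]
```

The point is only that the scaling is a simple per-channel gain that an
intermediate step outside the Opus decoder could apply.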

> It may be better to simply refer to [ambix] Section 2.1, equation (1) for the 
> definition of the spherical harmonics basis functions (the explicit 
> section/equation makes it easy to find), and leave out this equation entirely.

I agree, that would be clearer.

> Conversely, it would be reasonable to spell out the full equation for those 
> basis functions in this document (to make it more self-contained). But this 
> kind of sits in-between, where it looks like it's trying to say something 
> that you can understand on its own, but in fact it's impossible to interpret 
> outside of the context of that reference (which already says it).

I agree. I do not think it would be worthwhile to include a
self-contained description of ambisonics, because the interpretation
of the encoded data is fairly complicated and depends on the playback
environment.  I removed this equation and referenced the section in
[ambix] instead.

> I doubt this will really cause confusion, but you don't define W and Y. You 
> might want to say they correspond to the first two ambisonics channels, 
> explicitly.

I added a definition in the paragraph above.

> Does this actually update RFC7845?...

Great, I removed the "updates" attribute from the rfc tag.



On Fri, May 27, 2016 at 11:00 AM, Jean-Marc Valin <[email protected]> wrote:
> I personally don't think 1-channel ambisonics should be allowed
> (unless there's a good reason I missed). We might want to say that
> encoders MUST NOT create them, but decoders SHOULD handle them as if
> they were regular mono files.

As I described above, I believe there is a good reason to support
1-channel ambisonics. The playback of ambisonics from an Ogg stream
can be simplified if the playback system always receives ambisonics
with channel mapping family 2. With 1-channel ambisonics, this
simplification exists even when the streamed data originally had no
spatial content.

> I agree, I'm pretty sure we don't want the "Updates: 7845" line.
Great, removed.


Question for anyone: Do I submit an -01 draft with the changes
mentioned above, or do I send out an intermediate draft on this list?

-- 

Thanks,
Michael Graczyk

_______________________________________________
codec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/codec
