Folks,
We have a new revision of this draft (attached). Unfortunately, it wasn't ready in time for the submission cut-off before the IETF meeting next week, but we would still like the discussion at the session, and afterwards, to be based on this new revision. We realize that, given the short notice, not everyone will have a chance to read it before the session, and we apologize for the inconvenience.
Cheers, Jan
codec J. Skoglund
Internet-Draft M. Graczyk
Intended status: Standards Track Google Inc.
Expires: September 18, 2017 March 17, 2017
Ambisonics in an Ogg Opus Container
draft-ietf-codec-ambisonics-02
Abstract
This document defines an extension to the Ogg format to encapsulate
ambisonics coded using the Opus audio codec.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 18, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Skoglund & Graczyk Expires September 18, 2017 [Page 1]
Internet-Draft Opus Ambisonics March 2017
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2
3. Ambisonics With Ogg Opus . . . . . . . . . . . . . . . . . . 3
3.1. Channel Mapping Family 2 . . . . . . . . . . . . . . . . 3
3.2. Channel Mapping Family 3 . . . . . . . . . . . . . . . . 4
4. Downmixing . . . . . . . . . . . . . . . . . . . . . . . . . 5
5. Security Considerations . . . . . . . . . . . . . . . . . . . 6
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6
7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 7
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 7
8.1. Normative References . . . . . . . . . . . . . . . . . . 7
8.2. Informative References . . . . . . . . . . . . . . . . . 7
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8
1. Introduction
Ambisonics is a representation format for three-dimensional sound
fields which can be used for surround sound and immersive virtual
reality playback. See [gerzon75] and [daniel04] for technical
details on the ambisonics format. For the purposes of this document,
ambisonics can be considered a multichannel audio stream. A separate
stereo stream can be used alongside the ambisonics in a head-tracked
virtual reality experience to provide so-called non-diegetic audio,
that is, audio which should remain unchanged by listener head
rotation (e.g., narration or stereo music). Ogg is a general-purpose
container, supporting audio, video, and other media. It can be used
to encapsulate audio streams coded using the Opus codec. See
[RFC6716] and [RFC7845] for technical details on the Opus codec and
its encapsulation in the Ogg container, respectively.
This document extends the Ogg format by defining two new channel
mapping families for encoding ambisonics. The Ogg Opus format is
extended indirectly by adding items with values 2 and 3 to the IANA
"Opus Channel Mapping Families" registry. When 2 or 3 is used as
the Channel Mapping Family Number in an Ogg stream, the semantic
meaning of the channels in the multichannel Opus stream is one of the
ambisonics layouts defined in this document. These mappings can also
be used in other contexts which make use of the channel mappings
defined by the Opus Channel Mapping Families registry.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].
3. Ambisonics With Ogg Opus
Ambisonics can be encapsulated in the Ogg format by encoding with the
Opus codec and setting the channel mapping family value to 2 or 3 in
the Ogg identification header (ID). A demuxer implementation
encountering Channel Mapping Family 2 or Family 3 MUST interpret the
Opus stream as containing ambisonics with the format described in
Section 3.1 or Section 3.2, respectively.
3.1. Channel Mapping Family 2
Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0
or 1, where n denotes the (highest) ambisonic order and j whether or
not there is a separate non-diegetic stereo stream. This corresponds
to periphonic ambisonics from zeroth to fourteenth order plus
potentially two channels of non-diegetic stereo. Explicitly, the
allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36,
38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171,
196, 198, 225, 227.
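
The channel-count rule above can be checked mechanically. The following is a minimal sketch in Python; the function names are hypothetical and are not part of this specification.

```python
def allowed_channel_counts(max_order=14):
    """List (1 + n)^2 + 2j for n = 0..max_order and j in {0, 1}.

    max_order=14 corresponds to channel mapping family 2 as described above.
    """
    counts = []
    for n in range(max_order + 1):
        ambisonic = (1 + n) ** 2          # full periphonic order n
        counts.append(ambisonic)          # j = 0: ambisonics only
        counts.append(ambisonic + 2)      # j = 1: plus non-diegetic stereo
    return counts


def is_allowed_channel_count(channel_count, max_order=14):
    """True if channel_count is one of the allowed values."""
    return channel_count in allowed_channel_counts(max_order)
```

A demuxer could, for example, reject an identification header whose channel count fails this check before allocating any per-channel state.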
This channel mapping uses the same channel mapping table format used
by channel mapping family 1. The output channels are ambisonic
components ordered in Ambisonic Channel Number (ACN) order, defined
in Figure 1, followed by two optional channels of non-diegetic stereo
indexed (left, right).
ACN = n * (n + 1) + m,
for order n and degree m.
Figure 1: Ambisonic Channel Number (ACN)
For the ambisonic channels, the ACN of a component is also its
channel index, k = ACN. Conversely, the order and degree of an
ambisonic channel can be computed from its index k.
order n = floor(sqrt(k)),
degree m = k - n * (n + 1).
Figure 2: Ambisonic Degree and Order from ACN
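
The two correspondences in Figures 1 and 2 can be sketched as a pair of helper functions. The names below are hypothetical; the integer square root is used so that floor(sqrt(k)) is computed without floating-point rounding.

```python
import math


def acn_from_order_degree(n, m):
    """ACN = n * (n + 1) + m for order n and degree m, with -n <= m <= n."""
    if n < 0 or abs(m) > n:
        raise ValueError("degree m must satisfy -n <= m <= n")
    return n * (n + 1) + m


def order_degree_from_acn(k):
    """Inverse mapping: order n = floor(sqrt(k)), degree m = k - n * (n + 1)."""
    if k < 0:
        raise ValueError("channel index must be non-negative")
    n = math.isqrt(k)
    m = k - n * (n + 1)
    return n, m
```

For instance, channel index 8 is the last component of second order, with degree 2.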
Note that channel mapping family 2 allows for so-called mixed-order
ambisonic representations, where only a subset of the channels of the
full ambisonic order is used. The channel count field specifies the
number of channels of the full order, and each inactive ACN is
indicated in the channel mapping field using the index 255.
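
As an illustration of the mixed-order convention, the sketch below builds a channel mapping for a given full order and a set of active ACNs. It assumes, for simplicity, that every active component is carried in its own uncoupled (mono) stream and that there is no non-diegetic stereo; the function name and the example set of active ACNs are hypothetical.

```python
SILENT = 255  # index marking an inactive ACN, as described above


def mixed_order_mapping(order, active_acns):
    """Return (channel_count, stream_count, mapping) for a mixed-order layout.

    Assumes one mono stream per active ACN (coupled count 0), so the mapping
    entry for an active ACN is simply the index of its decoded channel.
    """
    channel_count = (order + 1) ** 2
    active = sorted(set(active_acns))
    if any(k < 0 or k >= channel_count for k in active):
        raise ValueError("active ACN outside the full-order channel range")
    mapping = [SILENT] * channel_count
    for decoded_index, acn in enumerate(active):
        mapping[acn] = decoded_index
    return channel_count, len(active), mapping
```

With order 2 and active ACNs 0, 1, 3, 4, and 8, the channel count stays at 9 while only five streams are encoded; the remaining four mapping entries are 255.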
Ambisonic channels are expected to be normalized with Schmidt Semi-
Normalization (SN3D). The interpretation of the ambisonics signal as
well as detailed definitions of ACN channel ordering and SN3D
normalization are described in [ambix] Section 2.1.
3.2. Channel Mapping Family 3
Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...12 and j = 0
or 1, where n denotes the (highest) ambisonic order and j whether or
not there is a separate non-diegetic stereo stream. This corresponds
to periphonic ambisonics from zeroth to twelfth order plus
potentially two channels of non-diegetic stereo. Explicitly, the
allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36,
38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171.
In this mapping, C output channels (the channel count) are generated
at the decoder by multiplying K = N + M decoded channels with a
designated demixing matrix, D, having C rows and K columns. Here, N
denotes the number of streams encoded and M the number of these which
are coupled to produce two channels. Like channel mapping family 2,
this mapping family allows for encoding and decoding of full-order
ambisonics, mixed-order ambisonics, and non-diegetic stereo channels,
but it has the added flexibility of mixing channels. Let X denote a
column vector containing K decoded channels X1, X2, ..., XK (from N
streams), and let S denote a column vector containing C output
channels S1, S2, ..., SC. Then S = D X, i.e.,
/ \ / \ / \
| S1 | | D11 D12 ... D1K | | X1 |
| S2 | | D21 D22 ... D2K | | X2 |
| ... | = | ... ... ... ... | | ... |
| SC | | DC1 DC2 ... DCK | | XK |
\ / \ / \ /
Figure 3: Demixing in Channel Mapping Family 3
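
The product S = D X in Figure 3 is applied independently at every sample instant. A minimal, unoptimized sketch follows; the function name is hypothetical, and channels are assumed to be plain lists of samples of equal length.

```python
def demix(D, X):
    """Apply S = D X per sample.

    D is a list of C rows with K coefficients each; X is a list of K decoded
    channels, each a list of samples. Returns C output channels.
    """
    K = len(X)
    if any(len(row) != K for row in D):
        raise ValueError("every row of D must have K columns")
    num_samples = len(X[0]) if X else 0
    if any(len(channel) != num_samples for channel in X):
        raise ValueError("all decoded channels must have the same length")
    return [
        [sum(row[k] * X[k][t] for k in range(K)) for t in range(num_samples)]
        for row in D
    ]
```

A real decoder would typically use a vectorized matrix product; the nested loops here only make the row and column roles explicit.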
The matrix MUST be provided as side information and MUST be stored in
the channel mapping table part of the identification header; cf.
Section 5.1.1 of [RFC7845]. The matrix replaces the channel mapping
field, so for channel mapping family 3 the mapping table has the
following layout:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
| Stream Count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Coupled Count | Demixing Matrix :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Channel Mapping Table for Channel Mapping Family 3
The fields in the channel mapping table have the following meaning:
1. Stream Count 'N' (8 bits, unsigned):
This is the total number of streams encoded in each Ogg packet.
2. Coupled Stream Count 'M' (8 bits, unsigned):
This is the number of the N streams whose decoders are to be
configured to produce two channels (stereo).
3. Demixing Matrix (16*K*C bits, signed):
The coefficients of the demixing matrix are stored column-wise as
16-bit, signed, two's complement fixed-point values with 15
fractional bits (Q15). If needed, the output gain field can be
used for a normalization scale. For mixed-order ambisonic
representations, the silent ACN channels are indicated by all
zeros in the corresponding rows of the demixing matrix.
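
The field layout above can be read with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and little-endian byte order for the 16-bit coefficients is an assumption (it matches the multi-octet fields of [RFC7845], but this document does not state it). Q15 values are converted to floating point by dividing by 2^15.

```python
import struct


def parse_family3_mapping_table(table, channel_count):
    """Parse Stream Count, Coupled Count, and the column-wise Q15 matrix.

    table: bytes of the channel mapping table; channel_count: C from the
    identification header. ASSUMPTION: coefficients are little-endian.
    Returns (N, M, D) with D as C rows of K floats.
    """
    if len(table) < 2:
        raise ValueError("mapping table too short")
    stream_count, coupled_count = table[0], table[1]
    if coupled_count > stream_count:
        raise ValueError("coupled count exceeds stream count")
    K = stream_count + coupled_count
    C = channel_count
    if len(table) != 2 + 2 * K * C:
        raise ValueError("mapping table size does not match 16*K*C bits")
    raw = struct.unpack("<%dh" % (K * C), table[2:])
    # Column-wise storage: the coefficient for (row, col) is at col * C + row.
    D = [[raw[col * C + row] / 32768.0 for col in range(K)] for row in range(C)]
    return stream_count, coupled_count, D
```

Checking the table length against 2 + 2*K*C before unpacking keeps the parser from reading past the end of the header.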
Note that [RFC7845] specifies that the identification header cannot
exceed one "page", which is 65,025 octets. With 16*K*C bits of
matrix coefficients, this limits the ambisonic order to at most 12,
consistent with the channel counts listed above. Also note that the
total output channel number, C, MUST be set in the 3rd field of the
identification header.
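
The effect of the page-size limit can be estimated with a short calculation. The sketch below assumes a square demixing matrix (K = C) and a 19-octet fixed part of the identification header before the mapping table, as laid out in [RFC7845]; both function names are hypothetical.

```python
MAX_PAGE_OCTETS = 65025    # one Ogg page of payload, as noted above
FIXED_HEADER_OCTETS = 19   # ID header fields preceding the mapping table


def family3_header_size(channel_count, decoded_channels):
    """Header octets: fixed part + stream count + coupled count + matrix."""
    return FIXED_HEADER_OCTETS + 2 + 2 * decoded_channels * channel_count


def max_order_that_fits(with_stereo=True):
    """Largest order whose K = C matrix keeps the header within one page."""
    extra = 2 if with_stereo else 0
    order = 0
    while True:
        next_count = (order + 2) ** 2 + extra   # channel count of order + 1
        if family3_header_size(next_count, next_count) > MAX_PAGE_OCTETS:
            return order
        order += 1
```

Under these assumptions, a twelfth-order stream with non-diegetic stereo (C = K = 171) needs 58,503 octets, whereas thirteenth order (C = K = 196) would already need more than 76,000.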
4. Downmixing
An Ogg Opus player MAY use the matrix in Figure 5 to implement
downmixing from multichannel files using channel mapping families 2
and 3 when there is no non-diegetic stereo. This downmixing is known
to give acceptable results for stereo downmixing from ambisonics.
The first and second ambisonic channels are known as "W" and "Y",
respectively.
/ \ / \ / \
| L | | 0.5 0.5 0.0 ... | | W |
| R | = | 0.5 -0.5 0.0 ... | | Y |
\ / \ / | ... |
\ /
Figure 5: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
- only Ambisonic Channels
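
The matrix in Figure 5 only has non-zero entries for W and Y, so the downmix reduces to two sums per sample. A minimal sketch, with a hypothetical function name and channels given as lists of samples in ACN order:

```python
def downmix_ambisonics_to_stereo(channels):
    """L = 0.5*W + 0.5*Y and R = 0.5*W - 0.5*Y; other channels are ignored.

    channels[0] is W and channels[1] is Y. A zeroth-order stream has no Y,
    which is treated here as silence.
    """
    if not channels:
        raise ValueError("at least the W channel is required")
    w = channels[0]
    y = channels[1] if len(channels) > 1 else [0.0] * len(w)
    left = [0.5 * a + 0.5 * b for a, b in zip(w, y)]
    right = [0.5 * a - 0.5 * b for a, b in zip(w, y)]
    return left, right
```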
The first ambisonic channel (W) is a mono audio stream which
represents the average audio signal over all directions. Since W is
not directional, Ogg Opus players MAY use W directly for mono
playback.
If a non-diegetic stereo track is present, the player MAY use the
matrix in Figure 6 for downmixing. Ls and Rs denote the two non-
diegetic stereo channels.
/ \ / \ / \
| L | | 0.25 0.25 0.0 ... 0.5 0.0 | | W |
| R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | | Y |
\ / \ / | ... |
| Ls |
| Rs |
\ /
Figure 6: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3
- Ambisonic Channels Plus a Non-diegetic Stereo Stream
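
When the last two channels carry non-diegetic stereo, Figure 6 halves the ambisonic contribution and adds half of Ls and Rs. The sketch below uses a hypothetical function name and assumes the channel order described in Section 3.1 (ambisonic channels in ACN order, then left and right).

```python
def downmix_with_non_diegetic(channels):
    """L = 0.25*W + 0.25*Y + 0.5*Ls and R = 0.25*W - 0.25*Y + 0.5*Rs.

    channels holds the ambisonic channels in ACN order followed by the two
    non-diegetic stereo channels (Ls, Rs). A missing Y is treated as silence.
    """
    if len(channels) < 3:
        raise ValueError("need W plus the two non-diegetic stereo channels")
    ambisonic, ls, rs = channels[:-2], channels[-2], channels[-1]
    w = ambisonic[0]
    y = ambisonic[1] if len(ambisonic) > 1 else [0.0] * len(w)
    left = [0.25 * a + 0.25 * b + 0.5 * c for a, b, c in zip(w, y, ls)]
    right = [0.25 * a - 0.25 * b + 0.5 * c for a, b, c in zip(w, y, rs)]
    return left, right
```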
5. Security Considerations
Implementations of the Ogg container need to take appropriate security
considerations into account, as outlined in Section 10 of [RFC7845].
The extension defined in this document requires that semantic meaning
be assigned to more channels than the existing Ogg format requires.
Since more allocations will be required to encode and decode these
semantically meaningful channels, care should be taken in any new
allocation paths. Implementations MUST NOT overrun their allocated
memory nor read from uninitialized memory when managing the ambisonic
channel mapping.
6. IANA Considerations
This document updates the IANA Media Types registry "Opus Channel
Mapping Families" to add two new assignments.
+-------+---------------------------+
| Value | Reference |
+-------+---------------------------+
| 2 | This Document Section 3.1 |
| | |
| 3 | This Document Section 3.2 |
+-------+---------------------------+
7. Acknowledgments
Thanks to Timothy Terriberry, Jean-Marc Valin, Mark Harris, Marcin
Gorzel and Andrew Allen for their guidance and valuable contributions
to this document.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the
Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716,
September 2012, <http://www.rfc-editor.org/info/rfc6716>.
[RFC7845] Terriberry, T., Lee, R., and R. Giles, "Ogg Encapsulation
for the Opus Audio Codec", RFC 7845, DOI 10.17487/RFC7845,
April 2016, <http://www.rfc-editor.org/info/rfc7845>.
[ambix] Nachbar, C., Zotter, F., Deleflie, E., and A. Sontacchi,
"AMBIX - A SUGGESTED AMBISONICS FORMAT", June 2011,
<http://iem.kug.ac.at/fileadmin/media/iem/projects/2011/
ambisonics11_nachbar_zotter_sontacchi_deleflie.pdf>.
8.2. Informative References
[gerzon75]
Gerzon, M., "Ambisonics. Part one: General system
description", August 1975,
<http://www.michaelgerzonphotos.org.uk/articles/
Ambisonics%201.pdf>.
[daniel04]
Daniel, J. and S. Moreau, "Further Study of Sound Field
Coding with Higher Order Ambisonics", May 2004,
<http://pcfarina.eng.unipr.it/Public/phd-thesis/
aes116%20high-passed%20hoa.pdf>.
Authors' Addresses
Jan Skoglund
Google Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
Email: [email protected]
Michael Graczyk
Google Inc.
1600 Amphitheatre Parkway
Mountain View, CA 94043
USA
Email: [email protected]
Title: Ambisonics in an Ogg Opus Container
| codec | J. Skoglund |
| Internet-Draft | M. Graczyk |
| Intended status: Standards Track | Google Inc. |
| Expires: September 18, 2017 | March 17, 2017 |
Ambisonics in an Ogg Opus Container
draft-ietf-codec-ambisonics-02
Abstract
This document defines an extension to the Ogg format to encapsulate ambisonics coded using the Opus audio codec.
Status of This Memo
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 18, 2017.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents
- 1. Introduction
- 2. Terminology
- 3. Ambisonics With Ogg Opus
- 4. Downmixing
- 5. Security Considerations
- 6. IANA Considerations
- 7. Acknowledgments
- 8. References
- 8.1. Normative References
- 8.2. Informative References
- Authors' Addresses
1. Introduction
Ambisonics is a representation format for three-dimensional sound fields which can be used for surround sound and immersive virtual reality playback. See [gerzon75] and [daniel04] for technical details on the ambisonics format. For the purposes of this document, ambisonics can be considered a multichannel audio stream. A separate stereo stream can be used alongside the ambisonics in a head-tracked virtual reality experience to provide so-called non-diegetic audio, that is, audio which should remain unchanged by listener head rotation (e.g., narration or stereo music). Ogg is a general-purpose container, supporting audio, video, and other media. It can be used to encapsulate audio streams coded using the Opus codec. See [RFC6716] and [RFC7845] for technical details on the Opus codec and its encapsulation in the Ogg container, respectively.
This document extends the Ogg format by defining two new channel mapping families for encoding ambisonics. The Ogg Opus format is extended indirectly by adding items with values 2 and 3 to the IANA "Opus Channel Mapping Families" registry. When 2 or 3 is used as the Channel Mapping Family Number in an Ogg stream, the semantic meaning of the channels in the multichannel Opus stream is one of the ambisonics layouts defined in this document. These mappings can also be used in other contexts which make use of the channel mappings defined by the Opus Channel Mapping Families registry.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
3. Ambisonics With Ogg Opus
Ambisonics can be encapsulated in the Ogg format by encoding with the Opus codec and setting the channel mapping family value to 2 or 3 in the Ogg identification header (ID). A demuxer implementation encountering Channel Mapping Family 2 or Family 3 MUST interpret the Opus stream as containing ambisonics with the format described in Section 3.1 or Section 3.2, respectively.
3.1. Channel Mapping Family 2
Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0 or 1, where n denotes the (highest) ambisonic order and j whether or not there is a separate non-diegetic stereo stream. This corresponds to periphonic ambisonics from zeroth to fourteenth order plus potentially two channels of non-diegetic stereo. Explicitly, the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171, 196, 198, 225, 227.
This channel mapping uses the same channel mapping table format used by channel mapping family 1. The output channels are ambisonic components ordered in Ambisonic Channel Number (ACN) order, defined in Figure 1, followed by two optional channels of non-diegetic stereo indexed (left, right).
ACN = n * (n + 1) + m, for order n and degree m.
Figure 1: Ambisonic Channel Number (ACN)
For the ambisonic channels, the ACN of a component is also its channel index, k = ACN. Conversely, the order and degree of an ambisonic channel can be computed from its index k.
order n = floor(sqrt(k)), degree m = k - n * (n + 1).
Figure 2: Ambisonic Degree and Order from ACN
Note that channel mapping family 2 allows for so-called mixed-order ambisonic representations, where only a subset of the channels of the full ambisonic order is used. The channel count field specifies the number of channels of the full order, and each inactive ACN is indicated in the channel mapping field using the index 255.
Ambisonic channels are expected to be normalized with Schmidt Semi-Normalization (SN3D). The interpretation of the ambisonics signal as well as detailed definitions of ACN channel ordering and SN3D normalization are described in [ambix] Section 2.1.
3.2. Channel Mapping Family 3
Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...12 and j = 0 or 1, where n denotes the (highest) ambisonic order and j whether or not there is a separate non-diegetic stereo stream. This corresponds to periphonic ambisonics from zeroth to twelfth order plus potentially two channels of non-diegetic stereo. Explicitly, the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171.
In this mapping, C output channels (the channel count) are generated at the decoder by multiplying K = N + M decoded channels with a designated demixing matrix, D, having C rows and K columns. Here, N denotes the number of streams encoded and M the number of these which are coupled to produce two channels. Like channel mapping family 2, this mapping family allows for encoding and decoding of full-order ambisonics, mixed-order ambisonics, and non-diegetic stereo channels, but it has the added flexibility of mixing channels. Let X denote a column vector containing K decoded channels X1, X2, ..., XK (from N streams), and let S denote a column vector containing C output channels S1, S2, ..., SC. Then S = D X, i.e.,
/ \ / \ / \
| S1 | | D11 D12 ... D1K | | X1 |
| S2 | | D21 D22 ... D2K | | X2 |
| ... | = | ... ... ... ... | | ... |
| SC | | DC1 DC2 ... DCK | | XK |
\ / \ / \ /
Figure 3: Demixing in Channel Mapping Family 3
The matrix MUST be provided as side information and MUST be stored in the channel mapping table part of the identification header; cf. Section 5.1.1 of [RFC7845]. The matrix replaces the channel mapping field, so for channel mapping family 3 the mapping table has the following layout:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
| Stream Count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Coupled Count | Demixing Matrix :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Channel Mapping Table for Channel Mapping Family 3
The fields in the channel mapping table have the following meaning:
- Stream Count 'N' (8 bits, unsigned):
This is the total number of streams encoded in each Ogg packet.
- Coupled Stream Count 'M' (8 bits, unsigned):
This is the number of the N streams whose decoders are to be configured to produce two channels (stereo).
- Demixing Matrix (16*K*C bits, signed):
The coefficients of the demixing matrix are stored column-wise as 16-bit, signed, two's complement fixed-point values with 15 fractional bits (Q15). If needed, the output gain field can be used for a normalization scale. For mixed-order ambisonic representations, the silent ACN channels are indicated by all zeros in the corresponding rows of the demixing matrix.
Note that [RFC7845] specifies that the identification header cannot exceed one "page", which is 65,025 octets. With 16*K*C bits of matrix coefficients, this limits the ambisonic order to at most 12, consistent with the channel counts listed above. Also note that the total output channel number, C, MUST be set in the 3rd field of the identification header.
4. Downmixing
An Ogg Opus player MAY use the matrix in Figure 5 to implement downmixing from multichannel files using channel mapping families 2 and 3 when there is no non-diegetic stereo. This downmixing is known to give acceptable results for stereo downmixing from ambisonics. The first and second ambisonic channels are known as "W" and "Y", respectively.
/ \ / \ / \
| L | | 0.5 0.5 0.0 ... | | W |
| R | = | 0.5 -0.5 0.0 ... | | Y |
\ / \ / | ... |
\ /
Figure 5: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3 - only Ambisonic Channels
The first ambisonic channel (W) is a mono audio stream which represents the average audio signal over all directions. Since W is not directional, Ogg Opus players MAY use W directly for mono playback.
If a non-diegetic stereo track is present, the player MAY use the matrix in Figure 6 for downmixing. Ls and Rs denote the two non-diegetic stereo channels.
/ \ / \ / \
| L | | 0.25 0.25 0.0 ... 0.5 0.0 | | W |
| R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | | Y |
\ / \ / | ... |
| Ls |
| Rs |
\ /
Figure 6: Stereo Downmixing Matrix for Channel Mapping Family 2 and 3 - Ambisonic Channels Plus a Non-diegetic Stereo Stream
5. Security Considerations
Implementations of the Ogg container need to take appropriate security considerations into account, as outlined in Section 10 of [RFC7845]. The extension defined in this document requires that semantic meaning be assigned to more channels than the existing Ogg format requires. Since more allocations will be required to encode and decode these semantically meaningful channels, care should be taken in any new allocation paths. Implementations MUST NOT overrun their allocated memory nor read from uninitialized memory when managing the ambisonic channel mapping.
6. IANA Considerations
This document updates the IANA Media Types registry "Opus Channel Mapping Families" to add two new assignments.
| Value | Reference |
|---|---|
| 2 | This Document Section 3.1 |
| 3 | This Document Section 3.2 |
7. Acknowledgments
Thanks to Timothy Terriberry, Jean-Marc Valin, Mark Harris, Marcin Gorzel and Andrew Allen for their guidance and valuable contributions to this document.
8. References
8.1. Normative References
| [RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
| [RFC6716] | Valin, JM., Vos, K. and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, September 2012. |
| [RFC7845] | Terriberry, T., Lee, R. and R. Giles, "Ogg Encapsulation for the Opus Audio Codec", RFC 7845, DOI 10.17487/RFC7845, April 2016. |
| [ambix] | Nachbar, C., Zotter, F., Deleflie, E. and A. Sontacchi, "AMBIX - A SUGGESTED AMBISONICS FORMAT", June 2011. |
8.2. Informative References
| [gerzon75] | Gerzon, M., "Ambisonics. Part one: General system description", August 1975. |
| [daniel04] | Daniel, J. and S. Moreau, "Further Study of Sound Field Coding with Higher Order Ambisonics", May 2004. |
Authors' Addresses
<?xml version='1.0' encoding='ascii'?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes" symrefs="yes" ?>
<rfc ipr="trust200902" category="std" docName="draft-ietf-codec-ambisonics-02" obsoletes="" updates="" submissionType="IETF" xml:lang="en">
<front>
<title abbrev="Opus Ambisonics">Ambisonics in an Ogg Opus Container</title>
<author initials="J." surname="Skoglund" fullname="Jan Skoglund">
<organization>Google Inc.</organization>
<address>
<postal>
<street>1600 Amphitheatre Parkway</street>
<city>Mountain View</city>
<region>CA</region>
<code>94043</code>
<country>USA</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<author initials="M.G." surname="Graczyk" fullname="Michael Graczyk">
<organization>Google Inc.</organization>
<address>
<postal>
<street>1600 Amphitheatre Parkway</street>
<city>Mountain View</city>
<region>CA</region>
<code>94043</code>
<country>USA</country>
</postal>
<email>[email protected]</email>
</address>
</author>
<date day="17" month="March" year="2017"/>
<area>RAI</area>
<workgroup>codec</workgroup>
<abstract>
<t>This document defines an extension to the Ogg format to encapsulate ambisonics coded using the Opus audio codec. </t>
</abstract>
</front>
<middle>
<section anchor="intro" title="Introduction" toc="default">
<t>Ambisonics is a representation format for three dimensional sound fields which can be used for surround sound and immersive virtual
reality playback. See <xref target="gerzon75" pageno="false" format="default"/> and <xref target="daniel04" pageno="false" format="default"/>
for technical details on the ambisonics format. For the purposes of this document, ambisonics can be considered a multichannel audio stream.
A separate stereo stream can be used alongside the ambisonics in a head-tracked virtual reality experience to provide so-called non-diegetic audio - audio which should remain unchanged by listener head rotation; e.g., narration or stereo music. Ogg is a general purpose container, supporting audio, video, and other media.
It can be used to encapsulate audio streams coded using the Opus codec. See <xref target="RFC6716" pageno="false" format="default"/> and <xref target="RFC7845" pageno="false" format="default"/> for technical details on the Opus codec and its encapsulation in the Ogg container respectively. </t>
<t>This document extends the Ogg format by defining two new channel mapping families for encoding ambisonics. The Ogg Opus format
is extended indirectly by adding an item with value 2 or 3 to the IANA "Opus Channel Mapping Families" registry. When 2 or 3 are used
as the Channel Mapping Family Number in an Ogg stream, the semantic meaning of the channels in the multichannel Opus stream is one of
the ambisonics layouts defined in this document. This mapping can also be used in other contexts which make use of the channel mappings
defined by the Opus Channel Mapping Families registry. </t>
</section>
<section anchor="terminology" title="Terminology" toc="default">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119" pageno="false" format="default"/>. </t>
</section>
<section anchor="ogg_extension" title="Ambisonics With Ogg Opus" toc="default">
<t>Ambisonics can be encapsulated in the Ogg format by encoding with the Opus codec and setting the channel mapping family value to 2 or 3 in the Ogg identification header (ID). A demuxer implementation encountering Channel Mapping Family 2 or Family 3 MUST interpret the Opus stream as containing ambisonics with the format described in <xref target="channel_mapping_2" pageno="false" format="default"/> or <xref target="channel_mapping_3" pageno="false" format="default"/>, respectively. </t>
<section anchor="channel_mapping_2" title="Channel Mapping Family 2" toc="default">
<t>Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...14 and j = 0 or 1, where n denotes the (highest) ambisonic order and j whether or not there is a separate non-diegetic stereo stream. This corresponds to periphonic ambisonics from zeroth to fourteenth order plus potentially two channels of non-diegetic stereo. Explicitly, the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171, 196, 198, 225, 227. </t>
<t>This channel mapping uses the same channel mapping table format used by channel mapping family 1. The output channels are ambisonic components ordered in Ambisonic Channel Number (ACN) order, defined in <xref target="ACN" pageno="false" format="default"/>, followed by two optional channels of non-diegetic stereo indexed (left, right). </t>
<figure anchor="ACN" title="Ambisonic Channel Number (ACN)" align="center" suppress-title="false" alt="" width="" height="">
<artwork align="center" xml:space="preserve" name="" type="" alt="" width="" height="">
ACN = n * (n + 1) + m,
for order n and degree m.
</artwork>
</figure>
<t>For the ambisonic channels the ACN component corresponds to channel index as k = ACN. The reverse correspondence can also be computed for an ambisonic channel with index k. </t>
<figure anchor="inverseACN" title="Ambisonic Degree and Order from ACN" align="center" suppress-title="false" alt="" width="" height="">
<artwork align="center" xml:space="preserve" name="" type="" alt="" width="" height="">
order n = floor(sqrt(k)),
degree m = k - n * (n + 1).
</artwork>
</figure>
<t> Note that channel mapping family 2 allows for so-called mixed-order ambisonic representations, where only a subset of the channels of the full ambisonic order is used. The channel count field specifies the number of channels of the full order, and each inactive ACN is indicated in the channel mapping field using the index 255.</t>
<t>Ambisonic channels are expected to be normalized with Schmidt Semi-Normalization (SN3D). The interpretation of the ambisonics signal as well as detailed definitions of ACN channel ordering and SN3D normalization are described in Section 2.1 of <xref target="ambix" pageno="false" format="default"/>. </t>
</section>
<section anchor="channel_mapping_3" title="Channel Mapping Family 3" toc="default">
<t> Allowed numbers of channels: (1 + n)^2 + 2j for n = 0...12 and j = 0 or 1, where n denotes the (highest) ambisonic order and j indicates whether or not there is a separate non-diegetic stereo stream. This corresponds to periphonic ambisonics from zeroth to twelfth order, plus potentially two channels of non-diegetic stereo. Explicitly, the allowed numbers of channels are 1, 3, 4, 6, 9, 11, 16, 18, 25, 27, 36, 38, 49, 51, 64, 66, 81, 83, 100, 102, 121, 123, 144, 146, 169, 171.
</t><t>In this mapping, C output channels (the channel count) are generated at the decoder by multiplying K = N + M decoded channels with a designated demixing matrix, D, having C rows and K columns. Here, N denotes the number of streams encoded and M the number of these that are coupled to produce two channels. As with channel mapping family 2, this mapping family allows encoding and decoding of full-order ambisonics, mixed-order ambisonics, and non-diegetic stereo channels, but it adds the flexibility of mixing channels. Let X denote a column vector containing K decoded channels X1, X2, ..., XK (from N streams), and let S denote a column vector containing C output channels S1, S2, ..., SC. Then S = D X, i.e., </t>
<figure anchor="demixing" title="Demixing in Channel Mapping Family 3" align="center" suppress-title="false" alt="" width="" height="">
<artwork align="center" xml:space="preserve" name="" type="" alt="" width="" height="">
/ \ / \ / \
| S1 | | D11 D12 ... D1K | | X1 |
| S2 | | D21 D22 ... D2K | | X2 |
| ... | = | ... ... ... ... | | ... |
| SC | | DC1 DC2 ... DCK | | XK |
\ / \ / \ /
</artwork>
</figure>
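<t>For a single sample instant, the product S = D X can be sketched in plain Python as below. The function name is hypothetical, and D is assumed to be given as a list of C rows, each holding K coefficients. </t>
<figure align="left" suppress-title="false" alt="" width="" height="">
<artwork align="left" xml:space="preserve" name="" type="" alt="" width="" height=""><![CDATA[
```python
def demix(D, X):
    """Compute S = D X for one sample per decoded channel.

    D is a list of C rows with K coefficients each; X holds K samples.
    Returns the list of C output samples.
    """
    K = len(X)
    if any(len(row) != K for row in D):
        raise ValueError("every row of D must have K entries")
    # Each output channel is the dot product of one row of D with X.
    return [sum(d * x for d, x in zip(row, X)) for row in D]
```
]]></artwork>
</figure>
<t>A row of zeros in D produces a silent output channel, which is how inactive ACNs of a mixed-order representation appear at the output. </t>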
<t>The matrix MUST be provided as side information and MUST be stored in the channel mapping table part of the identification header; see Section 5.1.1 of <xref target="RFC7845" pageno="false" format="default"/>. The matrix removes the need for a channel mapping field, so for channel mapping family 3 the mapping table has the following layout: </t>
<figure anchor="channel_mapping" title="Channel Mapping Table for Channel Mapping Family 3" align="center" suppress-title="false" alt="" width="" height="">
<artwork align="center" xml:space="preserve" name="" type="" alt="" width="" height="">
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+
| Stream Count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Coupled Count | Demixing Matrix :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
</figure>
<t>The fields in the channel mapping table have the following meaning: <list style="numbers" counter="8"><t>Stream Count 'N' (8 bits, unsigned): <vspace blankLines="1"/> This is the total number of streams encoded in each Ogg packet. <vspace blankLines="1"/> </t><t>Coupled Stream Count 'M' (8 bits, unsigned): <vspace blankLines="1"/> This is the number of the N streams whose decoders are to be configured to produce two channels (stereo). <vspace blankLines="1"/> </t><t>Demixing Matrix (16*K*C bits, signed): <vspace blankLines="1"/> The coefficients of the demixing matrix stored column-wise as 16-bit, signed, two's complement fixed-point values with 15 fractional bits (Q15). If needed, the output gain field can be used for a normalization scale. For mixed order ambisonic representations, the silent ACN channels are indicated by all zeros in the corresponding rows of the demixing matrix. </t></list> </t>
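<t>The table layout described above could be parsed along the following lines. This is a sketch, not a reference implementation: the function name is hypothetical, and the little-endian byte order of the 16-bit coefficients is an assumption made here by analogy with the multi-octet fields of RFC 7845, since the text above does not state it. </t>
<figure align="left" suppress-title="false" alt="" width="" height="">
<artwork align="left" xml:space="preserve" name="" type="" alt="" width="" height=""><![CDATA[
```python
import struct


def parse_family3_table(table, channel_count):
    """Parse a family 3 mapping table into (N, M, D).

    table is the bytes of the channel mapping table and channel_count
    is C from the identification header. D is returned as C rows of K
    floats. Byte order of the coefficients is assumed little-endian.
    """
    if len(table) < 2:
        raise ValueError("table too short")
    n_streams, n_coupled = table[0], table[1]
    k = n_streams + n_coupled                  # K = N + M decoded channels
    if len(table) != 2 + 2 * k * channel_count:
        raise ValueError("table size does not match 16*K*C bits")
    raw = struct.unpack("<%dh" % (k * channel_count), table[2:])
    # Column-wise storage: the C entries of column 0 come first, and so on.
    # Q15 values are scaled by 2^15 to obtain the coefficient.
    D = [[raw[col * channel_count + row] / 32768.0 for col in range(k)]
         for row in range(channel_count)]
    return n_streams, n_coupled, D
```
]]></artwork>
</figure>
<t>Because Q15 values lie in the range [-1, 1), larger gains have to be expressed through the output gain field, as noted above. </t>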
<t>Note that <xref target="RFC7845" pageno="false" format="default"/> specifies that the identification header cannot exceed one "page", which is 65,025 octets. This limits the ambisonic order to at most 12. Also note that the total number of output channels, C, MUST be set in the third field of the identification header. </t>
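<t>The size arithmetic behind this limit can be sketched as follows. The sketch assumes one decoded channel per output channel (K = C) unless stated otherwise, and counts the 19 octets that precede the channel mapping table in the RFC 7845 identification header; the names are hypothetical. </t>
<figure align="left" suppress-title="false" alt="" width="" height="">
<artwork align="left" xml:space="preserve" name="" type="" alt="" width="" height=""><![CDATA[
```python
MAX_HEADER_OCTETS = 65025   # one Ogg page of payload
FIXED_HEADER_OCTETS = 19    # header fields before the mapping table


def family3_header_size(order, stereo=False, decoded_channels=None):
    """Octets needed by a family 3 identification header (assumes K = C
    when decoded_channels is not given)."""
    c = (order + 1) ** 2 + (2 if stereo else 0)
    k = c if decoded_channels is None else decoded_channels
    # Stream count (1) + coupled count (1) + 16-bit matrix (2 * K * C).
    return FIXED_HEADER_OCTETS + 2 + 2 * k * c


# Highest order whose header fits in one page, with and without stereo.
max_order = max(n for n in range(15)
                if family3_header_size(n, stereo=True) <= MAX_HEADER_OCTETS)
```
]]></artwork>
</figure>
<t>Under these assumptions an order 12 stream with non-diegetic stereo needs 58,503 octets, while an order 13 stream without it already needs 76,853 octets and no longer fits in one page. </t>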
</section>
</section>
<section anchor="downmixing" title="Downmixing" toc="default">
<t>An Ogg Opus player MAY use the matrix in Figure <xref target="stereo_downmix_matrix_1" format="counter" pageno="false"/> to implement downmixing from multichannel files using channel mapping families 2 and 3 when there is no non-diegetic stereo. This downmixing is known to give acceptable results for stereo downmixing from ambisonics. The first and second ambisonic channels are known as "W" and "Y", respectively. </t>
<figure anchor="stereo_downmix_matrix_1" title="Stereo Downmixing Matrix for Channel Mapping Family 2 and 3 - only Ambisonic Channels" align="center" suppress-title="false" alt="" width="" height="">
<artwork align="center" xml:space="preserve" name="" type="" alt="" width="" height="">
/ \ / \ / \
| L | | 0.5 0.5 0.0 ... | | W |
| R | = | 0.5 -0.5 0.0 ... | | Y |
\ / \ / | ... |
\ /
</artwork>
</figure>
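<t>Applied to one frame of ambisonic samples in ACN order, this downmix can be sketched as below. The function name is hypothetical, and the sketch assumes at least first-order content so that both W and Y are present. </t>
<figure align="left" suppress-title="false" alt="" width="" height="">
<artwork align="left" xml:space="preserve" name="" type="" alt="" width="" height=""><![CDATA[
```python
def downmix_stereo(frame):
    """Downmix one frame of ambisonic samples (ACN order) to (L, R).

    Only W (index 0) and Y (index 1) contribute; all other channels
    have zero weight in the matrix.
    """
    if len(frame) < 2:
        raise ValueError("need at least the W and Y channels")
    w, y = frame[0], frame[1]
    return 0.5 * w + 0.5 * y, 0.5 * w - 0.5 * y
```
]]></artwork>
</figure>
<t>For instance, a first-order frame with W = 1.0 and Y = 0.5 gives L = 0.75 and R = 0.25, regardless of the remaining channels. </t>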
<t>The first ambisonic channel (W) is a mono audio stream which represents the average audio signal over all directions. Since W is not directional, Ogg Opus players MAY use W directly for mono playback. </t>
<t>If a non-diegetic stereo track is present, the player MAY use the matrix in Figure <xref target="stereo_downmix_matrix_2" format="counter" pageno="false"/> for downmixing. Ls and Rs denote the two non-diegetic stereo channels. </t>
<figure anchor="stereo_downmix_matrix_2" title="Stereo Downmixing Matrix for Channel Mapping Family 2 and 3 - Ambisonic Channels Plus a Non-diegetic Stereo Stream" align="center" suppress-title="false" alt="" width="" height="">
<artwork align="center" xml:space="preserve" name="" type="" alt="" width="" height="">
/ \ / \ / \
| L | | 0.25 0.25 0.0 ... 0.5 0.0 | | W |
| R | = | 0.25 -0.25 0.0 ... 0.0 0.5 | | Y |
\ / \ / | ... |
| Ls |
| Rs |
\ /
</artwork>
</figure>
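<t>A corresponding sketch for the case with a non-diegetic stereo stream is given below. It assumes that Ls and Rs are the last two output channels, following the channel ordering of channel mapping family 2, and that the ambisonic part is at least first order; the function name is hypothetical. </t>
<figure align="left" suppress-title="false" alt="" width="" height="">
<artwork align="left" xml:space="preserve" name="" type="" alt="" width="" height=""><![CDATA[
```python
def downmix_stereo_with_nondiegetic(frame):
    """Downmix ambisonics plus non-diegetic stereo to (L, R).

    frame holds the ambisonic channels in ACN order followed by Ls, Rs.
    """
    if len(frame) < 6:
        raise ValueError("need W, Y and the Ls, Rs channels")
    w, y = frame[0], frame[1]
    ls, rs = frame[-2], frame[-1]   # assumed to be the last two channels
    left = 0.25 * w + 0.25 * y + 0.5 * ls
    right = 0.25 * w - 0.25 * y + 0.5 * rs
    return left, right
```
]]></artwork>
</figure>
<t>With W = 1.0, Y = 0.5, Ls = 0.5 and Rs = 0.25, the result is L = 0.625 and R = 0.25. </t>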
</section>
<section anchor="security" title="Security Considerations" toc="default">
<t>Implementations of the Ogg container need to take appropriate security considerations into account, as outlined in Section 10 of <xref target="RFC7845" pageno="false" format="default"/>. The extension defined in this document requires that semantic meaning be assigned to more channels than the existing Ogg format requires. Since more allocations will be required to encode and decode these semantically meaningful channels, care should be taken in any new allocation paths. Implementations MUST NOT overrun their allocated memory or read from uninitialized memory when managing the ambisonic channel mapping. </t>
</section>
<section anchor="iana" title="IANA Considerations" toc="default">
<t>This document updates the IANA Media Types registry "Opus Channel Mapping Families" to add two new assignments. </t>
<texttable title="" suppress-title="false" align="center" style="full">
<ttcol align="left">Value</ttcol>
<ttcol align="left">Reference</ttcol>
<c>2</c>
<c>This Document <xref target="channel_mapping_2" pageno="false" format="default"/></c>
<c>3</c>
<c>This Document <xref target="channel_mapping_3" pageno="false" format="default"/></c>
</texttable>
</section>
<section anchor="Acknowledgments" title="Acknowledgments" toc="default">
<t>Thanks to Timothy Terriberry, Jean-Marc Valin, Mark Harris, Marcin Gorzel and Andrew Allen for their guidance and valuable contributions to this document. </t>
</section>
</middle>
<back>
<references title="Normative References"><reference anchor="RFC2119" target="http://www.rfc-editor.org/info/rfc2119" quote-title="true"><front><title>Key words for use in RFCs to Indicate Requirement Levels</title><author initials="S." surname="Bradner" fullname="S. Bradner"><organization/></author><date year="1997" month="March"/><abstract><t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t></abstract></front><seriesInfo name="BCP" value="14"/><seriesInfo name="RFC" value="2119"/><seriesInfo name="DOI" value="10.17487/RFC2119"/></reference> <reference anchor="RFC6716" target="http://www.rfc-editor.org/info/rfc6716" quote-title="true"><front><title>Definition of the Opus Audio Codec</title><author initials="JM." surname="Valin" fullname="JM. Valin"><organization/></author><author initials="K." surname="Vos" fullname="K. Vos"><organization/></author><author initials="T." surname="Terriberry" fullname="T. Terriberry"><organization/></author><date year="2012" month="September"/><abstract><t>This document defines the Opus interactive speech and audio codec. Opus is designed to handle a wide range of interactive audio applications, including Voice over IP, videoconferencing, in-game chat, and even live, distributed music performances. It scales from low bitrate narrowband speech at 6 kbit/s to very high quality stereo music at 510 kbit/s. Opus uses both Linear Prediction (LP) and the Modified Discrete Cosine Transform (MDCT) to achieve good compression of both speech and music. 
[STANDARDS-TRACK]</t></abstract></front><seriesInfo name="RFC" value="6716"/><seriesInfo name="DOI" value="10.17487/RFC6716"/></reference> <reference anchor="RFC7845" target="http://www.rfc-editor.org/info/rfc7845" quote-title="true"><front><title>Ogg Encapsulation for the Opus Audio Codec</title><author initials="T." surname="Terriberry" fullname="T. Terriberry"><organization/></author><author initials="R." surname="Lee" fullname="R. Lee"><organization/></author><author initials="R." surname="Giles" fullname="R. Giles"><organization/></author><date year="2016" month="April"/><abstract><t>This document defines the Ogg encapsulation for the Opus interactive speech and audio codec. This allows data encoded in the Opus format to be stored in an Ogg logical bitstream.</t></abstract></front><seriesInfo name="RFC" value="7845"/><seriesInfo name="DOI" value="10.17487/RFC7845"/></reference> <reference anchor="ambix" target="http://iem.kug.ac.at/fileadmin/media/iem/projects/2011/ambisonics11_nachbar_zotter_sontacchi_deleflie.pdf" quote-title="true"><front><title>AMBIX - A SUGGESTED AMBISONICS FORMAT</title><author initials="C." surname="Nachbar" fullname="Christian Nachbar"/><author initials="F." surname="Zotter" fullname="Franz Zotter"/><author initials="E." surname="Deleflie" fullname="Etienne Deleflie"/><author initials="A." surname="Sontacchi" fullname="Alois Sontacchi"/><date month="June" year="2011"/></front></reference></references>
<references title="Informative References">
<reference anchor="gerzon75" target="http://www.michaelgerzonphotos.org.uk/articles/Ambisonics%201.pdf" quote-title="true">
<front>
<title>Ambisonics. Part one: General system description</title>
<author initials="M." surname="Gerzon" fullname="Michael Gerzon"/>
<date month="August" year="1975"/>
</front>
</reference>
<reference anchor="daniel04" target="http://pcfarina.eng.unipr.it/Public/phd-thesis/aes116%20high-passed%20hoa.pdf" quote-title="true">
<front>
<title>Further Study of Sound Field Coding with Higher Order Ambisonics</title>
<author initials="J." surname="Daniel" fullname="Jérôme Daniel"/>
<author initials="S." surname="Moreau" fullname="Sébastien Moreau"/>
<date month="May" year="2004"/>
</front>
</reference>
</references>
</back>
</rfc>
