================
@@ -0,0 +1,548 @@
+Pointer Authentication
+======================
+
+.. contents::
+ :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic
protection against exploiting a broad class of memory bugs to take control of
program execution. When adopted consistently in a language ABI, it provides a
form of relatively fine-grained control flow integrity (CFI) check that resists
both return-oriented programming (ROP) and jump-oriented programming (JOP)
attacks.
+
+While pointer authentication can be implemented purely in software, direct
hardware support (e.g. as provided by ARMv8.3) can dramatically lower the
execution speed and code size costs. Similarly, while pointer authentication
can be implemented on any architecture, taking advantage of the (typically)
excess addressing range of a target with 64-bit pointers minimizes the impact
on memory performance and can allow interoperation with existing code (by
disabling pointer authentication dynamically). This document will generally
attempt to present the pointer authentication feature independent of any
hardware implementation or ABI. Considerations that are
implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **ARMv8.3** is an AArch64 architecture revision that provides hardware
support for pointer authentication. It is implemented on several shipping
processors, including the Apple A12 and later.
+
+- **arm64e** is a specific ABI (not yet fully stable) for implementing
pointer authentication on ARMv8.3 on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using
pointer authentication.
+
+- It presents a theory of operation for the security mitigation, describing
the basic requirements for correctness, various weaknesses in the mechanism,
and ways in which programmers can strengthen its protections (including
recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++,
Objective-C, and Swift on arm64e, although these are not yet stable on any
target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**. A raw
pointer can be **signed** to produce a **signed pointer**. A signed pointer
can then be **authenticated** in order to verify that it was **validly signed**
and extract the original raw pointer. These terms reflect the most likely
implementation technique: computing and storing a cryptographic signature along
with the pointer. The security of pointer authentication does not rely on
attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can
be used to sign and authenticate pointers. The key value for a particular name is
consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed
pointers so that one validly-signed pointer cannot simply be copied over
another. A discriminator is simply opaque data of some implementation-defined
size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary
operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a
raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a
signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must
succeed and produce ``raw_pointer``. ``auth`` applied to a value that was
ultimately produced in any other way is expected to immediately halt the
program. However, it is permitted for ``auth`` to fail to detect that a signed
pointer was not produced in this way, in which case it may return anything;
this is what makes pointer authentication a probabilistic mitigation rather
than a perfect one.
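+
+As a rough illustration of these two operations, the following is a toy
software model, not the ARMv8.3 algorithm or any real implementation. It
assumes 48-bit raw pointers, keeps a 16-bit signature in the high bits, uses a
non-cryptographic mixing function in place of a keyed hash, and reports failure
through a return value instead of halting. All names (``toy_mix``,
``toy_signature``, ``toy_sign``, ``toy_auth``) are hypothetical.
+
```c
#include <stdbool.h>
#include <stdint.h>

/* Toy layout (an assumption): the low 48 bits hold the raw pointer and the
   high 16 bits hold the signature. Real layouts are implementation-defined. */
#define TOY_ADDR_MASK UINT64_C(0x0000FFFFFFFFFFFF)

/* Non-cryptographic 64-bit mixer (the splitmix64 finalizer), standing in for
   a real keyed hash. */
static uint64_t toy_mix(uint64_t x) {
  x ^= x >> 30;
  x *= UINT64_C(0xBF58476D1CE4E5B9);
  x ^= x >> 27;
  x *= UINT64_C(0x94D049BB133111EB);
  x ^= x >> 31;
  return x;
}

/* Signature over the pointer, salted by the discriminator and peppered by
   the key. Only the top 16 bits of the mix are kept. */
static uint64_t toy_signature(uint64_t raw, uint64_t key, uint64_t disc) {
  return toy_mix(toy_mix(raw ^ key) ^ disc) >> 48;
}

/* sign(raw_pointer, key, discriminator) */
static uint64_t toy_sign(uint64_t raw, uint64_t key, uint64_t disc) {
  raw &= TOY_ADDR_MASK;
  return raw | (toy_signature(raw, key, disc) << 48);
}

/* auth(signed_pointer, key, discriminator): returns true and stores the raw
   pointer on success. A real implementation would halt the program instead
   of returning false. */
static bool toy_auth(uint64_t signed_ptr, uint64_t key, uint64_t disc,
                     uint64_t *raw_out) {
  uint64_t raw = signed_ptr & TOY_ADDR_MASK;
  if ((signed_ptr >> 48) != toy_signature(raw, key, disc))
    return false;
  *raw_out = raw;
  return true;
}
```
+
+In this model, ``toy_auth(toy_sign(p, k, d), k, d, &out)`` always succeeds and
yields ``p``, and flipping any signature bit always fails. A value produced in
some other way passes only if its 16 signature bits happen to match, which is
the sense in which the mitigation is probabilistic.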
+
+There are two secondary operations which are required only to implement
certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer
and a key it was presumptively signed with. This is useful for certain kinds
of tooling, such as crash backtraces; it should generally not be used in the
basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary
data, not necessarily a pointer. This is useful for efficiently verifying that
non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known
statically. This is because the layout of a signed pointer may vary according
to the signing key. (For example, in ARMv8.3, the layout of a signed pointer
depends on whether TBI is enabled, which can be set independently for code and
data pointers.)
+
+.. admonition:: Note for API designers and language implementors
+
+ These are the *primitive* operations of pointer authentication, provided for
clarity of description. They are not suitable either as high-level interfaces
or as primitives in a compiler IR because they expose raw pointers. Raw
pointers require special attention in the language implementation to avoid the
accidental creation of exploitable code sequences; see the section on
`Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and
``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggests the use of a
cryptographic signature, other implementations may be possible. See
`Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+ Readers may find it helpful to know how these terms map to ARMv8.3:
+
+ - A signed pointer is a pointer with a signature stored in the
otherwise-unused high bits. The kernel configures the signature width based on
the system's addressing needs, accounting for whether the AArch64 TBI feature
is enabled for the kind of pointer (code or data).
+
+ - A discriminator is a 64-bit integer. Constant discriminators are 16-bit
integers. Blending a constant discriminator into an address consists of
replacing the top 16 bits of the address with the constant.
+
+ - There are five 128-bit signing-key registers, each of which can only be
directly read or set by privileged code. Of these, four are used for signing
pointers, and the fifth is used only for ``sign_generic``. The key data is
simply a pepper added to the hash, not an encryption key, and so can be
initialized using random data.
+
+ - ``sign`` computes a cryptographic hash of the pointer, discriminator, and
signing key, and stores it in the high bits as the signature. ``auth`` removes
the signature, computes the same hash, and compares the result with the stored
signature. ``strip`` removes the signature without authenticating it. While
ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler
only ever emits them in sequences that will trap.
+
+ - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two
64-bit values and produces a 64-bit cryptographic hash. Implementations of this
instruction may not produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature on a
pointer. When two pointers are signed differently --- either with different
keys or with different discriminators --- an attacker cannot simply replace one
pointer with the other. For more information on why discriminators are
important and how to use them effectively, see the section on `Substitution
attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in
the signing of a pointer, and the key data acts as a pepper. That is, both the
discriminator and key data are ultimately just added as inputs to the signing
algorithm along with the pointer, but they serve significantly different roles.
The key data is a common secret added to every signature, whereas the
discriminator is a signing-specific value that can be derived from the
circumstances of how a pointer is signed. However, unlike a password salt,
it's important that discriminators be *independently* derived from the
circumstances of the signing; they should never simply be stored alongside a
pointer.
+
+The intrinsic interface in ``<ptrauth.h>`` allows an arbitrary discriminator
value to be provided, but can only be used when running normal code. The
discriminators used by language ABIs must be restricted to make it feasible for
the loader to sign pointers stored in global memory without needing excessive
amounts of metadata. Under these restrictions, a discriminator may consist of
either or both of the following:
+
+- The address at which the pointer is stored in memory. A pointer signed with
a discriminator which incorporates its storage address is said to have
**address diversity**. In general, using address diversity means that a
pointer cannot be reliably replaced by an attacker or used to reliably replace
a different pointer. However, an attacker may still be able to attack a larger
call sequence if they can alter the address through which the pointer is
accessed. Furthermore, some situations cannot use address diversity because of
language or other restrictions.
+
+- A constant integer, called a **constant discriminator**. A pointer signed
with a non-zero constant discriminator is said to have **constant diversity**.
If the discriminator is specific to a single declaration, it is said to have
**declaration diversity**; if the discriminator is specific to a type of value,
it is said to have **type diversity**. For example, C++ v-tables on arm64e
sign their component functions using a hash of their method names and
signatures, which provides declaration diversity; similarly, C++ member
function pointers sign their invocation functions using a hash of the member
pointer type, which provides type diversity.
+
+The implementation may need to restrict constant discriminators to be
significantly smaller than the full size of a discriminator. For example, on
arm64e, constant discriminators are only 16-bit values. This is believed to
not significantly weaken the mitigation, since collisions remain uncommon.
+
+The algorithm for blending a constant discriminator with a storage address is
implementation-defined.
+
+.. _Signing schemas:
+
+Signing schemas
+~~~~~~~~~~~~~~~
+
+Correct use of pointer authentication requires the signing code and the
authenticating code to agree about the **signing schema** for the pointer:
+
+- the abstract signing key with which the pointer should be signed and
+- an algorithm for computing the discriminator.
+
+As described in the section above on `Discriminators`_, in most situations,
the discriminator is produced by taking a constant discriminator and optionally
blending it with the storage address of the pointer. In these situations, the
signing schema breaks down even more simply:
+
+- the abstract signing key,
+- a constant discriminator, and
+- whether to use address diversity.
+
+It is important that the signing schema be independently derived at all
signing and authentication sites. Preferably, the schema should be hard-coded
everywhere it is needed, but at the very least, it must not be derived by
inspecting information stored along with the pointer. See the section on
`Attacks on pointer authentication`_ for more information.
+
+Language Features
+-----------------
+
+There is currently one main pointer authentication language feature:
+
+- The language provides the ``<ptrauth.h>`` intrinsic interface for manually
signing and authenticating pointers in code. These can be used in
circumstances where very specific behavior is required.
+
+
+Language extensions
+~~~~~~~~~~~~~~~~~~~
+
+Feature testing
+^^^^^^^^^^^^^^^
+
+Whether the current target uses pointer authentication can be tested for with
a number of different tests.
+
+- ``__has_feature(ptrauth_intrinsics)`` is true if ``<ptrauth.h>`` provides
its normal interface. This may be true even on targets where pointer
authentication is not enabled by default.
+
+``<ptrauth.h>``
+~~~~~~~~~~~~~~~
+
+This header defines the following types and operations:
+
+``ptrauth_key``
+^^^^^^^^^^^^^^^
+
+This ``enum`` is the type of abstract signing keys. In addition to defining
the set of implementation-specific signing keys (for example, ARMv8.3 defines
``ptrauth_key_asia``), it also defines some portable aliases for those keys.
For example, ``ptrauth_key_function_pointer`` is the key generally used for C
function pointers, which will generally be suitable for other function-signing
schemas.
+
+In all the operation descriptions below, key values must be constant values
corresponding to one of the implementation-specific abstract signing keys from
this ``enum``.
+
+``ptrauth_extra_data_t``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a ``typedef`` of a standard integer type of the correct size to hold a
discriminator value.
+
+In the signing and authentication operation descriptions below, discriminator
values must have either pointer type or integer type. If the discriminator is
an integer, it will be coerced to ``ptrauth_extra_data_t``.
+
+``ptrauth_blend_discriminator``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+ ptrauth_blend_discriminator(pointer, integer)
+
+Produce a discriminator value which blends information from the given pointer
and the given integer.
+
+Implementations may ignore some bits from each value, which is to say, the
blending algorithm may be chosen for speed and convenience over theoretical
strength as a hash-combining algorithm. For example, arm64e simply overwrites
the high 16 bits of the pointer with the low 16 bits of the integer, which can
be done in a single instruction with an immediate integer.
+
+``pointer`` must have pointer type, and ``integer`` must have integer type.
The result has type ``ptrauth_extra_data_t``.
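+
+The arm64e behavior described above can be sketched in portable C. This is
only a model of the bit manipulation, assuming 64-bit values; the helper name
``toy_blend_discriminator`` is hypothetical and is not the ``<ptrauth.h>``
intrinsic.
+
```c
#include <stdint.h>

/* Sketch of the arm64e-style blend: keep the low 48 bits of the storage
   address and place the low 16 bits of the constant in the top 16 bits.
   Hypothetical helper, not the <ptrauth.h> intrinsic. */
static uint64_t toy_blend_discriminator(uint64_t storage_address,
                                        uint64_t constant) {
  return (storage_address & UINT64_C(0x0000FFFFFFFFFFFF)) |
         ((constant & UINT64_C(0xFFFF)) << 48);
}
```
+
+For example, blending the address ``0x0000000100004000`` with the constant
``0xf017`` gives ``0xf017000100004000`` in this model.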
+
+``ptrauth_strip``
+^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+ ptrauth_strip(signedPointer, key)
+
+Given that ``signedPointer`` matches the layout for signed pointers signed
with the given key, extract the raw pointer from it. This operation does not
trap and cannot fail, even if the pointer is not validly signed.
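+
+A minimal sketch of using this operation for diagnostics, guarded by the
feature test described earlier so that it also builds where ``<ptrauth.h>`` is
unavailable. The helper ``callback_address_for_logging``, the ``callback_t``
type, and ``example_callback`` are hypothetical names introduced for
illustration; the fallback definition of ``__has_feature`` is an assumption for
compilers that lack it.
+
```c
#include <stdint.h>

/* Compilers other than Clang may not provide __has_feature. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(ptrauth_intrinsics)
#include <ptrauth.h>
#endif

typedef void (*callback_t)(void);

/* Hypothetical helper: returns the address of a callback for diagnostics
   such as symbolication. The result should not be called or re-signed. */
static uintptr_t callback_address_for_logging(callback_t fn) {
#if __has_feature(ptrauth_intrinsics)
  /* Remove the signature without authenticating it. */
  return (uintptr_t)ptrauth_strip(fn, ptrauth_key_function_pointer);
#else
  /* No pointer authentication intrinsics: the pointer is already raw. */
  return (uintptr_t)fn;
#endif
}

/* Placeholder callback used only to exercise the helper. */
static void example_callback(void) {}
```
+
+Because stripping performs no check, the result says nothing about whether the
pointer was validly signed, so it is best confined to logging and similar
tooling.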
+
+``ptrauth_sign_unauthenticated``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+ ptrauth_sign_unauthenticated(pointer, key, discriminator)
+
+Produce a signed pointer for the given raw pointer without applying any
authentication or extra treatment. This operation is not required to have the
same behavior on a null pointer that the language implementation would.
+
+This is a treacherous operation that can easily result in `signing oracles`_.
Programs should use it seldom and carefully.
+
+``ptrauth_auth_and_resign``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+ ptrauth_auth_and_resign(pointer, oldKey, oldDiscriminator, newKey,
newDiscriminator)
+
+Authenticate that ``pointer`` is signed with ``oldKey`` and
``oldDiscriminator`` and then resign the raw-pointer result of that
authentication with ``newKey`` and ``newDiscriminator``.
+
+``pointer`` must have pointer type. The result will have the same type as
``pointer``. This operation is not required to have the same behavior on a
null pointer that the language implementation would.
+
+The code sequence produced for this operation must not be directly attackable.
However, if the discriminator values are not constant integers, their
computations may still be attackable. In the future, Clang should be enhanced
to guarantee non-attackability if these expressions are
:ref:`safely-derived<Safe derivation>`.
+
+``ptrauth_auth_data``
+^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+ ptrauth_auth_data(pointer, key, discriminator)
+
+Authenticate that ``pointer`` is signed with ``key`` and ``discriminator`` and
remove the signature.
+
+``pointer`` must have object pointer type. The result will have the same type
as ``pointer``. This operation is not required to have the same behavior on a
null pointer that the language implementation would.
+
+In the future when Clang makes `safe derivation`_ guarantees, the result of
this operation should be considered safely-derived.
+
+``ptrauth_sign_generic_data``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c
+
+ ptrauth_sign_generic_data(value1, value2)
+
+Computes a signature for the given pair of values, incorporating a secret
signing key.
+
+This operation can be used to verify that arbitrary data has not been tampered
with by computing a signature for the data, storing that signature, and then
repeating this process and verifying that it yields the same result. This can
be reasonably done in any number of ways; for example, a library could compute
an ordinary checksum of the data and just sign the result in order to get the
tamper-resistance advantages of the secret signing key (since otherwise an
attacker could reliably overwrite both the data and the checksum).
+
+``value1`` and ``value2`` must be either pointers or integers. If the
integers are larger than ``uintptr_t`` then data not representable in
``uintptr_t`` may be discarded.
+
+The result will have type ``ptrauth_generic_signature_t``, which is an integer
type. Implementations are not required to make all bits of the result equally
significant; in particular, some implementations are known to not leave
meaningful data in the low bits.
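+
+The checksum-then-sign approach mentioned above might look like the following
sketch. It does not call the real intrinsic: ``toy_sign_generic`` is a
hypothetical, non-cryptographic stand-in whose key is an ordinary variable,
whereas a real signing key is not readable by the program. The checksum is
64-bit FNV-1a, chosen here only for brevity; ``sign_buffer`` and
``verify_buffer`` are likewise hypothetical.
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the secret key (an assumption of this model). */
static const uint64_t toy_generic_key = UINT64_C(0x9E3779B97F4A7C15);

/* Ordinary, unkeyed checksum: 64-bit FNV-1a. */
static uint64_t checksum(const unsigned char *data, size_t len) {
  uint64_t h = UINT64_C(0xCBF29CE484222325);
  for (size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= UINT64_C(0x100000001B3);
  }
  return h;
}

/* Non-cryptographic 64-bit mixer (the splitmix64 finalizer). */
static uint64_t generic_mix(uint64_t x) {
  x ^= x >> 30;
  x *= UINT64_C(0xBF58476D1CE4E5B9);
  x ^= x >> 27;
  x *= UINT64_C(0x94D049BB133111EB);
  x ^= x >> 31;
  return x;
}

/* Toy replacement for ptrauth_sign_generic_data(value1, value2). */
static uint64_t toy_sign_generic(uint64_t value1, uint64_t value2) {
  return generic_mix(generic_mix(value1 ^ toy_generic_key) ^ value2);
}

/* Sign the checksum of a buffer, using the length as the second value. */
static uint64_t sign_buffer(const unsigned char *data, size_t len) {
  return toy_sign_generic(checksum(data, len), (uint64_t)len);
}

/* Recompute the signature and compare it with the stored one. */
static bool verify_buffer(const unsigned char *data, size_t len,
                          uint64_t stored_signature) {
  return sign_buffer(data, len) == stored_signature;
}
```
+
+A caller would store the result of ``sign_buffer`` next to the data and call
``verify_buffer`` before trusting it. With a real secret key, an attacker who
overwrites the data cannot reliably compute a matching signature.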
+
+
+
+
+
+Theory of Operation
+-------------------
+
+The threat model of pointer authentication is as follows:
+
+- The attacker has the ability to read and write to a certain range of
addresses, possibly the entire address space. However, they are constrained by
the normal rules of the process: for example, they cannot write to memory that
is mapped read-only, and if they access unmapped memory it will trigger a trap.
+
+- The attacker has no ability to add arbitrary executable code to the program.
For example, the program does not include malicious code to begin with, and
the attacker cannot alter existing instructions, load a malicious shared
library, or remap writable pages as executable. If the attacker wants to get
the process to perform a specific sequence of actions, they must somehow
subvert the normal control flow of the process.
+
+In both of the above paragraphs, it is merely assumed that the attacker's
*current* capabilities are restricted; that is, their current exploit does not
directly give them the power to do these things. The attacker's immediate goal
may well be to leverage their exploit to gain these capabilities, e.g. to load
a malicious dynamic library into the process, even though the process does not
directly contain code to do so.
+
+Note that any bug that fits the above threat model can be immediately
exploited as a denial-of-service attack by simply performing an illegal access
and crashing the program. Pointer authentication cannot protect against this.
While denial-of-service attacks are unfortunate, they are also unquestionably
the best possible result of a bug this severe. Therefore, pointer
authentication enthusiastically embraces the idea of halting the program on a
pointer authentication failure rather than continuing in a possibly-compromised
state.
+
+Pointer authentication is a form of control-flow integrity (CFI) enforcement.
The basic security hypothesis behind CFI enforcement is that many bugs can only
be usefully exploited (other than as a denial-of-service) by leveraging them to
subvert the control flow of the program. If this is true, then by inhibiting
or limiting that subversion, it may be possible to largely mitigate the
security consequences of those bugs by rendering them impractical (or, ideally,
impossible) to exploit.
+
+Every indirect branch in a program has a purpose. Using human intelligence, a
programmer can describe where a particular branch *should* go according to this
purpose: a ``return`` in ``printf`` should return to the call site, a
particular call in ``qsort`` should call the comparator that was passed in as
an argument, and so on. But for CFI to enforce that every branch in a program
goes where it *should* in this sense would require CFI to perfectly enforce
every semantic rule of the program's abstract machine; that is, it would
require making the programming environment perfectly sound. That is out of
scope. Instead, the goal of CFI is merely to catch attempts to make a branch
go somewhere that it obviously *shouldn't* for its purpose: for example, to
stop a call from branching into the middle of a function rather than its
beginning. As the information available to CFI gets better about the purpose
of the branch, CFI can enforce tighter and tighter restrictions on where the
branch is permitted to go. Still, ultimately CFI cannot make the program
sound. This may help explain why pointer authentication makes some of the
choices it does: for example, to sign and authenticate mostly code pointers
rather than every pointer in the program. Preventing attackers from
redirecting branches is both particularly important and particularly
approachable as a goal. Detecting corruption more broadly is infeasible with
these techniques, and the attempt would have far higher cost.
+
+Attacks on pointer authentication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Pointer authentication works as follows. Every indirect branch in a program
has a purpose. For every purpose, the implementation chooses a :ref:`signing
schema<Signing schemas>`. At some place where a pointer is known to be correct
for its purpose, it is signed according to the purpose's schema. At every
place where the pointer is needed for its purpose, it is authenticated
according to the purpose's schema. If that authentication fails, the program
is halted.
+
+There are a variety of ways to attack this.
+
+Attacks of interest to programmers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These attacks arise from weaknesses in the default protections offered by
pointer authentication. They can be addressed by using attributes or
intrinsics to opt in to stronger protection.
+
+Substitution attacks
+++++++++++++++++++++
+
+An attacker can simply overwrite a pointer intended for one purpose with a
pointer intended for another purpose if both purposes use the same signing
schema and that schema does not use address diversity.
+
+The most common source of this weakness is when code relies on using the
default language rules for C function pointers. The current implementation
uses the exact same signing schema for all C function pointers, even for
functions of substantially different type. While efforts are ongoing to
improve constant diversity for C function pointers of different type, there are
necessary limits to this. The C standard requires function pointers to be
copyable with ``memcpy``, which means that function pointers can never use
address diversity. Furthermore, even if a function pointer can only be
replaced with another function of the exact same type, that can still be useful
to an attacker, as in the following example of a hand-rolled "v-table":
+
+.. code-block:: c
+
+ struct ObjectOperations {
+ void (*retain)(Object *);
+ void (*release)(Object *);
+ void (*deallocate)(Object *);
+ void (*logStatus)(Object *);
+ };
+
+This weakness can be mitigated by using a more specific signing schema for
each purpose. In this example, the ``__ptrauth`` qualifier can be
used with a different constant discriminator for each field. Since there's no
particular reason it's important for this v-table to be copyable with
``memcpy``, the functions can also be signed with address diversity:
+
+.. code-block:: c
+
+ #if __has_feature(ptrauth_calls)
+ #define objectOperation(discriminator) \
+ __ptrauth(ptrauth_key_function_pointer, 1, discriminator)
+ #else
+ #define objectOperation(discriminator)
+ #endif
+
+ struct ObjectOperations {
+ void (*objectOperation(0xf017) retain)(Object *);
+ void (*objectOperation(0x2639) release)(Object *);
+ void (*objectOperation(0x8bb0) deallocate)(Object *);
+ void (*objectOperation(0xc5d4) logStatus)(Object *);
+ };
+
+This weakness can also sometimes be mitigated by simply keeping the signed
pointer in constant memory, but this is less effective than using better
signing diversity.
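+
+The effect of signing diversity on substitution can be shown with a toy model
in portable C. This is a sketch under stated assumptions, not how the compiler
implements ``__ptrauth``: the signature is a full 64-bit value stored next to
the raw pointer, the mixer is non-cryptographic, the key is an ordinary
variable, and every name (``demo_slot``, ``demo_store``, ``demo_load``, and so
on) is hypothetical.
+
```c
#include <stdbool.h>
#include <stdint.h>

/* Toy signed pointer: the signature is stored alongside the raw value. */
typedef struct {
  uint64_t raw;
  uint64_t sig;
} demo_slot;

/* Stand-in for the secret key (an assumption of this model). */
static const uint64_t demo_key = UINT64_C(0x0123456789ABCDEF);

/* Non-cryptographic 64-bit mixer (the splitmix64 finalizer). */
static uint64_t demo_mix(uint64_t x) {
  x ^= x >> 30;
  x *= UINT64_C(0xBF58476D1CE4E5B9);
  x ^= x >> 27;
  x *= UINT64_C(0x94D049BB133111EB);
  x ^= x >> 31;
  return x;
}

/* Discriminator for a schema: a 16-bit constant in the top bits, optionally
   blended with the low 48 bits of the slot's storage address. */
static uint64_t demo_discriminator(const demo_slot *slot, uint16_t constant,
                                   bool address_diversity) {
  uint64_t address = address_diversity ? (uint64_t)(uintptr_t)slot : 0;
  return (address & UINT64_C(0x0000FFFFFFFFFFFF)) | ((uint64_t)constant << 48);
}

static uint64_t demo_signature(const demo_slot *slot, uint64_t raw,
                               uint16_t constant, bool address_diversity) {
  uint64_t disc = demo_discriminator(slot, constant, address_diversity);
  return demo_mix(demo_mix(raw ^ demo_key) ^ disc);
}

/* Sign a raw pointer and store it in the slot. */
static void demo_store(demo_slot *slot, uint64_t raw, uint16_t constant,
                       bool address_diversity) {
  slot->raw = raw;
  slot->sig = demo_signature(slot, raw, constant, address_diversity);
}

/* Authenticate the slot's contents; a real implementation would halt
   rather than return false. */
static bool demo_load(const demo_slot *slot, uint16_t constant,
                      bool address_diversity, uint64_t *raw_out) {
  if (slot->sig != demo_signature(slot, slot->raw, constant, address_diversity))
    return false;
  *raw_out = slot->raw;
  return true;
}
```
+
+In this model, if two slots share one schema without address diversity, copying
one slot over the other still passes ``demo_load``. If the slots use different
constants, or the same constant with address diversity, the copied value fails
authentication.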
+
+.. _Access path attacks:
+
+Access path attacks
++++++++++++++++++++
+
+If a signed pointer is often accessed indirectly (that is, by first loading
the address of the object where the signed pointer is stored), an attacker can
affect uses of it by overwriting the intermediate pointer in the access path.
+
+The most common scenario exhibiting this weakness is an object with a pointer
to a "v-table" (a structure holding many function pointers). An attacker does
not need to replace a signed function pointer in the v-table if they can
instead simply replace the v-table pointer in the object with their own pointer
--- perhaps to memory where they've constructed their own v-table, or to
existing memory that coincidentally happens to contain a signed pointer at the
right offset that's been signed with the right signing schema.
+
+This attack arises because data pointers are not signed by default. It works
even if the signed pointer uses address diversity: address diversity merely
means that each pointer is signed with its own storage address, which (by
design) is invariant to changes in the accessing pointer.
+
+Using sufficiently diverse signing schemas within the v-table can provide
reasonably strong mitigation against this weakness. Always use address
diversity in v-tables to prevent attackers from assembling their own v-table.
Avoid re-using constant discriminators to prevent attackers from replacing a
v-table pointer with a pointer to totally unrelated memory that just happens to
contain a similarly-signed pointer.
+
+Further mitigation can be attained by signing pointers to v-tables. Any
signature at all should prevent attackers from forging v-table pointers; they
will need to somehow harvest an existing signed pointer from elsewhere in
memory. Using a meaningful constant discriminator will force this to be
harvested from an object with similar structure (e.g. a different
implementation of the same interface). Using address diversity will prevent
such harvesting entirely. However, care must be taken when sourcing the
v-table pointer originally; do not blindly sign a pointer that is not
:ref:`safely derived<Safe derivation>`.
+
+.. _Signing oracles:
+
+Signing oracles
++++++++++++++++
+
+A signing oracle is a bit of code which can be exploited by an attacker to
sign an arbitrary pointer in a way that can later be recovered. Such oracles
can be used by attackers to forge signatures matching the oracle's signing
schema, which is likely to cause a total compromise of pointer authentication's
effectiveness.
----------------
DavidSpickett wrote:
"sequence of code" might be a bit (no pun intended) clearer here, though it's
not actually more specific, it's more formal at least. And bit has many
meanings in this area so maybe worth avoiding.
(there's one in the section below too if you choose to change this)
https://github.com/llvm/llvm-project/pull/65996