================ @@ -0,0 +1,255 @@ +============================================ +Implementation plans for ``-fbounds-safety`` +============================================ + +.. contents:: + :local: + +External bounds annotations +=========================== + +The bounds annotations are C type attributes appertaining to pointer types. If +an attribute is added to the position of a declaration attribute, e.g., ``int +*ptr __counted_by(size)``, the attribute appertains to the outermost pointer +type of the declaration (``int *``). + +New sugar types +=============== + +An external bounds annotation creates a type sugar of the underlying pointer +types. We will introduce a new sugar type, ``DynamicBoundsPointerType`` to +represent ``__counted_by`` or ``__sized_by``. Using ``AttributedType`` would not +be sufficient because the type needs to hold the count or size expression as +well as some metadata necessary for analysis, while this type may be implemented +through inheritance from ``AttributedType``. Treating the annotations as type +sugars means two types with incompatible external bounds annotations may be +considered canonically the same types. This is sometimes necessary, for example, +to make the ``__counted_by`` and friends not participate in function +overloading. However, this design requires a separate logic to walk through the +entire type hierarchy to check type compatibility of bounds annotations. + +Late parsing for C +================== + +A bounds annotation such as ``__counted_by(count)`` can be added to type of a +struct field declaration where count is another field of the same struct +declared later. Similarly, the annotation may apply to type of a function +parameter declaration which precedes the parameter count in the same function. +This means parsing the argument of bounds annotations must be done after the +parser has the whole context of a struct or a function declaration. Clang has +late parsing logic for C++ declaration attributes that require late parsing, +while the C declaration attributes and C/C++ type attributes do not have the +same logic. This requires introducing late parsing logic for C/C++ type +attributes. + +Internal bounds annotations +=========================== + +``__indexable`` and ``__bidi_indexable`` alter pointer representations to be +equivalent to a struct with the pointer and the corresponding bounds fields. +Despite this difference in their representations, they are still pointers in +terms of types of operations that are allowed and their semantics. For instance, +a pointer dereference on a ``__bidi_indexable`` pointer will return the +dereferenced value same as plain C pointers, modulo the extra bounds checks +being performed before dereferencing the wide pointer. This means mapping the +wide pointers to struct types with equivalent layout won’t be sufficient. To +represent the wide pointers in Clang AST, we add an extra field in the +PointerType class to indicate the internal bounds of the pointer. This ensures +pointers of different representations are mapped to different canonical types +while they are still treated as pointers. + +In LLVM IR, wide pointers will be emitted as structs of equivalent +representations. Clang CodeGen will handle them as Aggregate in +``TypeEvaluationKind (TEK)``. ``AggExprEmitter`` was extended to handle pointer +operations returning wide pointers. Alternatively, a new ``TEK`` and an +expression emitter dedicated to wide pointers could be introduced. + +Default bounds annotations +========================== + +The model may implicitly add ``__bidi_indexable`` or ``__single`` depending on +the context of the declaration that has the pointer type. ``__bidi_indexable`` +implicitly adds to local variables, while ``__single`` implicitly adds to +pointer types specifying struct fields, function parameters, or global +variables. This means the parser may first create the pointer type without any +default pointer attribute and then recreate the type once the parser has the +declaration context and determined the default attribute accordingly. + +This also requires the parser to reset the type of the declaration with the +newly created type with the right default attribute. + +Promotion expression +==================== + +A new expression will be introduced to represent the conversion from a pointer +with an external bounds annotation, such as ``__counted_by``, to +``__bidi_indexable``. This type of conversion cannot be handled by normal +CastExprs because it requires an extra subexpression(s) to provide the bounds +information necessary to create a wide pointer. + +Bounds check expression +======================= + +Bounds checks are part of semantics defined in the ``-fbounds-safety`` language +model. Hence, exposing the bounds checks and other semantic actions in the AST +is desirable. A new expression for bounds checks has been added to the AST. The +bounds check expression has a ``BoundsCheckKind`` to indicate the kind of checks +and has the additional sub-expressions that are necessary to perform the check +according to the kind. + +Paired assignment check +======================= + +``-fbounds-safety`` enforces that variables or fields related with the same +external bounds annotation (e.g., ``buf`` and ``count`` related with +``__counted_by`` in the example below) must be updated side by side within the +same basic block and without side effect in between. + +.. code-block:: c + + typedef struct { + int *__counted_by(count) buf; size_t count; + } sized_buf_t; + + void alloc_buf(sized_buf_t *sbuf, sized_t nelems) { + sbuf->buf = (int *)malloc(sizeof(int) * nelems); + sbuf->count = nelems; + } + +To implement this rule, the compiler requires a linear representation of +statements to understand the ordering and the adjacency between the two or more +assignments. The Clang CFG is used to implement this analysis as Clang CFG +provides a linear view of statements within each ``CFGBlock`` (Clang +``CFGBlock`` represents a single basic block in a source-level CFG). + +Bounds check optimizations +========================== + +In ``-fbounds-safety``, the Clang frontend emits run-time checks for every +memory dereference if the type system or analyses in the frontend couldn’t +verify its bounds safety. The implementation relies on LLVM optimizations to +remove redundant run-time checks. Using this optimization strategy, if the +original source code already has bounds checks, the fewer additional checks +``-fbounds-safety`` will introduce. The LLVM ``ConstraintElimination`` pass is +design to remove provable redundant checks (please check Florian Hahn’s +presentation in 2021 LLVM Dev Meeting and the implementation to learn more). In +the following example, ``-fbounds-safety`` implicitly adds the redundant bounds +checks that the optimizer can remove: + +.. code-block:: c + + void fill_array_with_indices(int *__counted_by(count) p, size_t count) { + for (size_t i = 0; i < count; ++i) { + // implicit bounds checks: + // if (p + i < p || p + i + 1 > p + count) trap(); + p[i] = i; + } + } + +``ConstraintElimination`` collects the following facts and determines if the +bounds checks can be safely removed: + +* Inside the for-loop, ``0 <= i < count``, hence ``1 <= i + 1 <= count``. +* Pointer arithmetic ``p + count`` in the if-condition doesn’t wrap. +* ``-fbounds-safety`` treats pointer arithmetic overflow as deterministically + two’s complement computation, not an undefined behavior. Therefore, + getelementptr does not typically have inbounds keyword. However, the compiler + does emit inbounds for ``p + count`` in this case because + ``__counted_by(count)`` has the invariant that p has at least as many as + elements as count. Using this information, ``ConstraintElimination`` is able + to determine ``p + count`` doesn’t wrap. +* Accordingly, ``p + i`` and ``p + i + 1`` also don’t wrap. +* Therefore, ``p <= p + i`` and ``p + i + 1 <= p + count``. +* The if-condition simplifies to false and becomes dead code that the subsequent + optimization passes can remove. + +``OptRemarks`` can be utilized to provide insights into performance tuning. It +has the capability to report on checks that it cannot eliminate, possibly with +reasons, allowing programmers to adjust their code to unlock further +optimizations. + +Debugging +========= + +Internal bounds annotations +--------------------------- + +Internal bounds annotations change a pointer into a wide pointer. The debugger +needs to understand that wide pointers are essentially pointers with a struct +layout. To handle this, a wide pointer is described as a record type in the +debug info. The type name has a special name prefix (e.g., +``__bounds_safety$bidi_indexable``) which can be recognized by a debug info +consumer to provide support that goes beyond showing the internal structure of +the wide pointer. There are no DWARF extensions needed to support wide pointers. +In our implementation, LLDB recognized wide pointer types by name and ---------------- adrian-prantl wrote:
```suggestion In our implementation, LLDB recognizes wide pointer types by name and ``` https://github.com/llvm/llvm-project/pull/70749 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits