Re: [PATCH v4 0/4] New attribute "counted_by" to annotate bounds for C99 FAM (PR108896)

Qing Zhao Mon, 29 Jan 2024 08:20:00 -0800


> On Jan 29, 2024, at 10:50 AM, Martin Uecker wrote:
> 
> On Monday, 29.01.2024 at 15:09 +0000, Qing Zhao wrote:
>> Thank you!
>> 
>> Joseph and Richard,  could you also comment on this?
>> 
>>> On Jan 28, 2024, at 5:09 AM, Martin Uecker wrote:
>>> 
>>> On Friday, 26.01.2024 at 14:33 +0000, Qing Zhao wrote:
>>>> 
>>>>> On Jan 26, 2024, at 3:04 AM, Martin Uecker wrote:
>>>>> 
>>>>> 
>>>>> I haven't looked at the patch, but it sounds like you give the
>>>>> result the wrong type. Patching up all the use cases instead of
>>>>> fixing the type seems wrong.
>>>> 
>>>> Yes, this is for resolving a very early gimplification issue that I
>>>> reported last November:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
>>>> 
>>>> Since no one responded at that time, I fixed the issue by replacing the
>>>> ARRAY_REF with a pointer indirection:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
>>>> 
>>>> The reason for this change is that returning a flexible array member
>>>> TYPE is not allowed by the C language (our gimplification follows this
>>>> rule), so we have to return a pointer TYPE instead.
>>>> 
>>>> ******The new internal function
>>>> 
>>>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SIZE, 
>>>> ACCESS_MODE, INDEX)
>>>> 
>>>> INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
>>>> 
>>>> which returns the "REF_TO_OBJ", the same as the 1st argument.
>>>> 
>>>> Both the return type and the type of the first argument of this function
>>>> have been converted from the incomplete array type to the corresponding
>>>> pointer type.
>>>> 
>>>> As a result, the original ARRAY_REF was converted to an INDIRECT_REF, and
>>>> the original INDEX of the ARRAY_REF was lost in that conversion. To keep
>>>> the INDEX for bound sanitizer instrumentation, I added the 6th argument,
>>>> "INDEX".
>>>> 
>>>> What are your comments and suggestions on this solution?
>>> 
>>> I am not entirely sure but changing types in the FE seems
>>> problematic because this breaks language semantics. And
>>> then adding special code everywhere to treat it specially
>>> in the FE does not seem a good way forward.
>>> 
>>> If I understand correctly, returning an incomplete array 
>>> type is not allowed and then fails during gimplification.
>> 
>> Yes, this is the problem in gimplification. 
>> 
>>> So I would suggest to make it return a pointer to the 
>>> incomplete array (and not the element type)
>> 
>> 
>> for the following:
>> 
>> struct annotated {
>>  unsigned int size;
>>  int array[] __attribute__((counted_by (size)));
>> };
>> 
>>  struct annotated * p = ….
>>  p->array[9] = 0;
>> 
>> The IL for the above array reference p->array[9] is:
>> 
>> 1. If the return type is the original incomplete array type, 
>> 
>> .ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] = 0;
>> 
>> (this triggered the gimplification failure since the return type cannot be an
>> incomplete type).
>> 
>> 2. When the return type is changed to a pointer to the element type of the
>> incomplete array (the current patch), the original array reference
>> naturally becomes an indirect reference through the pointer:
>> 
>> *(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, 9) + 36) = 0;
>> 
>> Since the original array reference becomes an indirect reference through the
>> pointer to the element type, the INDEX info is folded into the OFFSET of
>> the indirect reference and lost, so I added the 6th argument to the routine
>> .ACCESS_WITH_SIZE to record the INDEX.
>> 
>> 3. With your suggestion, the return type is changed to a pointer to the
>> incomplete array. I just tried this by changing the result type:
>> 
>> 
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
>>                                       tree counted_by_type)
>> {
>>   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
>> -  tree result_type = build_pointer_type (TREE_TYPE (TREE_TYPE (ref)));
>> +  tree result_type = build_pointer_type (TREE_TYPE (ref));
>> 
>> Then, I got the following FE errors:
>> 
>> test.c:10:11: error: invalid use of flexible array member
>>   10 |   p->array[9] = 0;
>> 
>> The reason for the error is that when the original array_ref becomes an
>> indirect_ref through the pointer to the incomplete array, the
>> TYPE_SIZE_UNIT (type) needed to compute the OFFSET from the pointer is
>> invalid, since the type is an incomplete array.
>> As a result, the OFFSET cannot be computed for the indirect_ref.
>> 
>> It looks like there are even more issues with this approach.
> 
> Yes, but only because the following is missing:
> 
>> 
>> 
>>> but then wrap
>>> it with an indirection when inserting this code in the FE
>>> so that the full replacement has the correct type again
>>> (of the incomplete array).
>> 
>> I don’t quite understand the above. Could you please explain this in more
>> detail? (If possible, could you please use the small example above?)
>> Thanks.
> 
> You would need to add an indirection:
> 
> (*(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)))[9] = 0;
> 
> If .ACCESS_WITH_SIZE has type T (*)[], i.e. pointer to incomplete
> array of type T, then (*(.ACCESS_WITH_SIZE (...))) has type T[], i.e.
> incomplete array of type T.
> 
> And you shouldn't even consider the array dereferencing part at all,
> but replace the p->array when the component ref is constructed.


Thanks, I see now.

I just updated the routine “build_access_with_size_for_counted_by” as
follows:
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
                                       tree counted_by_type)
 {
   gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
-  tree result_type = build_pointer_type (TREE_TYPE (TREE_TYPE (ref)));
+  tree result_type = build_pointer_type (TREE_TYPE (ref));
   unsigned int counted_by_precision = TYPE_PRECISION (counted_by_type);
     tree call
@@ -2632,6 +2632,7 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
                                                 counted_by_precision),
                                  build_int_cst (integer_type_node, -1),
                                  build_int_cst (integer_type_node, -1));
+  call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
   return call;
 }

This works for the small test case; the generated IR is:

  (*.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, -1))[9] = 0;
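
For illustration only, the type relationship behind this can be sketched in
plain C. This is not the patch itself: access_with_size, store_and_load and
deref_is_array are hypothetical names, the counted_by attribute is left out
so that any compiler with GNU extensions accepts it, and an ordinary function
stands in for the internal function .ACCESS_WITH_SIZE.

```c
#include <assert.h>
#include <stdlib.h>

struct annotated {
  unsigned int size;
  int array[];  /* counted_by (size) left out on purpose (assumption) */
};

/* Hypothetical stand-in for .ACCESS_WITH_SIZE: it hands back its first
   argument, typed as a pointer to the incomplete array, int (*)[].  */
static int (*access_with_size (struct annotated *p))[]
{
  return &p->array;
}

/* Store VALUE at INDEX through the wrapped access, then read it back
   through the plain member access.  Returns -1 if allocation fails.  */
static int
store_and_load (unsigned int count, unsigned int index, int value)
{
  assert (index < count);
  struct annotated *p = malloc (sizeof (*p) + count * sizeof (int));
  if (p == NULL)
    return -1;
  p->size = count;

  /* Same shape as (*.ACCESS_WITH_SIZE (...))[9] = 0 in the IR above.  */
  (*access_with_size (p))[index] = value;

  int result = p->array[index];
  free (p);
  return result;
}

/* Returns 1 if the dereferenced result is still an array type (int[])
   and not a pointer (int *).  P is only used inside __typeof__, so it
   is never evaluated.  */
static int
deref_is_array (struct annotated *p)
{
  return __builtin_types_compatible_p (__typeof__ (*access_with_size (p)), int[])
         && !__builtin_types_compatible_p (__typeof__ (*access_with_size (p)), int *);
}
```

Because the indirection restores the array type, the subscript stays an
ordinary array subscript and the index does not need to be passed separately.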

I will do more testing and adjust my patch based on this change.
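
One case that seems worth keeping in that testing is the type-compatibility
check Kees reported. A minimal sketch, assuming GNU extensions (__typeof__,
__builtin_types_compatible_p); COUNTED_BY, SAME_TYPE and
fam_matches_decayed_pointer are hypothetical names, and COUNTED_BY expands to
nothing on compilers that do not know the attribute:

```c
/* Use the attribute only when the compiler reports support for it.  */
#if defined(__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(m) __attribute__ ((counted_by (m)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(m)
#endif

/* Same idea as the kernel's __same_type macro quoted below.  */
#define SAME_TYPE(a, b) \
  __builtin_types_compatible_p (__typeof__ (a), __typeof__ (b))

struct tracked {
  int size;
  int array[] COUNTED_BY (size);
};

/* Returns 0 when the annotated flexible array member still has array
   type, and 1 when it is wrongly seen as its decayed pointer.  B is
   only used inside __typeof__, so it is never dereferenced.  */
static int
fam_matches_decayed_pointer (void)
{
  struct tracked *b = 0;
  return SAME_TYPE (b->array, &(b->array)[0]);
}
```

The expected result is 0, matching the behavior of the struct without the
attribute.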

Thanks a lot for your help.

Qing

> 
> Martin
> 
> 
> 
>> 
>>> 
>>> 
>>> Alternatively, one could allow this during gimplification
>>> or add some conversion.
>> 
>> Allowing this in gimplification might trigger some other issues.  I guess
>> that adding a conversion at the end of the FE or at the beginning of
>> gimplification might be better.
>> 
>> That is, in the FE, still keep the original incomplete array type as the
>> return type for the routine .ACCESS_WITH_SIZE:
>> 
>> .ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] = 0;
>> 
>> But add a conversion from the above array_ref to an indirect_ref at the end
>> of the FE or at the beginning of gimplification:
>> 
>> *(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1) + 36) = 0;
>> 
>> With this approach, the original ARRAY TYPE and the INDEX are kept during
>> the FE, and the array bound sanitizer instrumentation will be much easier
>> than with my current approach.
>> 
>> Is this approach reasonable?
>> 
>> If so, where is it better to add this conversion: at the end of the FE or at
>> the beginning of gimplification?
>> 
>> Thanks.
>> 
>> Qing
>> 
>> 
>>> 
>>> Martin
>>> 
>>> 
>>>> 
>>>> Thanks.
>>>> 
>>>> Qing
>>>> 
>>>> 
>>>>> 
>>>>> Martin
>>>>> 
>>>>> 
>>>>> On Thursday, 25.01.2024 at 20:11 +0000, Qing Zhao wrote:
>>>>>> Thanks a lot for the testing.
>>>>>> 
>>>>>> Yes, I can reproduce the issue with the following small example:
>>>>>> 
>>>>>> #include <stdlib.h>
>>>>>> #include <stddef.h>
>>>>>> #include <stdio.h>
>>>>>> 
>>>>>> #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
>>>>>> 
>>>>>> struct untracked {
>>>>>>     int size;
>>>>>>     int array[] __attribute__((counted_by (size)));
>>>>>> } *a;
>>>>>> struct untracked * alloc_buf (int index)
>>>>>> {
>>>>>> struct untracked *p;
>>>>>> p = (struct untracked *) malloc (MAX (sizeof (struct untracked),
>>>>>>                                      (offsetof (struct untracked, array[0])
>>>>>>                                       + (index) * sizeof (int))));
>>>>>> p->size = index;
>>>>>> return p;
>>>>>> }
>>>>>> 
>>>>>> int main()
>>>>>> {
>>>>>> a = alloc_buf(10);
>>>>>>   printf ("same_type is %d\n",
>>>>>> (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->array)[0]))));
>>>>>> return 0;
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
>>>>>> same_type is 1
>>>>>> 
>>>>>> It looks like the “typeof” operator needs to be handled specially in
>>>>>> the C FE for the new internal function .ACCESS_WITH_SIZE.
>>>>>> 
>>>>>> (I already handle the “offsetof” operator specially in the C FE.)
>>>>>> 
>>>>>> Will fix this issue.
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> Qing
>>>>>> 
>>>>>>> On Jan 24, 2024, at 7:51 PM, Kees Cook wrote:
>>>>>>> 
>>>>>>> On Wed, Jan 24, 2024 at 12:29:51AM +0000, Qing Zhao wrote:
>>>>>>>> This is the 4th version of the patch.
>>>>>>> 
>>>>>>> Thanks very much for this!
>>>>>>> 
>>>>>>> I tripped over an unexpected behavioral change that the Linux kernel
>>>>>>> depends on:
>>>>>>> 
>>>>>>> __builtin_types_compatible_p() no longer treats an array marked with
>>>>>>> counted_by as different from that array's decayed pointer. Specifically,
>>>>>>> the kernel uses these macros:
>>>>>>> 
>>>>>>> 
>>>>>>> /*
>>>>>>> * Force a compilation error if condition is true, but also produce a
>>>>>>> * result (of value 0 and type int), so the expression can be used
>>>>>>> * e.g. in a structure initializer (or where-ever else comma expressions
>>>>>>> * aren't permitted).
>>>>>>> */
>>>>>>> #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
>>>>>>> 
>>>>>>> #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
>>>>>>> 
>>>>>>> /* &a[0] degrades to a pointer: a different type from an array */
>>>>>>> #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))
>>>>>>> 
>>>>>>> 
>>>>>>> This gets used in various places to make sure we're dealing with an
>>>>>>> array for a macro:
>>>>>>> 
>>>>>>> #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
>>>>>>> 
>>>>>>> 
>>>>>>> So this builds:
>>>>>>> 
>>>>>>> struct untracked {
>>>>>>>     int size;
>>>>>>>     int array[];
>>>>>>> } *a;
>>>>>>> 
>>>>>>> __must_be_array(a->array)
>>>>>>> => 0 (as expected)
>>>>>>> __builtin_types_compatible_p(typeof(a->array), typeof(&(a->array)[0]))
>>>>>>> => 0 (as expected, array vs decayed array pointer)
>>>>>>> 
>>>>>>> 
>>>>>>> But if counted_by is added, we get a build failure:
>>>>>>> 
>>>>>>> struct tracked {
>>>>>>>     int size;
>>>>>>>     int array[] __counted_by(size);
>>>>>>> } *b;
>>>>>>> 
>>>>>>> __must_be_array(b->array)
>>>>>>> => build failure (not expected)
>>>>>>> __builtin_types_compatible_p(typeof(b->array), typeof(&(b->array)[0]))
>>>>>>> => 1 (not expected, both pointers?)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Kees Cook
>> 
>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging
