Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

Qing Zhao via Gcc-patches Tue, 18 Jul 2023 11:54:21 -0700


> On Jul 18, 2023, at 11:37 AM, Qing Zhao via Gcc-patches 
> <[email protected]> wrote:
> 
> 
> 
>> On Jul 17, 2023, at 7:40 PM, Kees Cook <[email protected]> wrote:
>> 
>> On Mon, Jul 17, 2023 at 09:17:48PM +0000, Qing Zhao wrote:
>>> 
>>>> On Jul 13, 2023, at 4:31 PM, Kees Cook <[email protected]> wrote:
>>>> 
>>>> In the bug, the problem is that "p" isn't known to be allocated, if I'm
>>>> reading that correctly?
>>> 
>>> I think that the major point in PR109557 
>>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557):
>>> 
>>> for the following pointer p.3_1, 
>>> 
>>> p.3_1 = p;
>>> _2 = __builtin_object_size (p.3_1, 0);
>>> 
>>> Question: why the size of p.3_1 cannot use the TYPE_SIZE of the pointee of 
>>> p when the TYPE_SIZE can be determined at compile time?
>>> 
>>> Answer:  From just knowing the type of the pointee of p, the compiler 
>>> cannot determine the size of the object.  
>> 
>> Why is that? "p" points to "struct P", which has a fixed size. There
>> must be an assumption somewhere that a pointer is allocated, otherwise
>> __bos would almost never work?
> 
> My understanding from the comments in PR109557 was: 
> 
> In general the pointer could point to the first object of an array that has 
> more elements, or to an object of a different type. 
> Without seeing the real allocation to the pointer, the compiler cannot assume 
> that the pointer point to an object that has
> the exact same type as its declaration. 
> 
> Sid and Martin, is the above understand correctly?
> 
> Honestly, I am still not 100% clear on this yet.
> 
> Jakub, Sid and Martin, could  you please explain a little bit more here, 
> especially for the following small example?
> 
> [opc@qinzhao-ol8u3-x86 109557]$ cat t.c
> #include <stdlib.h>
> #include <assert.h>
> struct P {
>  int k;
>  int x[10]; 
> } *p;
> 
> void store(int a, int b) 
> {
>  p = (struct P *)malloc (sizeof (struct P));
>  p->k = a;
>  p->x[b] = 0;
>  assert (__builtin_dynamic_object_size (p->x, 1) == sizeof (int[10]));
>  assert (__builtin_dynamic_object_size (p, 1) == sizeof (struct P));
>  return;
> }
> 
> int main()
> {
>  store(7, 7);
>  assert (__builtin_dynamic_object_size (p->x, 1) == sizeof (int[10]));
>  assert (__builtin_dynamic_object_size (p, 1) == sizeof (struct P));
>  free (p);
> }
> [opc@qinzhao-ol8u3-x86 109557]$ sh t
> /home/opc/Install/latest/bin/gcc -O -fdump-tree-all t.c
> a.out: t.c:21: main: Assertion `__builtin_dynamic_object_size (p->x, 1) == 
> sizeof (int[10])' failed.
> t: line 19: 859944 Aborted                 (core dumped) ./a.out
> 
>


A correction to the above compilation option:
/home/opc/Install/latest-d/bin/gcc -O -fstrict-flex-arrays=3 t.c
a.out: t.c:22: main: Assertion `__builtin_dynamic_object_size (p, 1) == sizeof 
(struct P)' failed.
t: line 19: 918833 Aborted                 (core dumped) ./a.out

(All others keep the same).

Sorry for the mistake.

Qing

> In the above, among the 4 assertions, only the last one failed.
> 
> Why GCC can use the TYPE_SIZE of the pointee of the pointer “p->x” as the 
> size of the object, 
> but cannot use the TYPE_SIZE of the pointee of the pointer “p” as the size of 
> the object? 
> 
> 
>> 
>>> Therefore the bug has been closed. 
>>> 
>>> In your following testing 5:
>>> 
>>>> I'm not sure this is a reasonable behavior, but
>>>> let me get into the specific test, which looks like this:
>>>> 
>>>> TEST(counted_by_seen_by_bdos)
>>>> {
>>>>      struct annotated *p;
>>>>      int index = MAX_INDEX + unconst;
>>>> 
>>>>      p = alloc_annotated(index);
>>>> 
>>>>      REPORT_SIZE(p->array);
>>>> /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
>>>>      /* Check array size alone. */
>>>> /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX);
>>>> /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->foo * 
>>>> sizeof(*p->array));
>>>>      /* Check check entire object size. */
>>>> /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
>>>> /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->foo 
>>>> * sizeof(*p->array));
>>>> }
>>>> 
>>>> Test 5 should pass as well, since, again, p can be examined. Passing p
>>>> to __bdos implies it is allocated and the __counted_by annotation can be
>>>> examined.
>>> 
>>> Since the call to the routine “alloc_annotated" cannot be inlined, GCC does 
>>> not see any allocation calls for the pointer p.
>> 
>> Correct.
>> 
>>> At the same time, due to the same reason as PR109986, GCC cannot determine 
>>> the size of the object by just knowing the TYPE_SIZE
>>> of the pointee of p.  
>> 
>> So the difference between test 3 and test 5 is that "p" is explicitly
>> dereferenced to find "array", and therefore an assumption is established
>> that "p" is allocated?
> 
> Yes, this might be the assumption that GCC made  -:)
>> 
>>> So, this is exactly the same issue as PR109557.  It’s an existing behavior 
>>> per the current __buitlin_object_size algorithm.
>>> I am still not very sure whether the situation in PR109557 can be improved 
>>> or not, but either way, it’s a separate issue.
>> 
>> Okay, so the issue is one of object allocation visibility (or
>> assumptions there in)?
> 
> Might be, let’s see what Sid or Martin will say on this.
>> 
>>> Please see the new testing case I added for PR109557, you will see that the 
>>> above case 5 is a similar case as the new testing case in PR109557.
>> 
>> I will ponder this a bit more to see if I can come up with a useful
>> test case to replace the part from "test 5" above.
>> 
>>>> 
>>>> If "return p->array[index];" would be caught by the sanitizer, then
>>>> it follows that __builtin_dynamic_object_size(p, 1) must also know the
>>>> size. i.e. both must examine "p" and its trailing flexible array and
>>>> __counted_by annotation.
>>>> 
>>>>> 
>>>>> 2. The common issue for  the failed testing 3, 4, 9, 10 is:
>>>>> 
>>>>> for the following annotated structure: 
>>>>> 
>>>>> ====
>>>>> struct annotated {
>>>>>      unsigned long flags;
>>>>>      size_t foo;
>>>>>      int array[] __attribute__((counted_by (foo)));
>>>>> };
>>>>> 
>>>>> 
>>>>> struct annotated *p;
>>>>> int index = 16;
>>>>> 
>>>>> p = malloc(sizeof(*p) + index * sizeof(*p->array));  // allocated real 
>>>>> size 
>>>>> 
>>>>> p->foo = index + 2;  // p->foo was set by a different value than the real 
>>>>> size of p->array as in test 9 and 10
>>>> 
>>>> Right, tests 9 and 10 are checking that the _smallest_ possible value of
>>>> the array is used. (There are two sources of information: the allocation
>>>> size and the size calculated by counted_by. The smaller of the two
>>>> should be used when both are available.)
>>> 
>>> The counted_by attribute is used to annotate a Flexible array member on how 
>>> many elements it will have.
>>> However, if this information can not accurately reflect the real number of 
>>> elements for the array allocated, 
>>> What’s the purpose of such information? 
>> 
>> For example, imagine code that allocates space for 100 elements since
>> the common case is that the number of elements will grow over time.
>> Elements are added as it goes. For example:
>> 
>> struct grows {
>>      int alloc_count;
>>      int valid_count;
>>      struct element item[] __counted_by(valid_count);
>> } *p;
>> 
>> void something(void)
>> {
>>      p = malloc(sizeof(*p) + sizeof(*p->item) * 100);
>>      p->alloc_count = 100;
>>      p->valid_count = 0;
>> 
>>      /* this loop doesn't check that we don't go over 100. */
>>      while (items_to_copy) {
>>              struct element *item_ptr = get_next_item();
>>              /* __counted_by stays in sync: */
>>              p->valid_count++;
>>              p->item[p->valid_count - 1] = *item_ptr;
>>      }
>> }
>> 
>> We would want to catch cases there p->item[] is accessed with an index
>> that is >= p->valid_count, even though the allocation is (currently)
>> larger.
>> 
>> However, if we ever reached valid_count >= alloc_count, we need to trap
>> too, since we can still "see" the true allocation size.
>> 
>> Now, the __alloc_size hint is visible in very few places, so if there is
>> a strong reason to do so, I can live with saying that __counted_by takes
>> full precedence over __alloc_size. It seems it should be possible to
>> compare when both are present, but I can live with __counted_by being
>> the universal truth. :)
>> 
> 
> Thanks for the example and the explanation. Understood now.
> 
> LLVM’s RFC requires the following relationship: 
> (https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
> 
> Buffer’s real allocated size >= counted_by value
> 
> Should be true all the time. 
> 
> I think that this is a reasonable requirement. 
> 
> (Otherwise, if counted_by > buffer’s real allocated size, overflow might 
> happen)
> 
> Is the above enough to cover your use cases?
> 
> If so, I will study a little bit more to see how to implement this.
>>> 
>>>>> or
>>>>> p->foo was not set to any value as in test 3 and 4
>>>> 
>>>> For tests 3 and 4, yes, this was my mistake. I have fixed up the tests
>>>> now. Bill noticed the same issue. Sorry for the think-o!
>>>> 
>>>>> ====
>>>>> 
>>>>> i.e, the value of p->foo is NOT synced with the number of elements 
>>>>> allocated for the array p->array.  
>>>>> 
>>>>> I think that this should be considered as an user error, and the 
>>>>> documentation of the attribute should include
>>>>> this requirement.  (In the LLVM’s RFC, such requirement was included in 
>>>>> the programing model: 
>>>>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
>>>>> 
>>>>> We can add a new warning option -Wcounted-by to report such user error if 
>>>>> needed.
>>>>> 
>>>>> What’s your opinion on this?
>>>> 
>>>> I think it is correct to allow mismatch between allocation and
>>>> counted_by as long as only the least of the two is used.
>>> 
>>> What’s your mean by “only the least of the two is used”?
>> 
>> I was just restating the above: if size information is available via
>> both __alloc_size and __counted_by, the smaller of the two should be
>> used.
> 
> If we are consistent with LLVM’s RFC, __alloc_size >= __counted_by all the 
> time.
> Then we can always use the counted_by info for the array size. 
>> 
>>> 
>>>> This may be
>>>> desirable in a few situations. One example would be a large allocation
>>>> that is slowly filled up by the program.
>>> 
>>> So, for such situation, whenever the allocation is filled up, the field 
>>> that hold the “counted_by” attribute should be increased at the same time,
>>> Then, the “counted_by” value always sync with the real allocation. 
>>>> I.e. the counted_by member is
>>>> slowly increased during runtime (but not beyond the true allocation size).
>>> 
>>> Then there should be source code to increase the “counted_by” field 
>>> whenever the allocated space increased too. 
>>>> 
>>>> Of course allocation size is only available in limited situations, so
>>>> the loss of that info is fine: we have counted_by for everything else.
>>> 
>>> The point is: allocation size should synced with the value of “counted_by”. 
>>> LLVM’s RFC also have the similar requirement:
>>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18
>> 
>> Right, I'm saying it would be nice if __alloc_size was checked as well,
>> in the sense that if it is available, it knows without question what the
>> size of the allocation is. If __alloc_size and __counted_by conflict,
>> the smaller of the two should be the truth.
> 
>> 
>> But, as I said, if there is some need to explicitly ignore __alloc_size
>> when __counted_by is present, I can live with it; we just need to
>> document it.
>> 
>> If the RFC and you agree that the __counted_by variable can only ever be
>> (re)assigned after the flex array has been (re)allocated, then I guess
>> we'll see how it goes. :) I think most places in the kernel using
>> __counted_by will be fine, but I suspect we may have cases where we need
>> to update it like in the loop I described above. If that's true, we can
>> revisit the requirement then. :)
> 
> I guess if we can always keep  the relationship:   __alloc_size >= 
> __counted_by all the time. Should be fine.
> 
> Please check whether this is enough for kernel, I will check whether this is 
> doable For GCC.
> 
> thanks.
> 
> 
> Qing
>> 
>> -Kees
>> 
>> -- 
>> Kees Cook

Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

Reply via email to