Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

Qing Zhao Mon, 02 Dec 2024 12:21:27 -0800


> On Nov 30, 2024, at 07:22, Martin Uecker <uec...@tugraz.at> wrote:
> 
> Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao:
>> Think this over these days, I have another thought that need some feedback:
>> 
>> The major issue right now is:
>> 
>> 1. For the following structure in which the “counted_by” attributes is 
>> attached to the pointer field.
>> 
>> struct foo {
>>  int n;
>>  char *p __attribute__ ((counted_by (n)));
>> } *x;
> 
> BTW: I also do not like the syntax 'n' for a lookup of a member 
> in the member namespace. I think it should be '.n'.  For FAM this 
> is less problematic because it always at the end, but here it is 
> problematic.
During my implementation of extending this attribute to pointer fields of 
structure. 
I didn’t notice  issue with the current syntax ’n’ for the pointer fields so 
far, 
even though when  the field “n” is declared after the corresponding pointer 
field, i.e:


struct foo {
{
  char *p __attribute__ ((counted_by (n)));
  int n;
}

So, could you please explain a little bit more on what’s the potential issue 
here?


>> 
>> There is one important additional requirement:
>> 
>> x->n, x->p can ONLY be changed by changing the whole structure at the same 
>> time. 
>> Otherwise, x->n might not be consistent with x->p.
> 
> By itself, this would still not fix the issue I pointed out.
> 
> struct foo x;
> x = .. ; // set the whole structure
> char *p = x->p;
> x = ... ; // set the whole structure
> 
> What is the bound for 'p' ?  

Since p was set to the pointer field of the old structure, then the bound of it 
should be the old bound. 
> With current rules it would be the old bound.

I thought that this should be the correct behavior, isn’t it?

> 
> 
>> 
>> 2. This new requirement is ONLY for “counted_by” attribute that is attached 
>> to the pointer field, not needed
>> for flexible array members.
>> 
>> 3. Then there will be inconsistency for the “counted_by” attribute between 
>> FAM and pointer field.
> 
> 
>> The major questions I have right now:
>> 
>> 1. Shall we keep this inconsistency between FAM and pointer field? 
>> Or, 
>> 
>> 2. Shall we keep them consistent by adding this new requirement for the 
>> previous FAM? 
> 
> Or have a new attribute?  I feel we double down on a bad design
> which made sense for FAM addressing a very specific use case
> (retrofitting checks to the Linux kernel) but is otherwise not
> very strong.

I do agree that the previous specific feature of the “counted_by" for the FAM, 
i.e:
"
    One important feature of the attribute is, a reference to the
    flexible array member field uses the latest value assigned to the
    field that represents the number of the elements before that
    reference.  For example,

           p->count = val1;
           p->array[20] = 0;  // ref1 to p->array
           p->count = val2;
           p->array[30] = 0;  // ref2 to p->array

    in the above, 'ref1' uses 'val1' as the number of the elements in
    'p->array', and 'ref2' uses 'val2' as the number of elements in
    'p->array’.
“

Is a specific feature for Linux kernel, I am wondering whether the above feature
really needed by the Linux kernel? 

If Not, I prefer to eliminate this specific feature from GCC before we 
officially release the “counted_by”
attribute in GCC15.

If Linux kernel does need this feature, Yes, maybe a new attribute for pointer 
is better.

Kees, could you please also provide more comments and suggestion on this?


> 
> Martin
> 
>> 
>> Kees, the following feature that you requested for the FAM:
>> 
>> "
>>     One important feature of the attribute is, a reference to the
>>     flexible array member field uses the latest value assigned to the
>>     field that represents the number of the elements before that
>>     reference.  For example,
>> 
>>            p->count = val1;
>>            p->array[20] = 0;  // ref1 to p->array
>>            p->count = val2;
>>            p->array[30] = 0;  // ref2 to p->array
>> 
>>     in the above, 'ref1' uses 'val1' as the number of the elements in
>>     'p->array', and 'ref2' uses 'val2' as the number of elements in
>>     'p->array’.
>> "
>> Has this feature been used by Linux kernel already?
>> Is this feature really needed by Linux kernel?
>> 
>> Thanks a lot for suggestions and comments.
>> 
>> Qing
>> 
>>> On Nov 20, 2024, at 14:23, Martin Uecker <uec...@tugraz.at> wrote:
>>> 
>>> Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao:
>>>> Hi, Martin,
>>>> 
>>>> Thanks a lot for pointing this out. 
>>>> 
>>>> This does look like a problem we need avoid for the pointer arrays.
>>>> 
>>>> Does  the same problem exist in the language extension too if the n is 
>>>> allowed to be changed after initialization?
>>>> 
>>>> If so, for the future language extension, is there any proposed solution 
>>>> to this problem? 
>>>> 
>>> 
>>> There is no specification yet and nothing formally proposed, so
>>> it is entirely unclear at this point.
>>> 
>>> My idea would be to give 'x->buf' the type '(char(*)[x->n])'
>>> where 'x->n' is loaded at the same time 'x->buf' is accessed
>>> and then the value is frozen (like in the simpler versions
>>> of 'counted_by' you had implemented first).  Of course, one
>>> would then have to set x->n before setting the buffer (or
>>> at the same time). This could be ensured by making the
>>> member 'n' const, so that it can only be changed by
>>> overwriting the whole struct. But I am still thinking
>>> about this.
>>> 
>>> In any case, I think for "counted_by" this is not an option
>>> because it would be confusing if it works differently
>>> than for the FAM case. 
>>> 
>>> Martin
>>> 
>>> 
>>>> Qing
>>>>> On Nov 20, 2024, at 10:52, Martin Uecker <uec...@tugraz.at> wrote:
>>>>> 
>>>>> Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao:
>>>>>> 
>>>>>>> On Nov 19, 2024, at 10:47, Marek Polacek <pola...@redhat.com> wrote:
>>>>>>> 
>>>>>>> On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
>>>>>>>> Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao:
>>>>>>>>> Hi,
>>>>>> 
>>>>> ..
>>>>> 
>>>>> Hi Qing,
>>>>> 
>>>>>> Per our discussion so far, if treating the following
>>>>>> 
>>>>>> struct foo {
>>>>>> int n;
>>>>>> char *p __attribute__ ((counted_by (n)));
>>>>>> };
>>>>>> 
>>>>>> as an array with upper-bounds being “n” is reasonable.
>>>>> 
>>>>> There is one issue I want to point out, which I just realized during
>>>>> a discussion in WG14.  For "counted_by" we defined the semantics
>>>>> in a way that later store to 'n' will be taken into account. 
>>>>> We did this to support the use case where 'n' is set after
>>>>> the access to 'p'.
>>>>> 
>>>>> struct foo *x = ;
>>>>> 
>>>>> char *q = x->p;
>>>>> x->n = 100; // this should apply
>>>>> 
>>>>> 
>>>>> For FAMs this is fine, because it is a fixed part
>>>>> of the struct.  But for the new pointer case, I think this is
>>>>> problematic.  Consider this example:
>>>>> 
>>>>> struct foo *x = allocate_buffer(100);
>>>>> 
>>>>> where x->n is set to the right value in the allocation function.
>>>>> 
>>>>> Now let's continue with
>>>>> 
>>>>> char *q = x->p;
>>>>> 
>>>>> x = allocate_buffer(50);
>>>>> // x->n is set to 50.
>>>>> 
>>>>> Now q refers to the old buffer, but x->n to the size of the new
>>>>> buffer.  That does not seem right and scares me a little bit.
>>>>> 
>>>>> 
>>>>> 
>>>>> Martin
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Then, it’s reasonable to extend -fsanitize=bounds to instrument the 
>>>>>> corresponding reference for the pointer with
>>>>>> Counted-by attribute. 
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> Qing
>>>>>> 
>>>>>>> 
>>>>>>>> I think the question is what -fsanitize=bounds is meant to be.
>>>>>>>> 
>>>>>>>> I am a bit frustrated about the sanitizer.  On the
>>>>>>>> one hand, it is not doing enough to get spatial memory
>>>>>>>> safety even where this would be easily possible, on the
>>>>>>>> other hand, is pedantic about things which are technically
>>>>>>>> UB but not problematic and then one is prevented from
>>>>>>>> using it
>>>>>>>> 
>>>>>>>> When used in default mode, where execution continues, it
>>>>>>>> also does not mix well with many warning, creates more code,
>>>>>>>> and pulls in a libary dependency (and the library also depends
>>>>>>>> on upstream choices / progress which seems a limitation for
>>>>>>>> extensions).
>>>>>>>> 
>>>>>>>> What IMHO would be ideal is a protection mode for spatial
>>>>>>>> memory safety that simply adds traps (which then requires
>>>>>>>> no library, has no issues with other warnings, and could
>>>>>>>> evolve independently from clang) 
>>>>>>>> 
>>>>>>>> So shouldn't we just add a -fboundscheck (which would 
>>>>>>>> be like -fsanitize=bounds -fsanitize-trap=bounds just with
>>>>>>>> more checking) and make it really good? I think many people
>>>>>>>> would be very happy about this.
>>>>>>> 
>>>>>>> That's a separate concern.  We already have the -fbounds-check option,
>>>>>>> currently only used in Fortran (and D?), so perhaps we could make
>>>>>>> that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
>>>>>>> 
>>>>>>> Marek

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

Reply via email to