> On Dec 2, 2024, at 16:13, Martin Uecker <[email protected]> wrote:
>
> Am Montag, dem 02.12.2024 um 20:15 +0000 schrieb Qing Zhao:
>>
>>> On Nov 30, 2024, at 07:22, Martin Uecker <[email protected]> wrote:
>>>
>>> Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao:
>>>> Think this over these days, I have another thought that need some feedback:
>>>>
>>>> The major issue right now is:
>>>>
>>>> 1. For the following structure in which the “counted_by” attributes is
>>>> attached to the pointer field.
>>>>
>>>> struct foo {
>>>> int n;
>>>> char *p __attribute__ ((counted_by (n)));
>>>> } *x;
>>>
>>> BTW: I also do not like the syntax 'n' for a lookup of a member
>>> in the member namespace. I think it should be '.n'. For FAM this
>>> is less problematic because it always at the end, but here it is
>>> problematic.
>> During my implementation of extending this attribute to pointer fields of
>> structure.
>> I didn’t notice issue with the current syntax ’n’ for the pointer fields so
>> far,
>> even though when the field “n” is declared after the corresponding pointer
>> field, i.e:
>>
>> struct foo {
>> {
>> char *p __attribute__ ((counted_by (n)));
>> int n;
>> }
>>
>> So, could you please explain a little bit more on what’s the potential issue
>> here?
>
> My issue with it is that it is not consistent to how C's scoping
> of identifiers work. In
>
> constexpr int n = 3;
> struct foo {
> {
> char (*p)[n] __attribute__ ((counted_by (n))
> int n;
> }
>
> the n in the type of p refers to the previous n in scope while
> the n in your attribute would refer to the member.
Okay, I see your point here -:).
Yes, I agree that if the compiler can accept the syntax “.n” in general, then
it will make the
“counted_by” attribute to allow more complicated expressions in general.
We had this similar discussion before the design and implementation for the
“counted_by” attribute
on FAMs, and we agreed to delay the approach of accepting the syntax “.n” in
the future possible
language standard at that time.
So, for the “counted_by attribute on FAMs, the implementation is, searching the
“n” in all the fields of the
containing structure and locating that specific field.
Now, when extending “counted_by” attribute to pointer fields of structures, the
implementation is similar.
>
> This is incoherent and confusing. It becomes worse should
> you ever want to allow more complicated expressions.
You are right, it’s hard to allow more complicated expressions for “counted_by”
based on the current
design.
If we agree to accept the “.n” syntax in GCC in general, that’s of course
better.
Then how about the current “counted_by” for FAMs? Shall we also change it to
accept “.n” syntax?
>
> It would be clearer if you the syntax ".n" which resembles
> the syntax for designated initializers that is already used
> in initializers to refer to struct members.
>
> constexpr int n;
> struct foo {
> {
> char (*p)[n] __attribute__ ((counted_by (.n))
> int n;
> }
>
Yes, I agree.
>
>>
>>
>>>>
>>>> There is one important additional requirement:
>>>>
>>>> x->n, x->p can ONLY be changed by changing the whole structure at the same
>>>> time.
>>>> Otherwise, x->n might not be consistent with x->p.
>>>
>>> By itself, this would still not fix the issue I pointed out.
>>>
>>> struct foo x;
>>> x = .. ; // set the whole structure
>>> char *p = x->p;
>>> x = ... ; // set the whole structure
>>>
>>> What is the bound for 'p' ?
>>
>> Since p was set to the pointer field of the old structure, then the bound of
>> it should be the old bound.
>>> With current rules it would be the old bound.
>>
>> I thought that this should be the correct behavior, isn’t it?
>
> Yes, sorry, what I meant was "with the current rules it would be
> the *new* bound”.
struct foo x;
x=… ; // set the whole structure 1
char *p = x->p;
x=… ; // set the whole structure 2
In the above, when “set the whole structure 1”, x1, x1->n and x1->p are set at
the same time;
After *p = x->p; the pointer “p” is pointing to “x1->p”, it’s bound is
“x1->n”;
Then when “set the whole structure 2”, x2 is different than x1, x2->n and
x2->p are set at the same time, the pointer
‘p’ still points to “x1->p”, therefore it’s bound should be “x1->n”.
So, as long as the whole structure is set at the same time, should be fine.
Do I miss anything here?
Qing
> (And I guess this is why you suggest to potentially
> change it below).
>
> Martin
>
>>
>>>
>>>
>>>>
>>>> 2. This new requirement is ONLY for “counted_by” attribute that is
>>>> attached to the pointer field, not needed
>>>> for flexible array members.
>>>>
>>>> 3. Then there will be inconsistency for the “counted_by” attribute between
>>>> FAM and pointer field.
>>>
>>>
>>>> The major questions I have right now:
>>>>
>>>> 1. Shall we keep this inconsistency between FAM and pointer field?
>>>> Or,
>>>>
>>>> 2. Shall we keep them consistent by adding this new requirement for the
>>>> previous FAM?
>>>
>>> Or have a new attribute? I feel we double down on a bad design
>>> which made sense for FAM addressing a very specific use case
>>> (retrofitting checks to the Linux kernel) but is otherwise not
>>> very strong.
>>
>> I do agree that the previous specific feature of the “counted_by" for the
>> FAM, i.e:
>> "
>> One important feature of the attribute is, a reference to the
>> flexible array member field uses the latest value assigned to the
>> field that represents the number of the elements before that
>> reference. For example,
>>
>> p->count = val1;
>> p->array[20] = 0; // ref1 to p->array
>> p->count = val2;
>> p->array[30] = 0; // ref2 to p->array
>>
>> in the above, 'ref1' uses 'val1' as the number of the elements in
>> 'p->array', and 'ref2' uses 'val2' as the number of elements in
>> 'p->array’.
>> “
>>
>> Is a specific feature for Linux kernel, I am wondering whether the above
>> feature
>> really needed by the Linux kernel?
>>
>> If Not, I prefer to eliminate this specific feature from GCC before we
>> officially release the “counted_by”
>> attribute in GCC15.
>>
>> If Linux kernel does need this feature, Yes, maybe a new attribute for
>> pointer is better.
>>
>> Kees, could you please also provide more comments and suggestion on this?
>>
>
>
>
>>
>>>
>>> Martin
>>>
>>>>
>>>> Kees, the following feature that you requested for the FAM:
>>>>
>>>> "
>>>> One important feature of the attribute is, a reference to the
>>>> flexible array member field uses the latest value assigned to the
>>>> field that represents the number of the elements before that
>>>> reference. For example,
>>>>
>>>> p->count = val1;
>>>> p->array[20] = 0; // ref1 to p->array
>>>> p->count = val2;
>>>> p->array[30] = 0; // ref2 to p->array
>>>>
>>>> in the above, 'ref1' uses 'val1' as the number of the elements in
>>>> 'p->array', and 'ref2' uses 'val2' as the number of elements in
>>>> 'p->array’.
>>>> "
>>>> Has this feature been used by Linux kernel already?
>>>> Is this feature really needed by Linux kernel?
>>>>
>>>> Thanks a lot for suggestions and comments.
>>>>
>>>> Qing
>>>>
>>>>> On Nov 20, 2024, at 14:23, Martin Uecker <[email protected]> wrote:
>>>>>
>>>>> Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao:
>>>>>> Hi, Martin,
>>>>>>
>>>>>> Thanks a lot for pointing this out.
>>>>>>
>>>>>> This does look like a problem we need avoid for the pointer arrays.
>>>>>>
>>>>>> Does the same problem exist in the language extension too if the n is
>>>>>> allowed to be changed after initialization?
>>>>>>
>>>>>> If so, for the future language extension, is there any proposed solution
>>>>>> to this problem?
>>>>>>
>>>>>
>>>>> There is no specification yet and nothing formally proposed, so
>>>>> it is entirely unclear at this point.
>>>>>
>>>>> My idea would be to give 'x->buf' the type '(char(*)[x->n])'
>>>>> where 'x->n' is loaded at the same time 'x->buf' is accessed
>>>>> and then the value is frozen (like in the simpler versions
>>>>> of 'counted_by' you had implemented first). Of course, one
>>>>> would then have to set x->n before setting the buffer (or
>>>>> at the same time). This could be ensured by making the
>>>>> member 'n' const, so that it can only be changed by
>>>>> overwriting the whole struct. But I am still thinking
>>>>> about this.
>>>>>
>>>>> In any case, I think for "counted_by" this is not an option
>>>>> because it would be confusing if it works differently
>>>>> than for the FAM case.
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>> Qing
>>>>>>> On Nov 20, 2024, at 10:52, Martin Uecker <[email protected]> wrote:
>>>>>>>
>>>>>>> Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao:
>>>>>>>>
>>>>>>>>> On Nov 19, 2024, at 10:47, Marek Polacek <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
>>>>>>>>>> Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao:
>>>>>>>>>>> Hi,
>>>>>>>>
>>>>>>> ..
>>>>>>>
>>>>>>> Hi Qing,
>>>>>>>
>>>>>>>> Per our discussion so far, if treating the following
>>>>>>>>
>>>>>>>> struct foo {
>>>>>>>> int n;
>>>>>>>> char *p __attribute__ ((counted_by (n)));
>>>>>>>> };
>>>>>>>>
>>>>>>>> as an array with upper-bounds being “n” is reasonable.
>>>>>>>
>>>>>>> There is one issue I want to point out, which I just realized during
>>>>>>> a discussion in WG14. For "counted_by" we defined the semantics
>>>>>>> in a way that later store to 'n' will be taken into account.
>>>>>>> We did this to support the use case where 'n' is set after
>>>>>>> the access to 'p'.
>>>>>>>
>>>>>>> struct foo *x = ;
>>>>>>>
>>>>>>> char *q = x->p;
>>>>>>> x->n = 100; // this should apply
>>>>>>>
>>>>>>>
>>>>>>> For FAMs this is fine, because it is a fixed part
>>>>>>> of the struct. But for the new pointer case, I think this is
>>>>>>> problematic. Consider this example:
>>>>>>>
>>>>>>> struct foo *x = allocate_buffer(100);
>>>>>>>
>>>>>>> where x->n is set to the right value in the allocation function.
>>>>>>>
>>>>>>> Now let's continue with
>>>>>>>
>>>>>>> char *q = x->p;
>>>>>>>
>>>>>>> x = allocate_buffer(50);
>>>>>>> // x->n is set to 50.
>>>>>>>
>>>>>>> Now q refers to the old buffer, but x->n to the size of the new
>>>>>>> buffer. That does not seem right and scares me a little bit.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Then, it’s reasonable to extend -fsanitize=bounds to instrument the
>>>>>>>> corresponding reference for the pointer with
>>>>>>>> Counted-by attribute.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> Qing
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I think the question is what -fsanitize=bounds is meant to be.
>>>>>>>>>>
>>>>>>>>>> I am a bit frustrated about the sanitizer. On the
>>>>>>>>>> one hand, it is not doing enough to get spatial memory
>>>>>>>>>> safety even where this would be easily possible, on the
>>>>>>>>>> other hand, is pedantic about things which are technically
>>>>>>>>>> UB but not problematic and then one is prevented from
>>>>>>>>>> using it
>>>>>>>>>>
>>>>>>>>>> When used in default mode, where execution continues, it
>>>>>>>>>> also does not mix well with many warning, creates more code,
>>>>>>>>>> and pulls in a libary dependency (and the library also depends
>>>>>>>>>> on upstream choices / progress which seems a limitation for
>>>>>>>>>> extensions).
>>>>>>>>>>
>>>>>>>>>> What IMHO would be ideal is a protection mode for spatial
>>>>>>>>>> memory safety that simply adds traps (which then requires
>>>>>>>>>> no library, has no issues with other warnings, and could
>>>>>>>>>> evolve independently from clang)
>>>>>>>>>>
>>>>>>>>>> So shouldn't we just add a -fboundscheck (which would
>>>>>>>>>> be like -fsanitize=bounds -fsanitize-trap=bounds just with
>>>>>>>>>> more checking) and make it really good? I think many people
>>>>>>>>>> would be very happy about this.
>>>>>>>>>
>>>>>>>>> That's a separate concern. We already have the -fbounds-check option,
>>>>>>>>> currently only used in Fortran (and D?), so perhaps we could make
>>>>>>>>> that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds.
>>>>>>>>>
>>>>>>>>> Marek