> On Dec 2, 2024, at 16:13, Martin Uecker <uec...@tugraz.at> wrote: > > Am Montag, dem 02.12.2024 um 20:15 +0000 schrieb Qing Zhao: >> >>> On Nov 30, 2024, at 07:22, Martin Uecker <uec...@tugraz.at> wrote: >>> >>> Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao: >>>> Think this over these days, I have another thought that need some feedback: >>>> >>>> The major issue right now is: >>>> >>>> 1. For the following structure in which the “counted_by” attributes is >>>> attached to the pointer field. >>>> >>>> struct foo { >>>> int n; >>>> char *p __attribute__ ((counted_by (n))); >>>> } *x; >>> >>> BTW: I also do not like the syntax 'n' for a lookup of a member >>> in the member namespace. I think it should be '.n'. For FAM this >>> is less problematic because it always at the end, but here it is >>> problematic. >> During my implementation of extending this attribute to pointer fields of >> structure. >> I didn’t notice issue with the current syntax ’n’ for the pointer fields so >> far, >> even though when the field “n” is declared after the corresponding pointer >> field, i.e: >> >> struct foo { >> { >> char *p __attribute__ ((counted_by (n))); >> int n; >> } >> >> So, could you please explain a little bit more on what’s the potential issue >> here? > > My issue with it is that it is not consistent to how C's scoping > of identifiers work. In > > constexpr int n = 3; > struct foo { > { > char (*p)[n] __attribute__ ((counted_by (n)) > int n; > } > > the n in the type of p refers to the previous n in scope while > the n in your attribute would refer to the member.
Okay, I see your point here -:). Yes, I agree that if the compiler can accept the syntax “.n” in general, then it will make the “counted_by” attribute to allow more complicated expressions in general. We had this similar discussion before the design and implementation for the “counted_by” attribute on FAMs, and we agreed to delay the approach of accepting the syntax “.n” in the future possible language standard at that time. So, for the “counted_by attribute on FAMs, the implementation is, searching the “n” in all the fields of the containing structure and locating that specific field. Now, when extending “counted_by” attribute to pointer fields of structures, the implementation is similar. > > This is incoherent and confusing. It becomes worse should > you ever want to allow more complicated expressions. You are right, it’s hard to allow more complicated expressions for “counted_by” based on the current design. If we agree to accept the “.n” syntax in GCC in general, that’s of course better. Then how about the current “counted_by” for FAMs? Shall we also change it to accept “.n” syntax? > > It would be clearer if you the syntax ".n" which resembles > the syntax for designated initializers that is already used > in initializers to refer to struct members. > > constexpr int n; > struct foo { > { > char (*p)[n] __attribute__ ((counted_by (.n)) > int n; > } > Yes, I agree. > >> >> >>>> >>>> There is one important additional requirement: >>>> >>>> x->n, x->p can ONLY be changed by changing the whole structure at the same >>>> time. >>>> Otherwise, x->n might not be consistent with x->p. >>> >>> By itself, this would still not fix the issue I pointed out. >>> >>> struct foo x; >>> x = .. ; // set the whole structure >>> char *p = x->p; >>> x = ... ; // set the whole structure >>> >>> What is the bound for 'p' ? >> >> Since p was set to the pointer field of the old structure, then the bound of >> it should be the old bound. >>> With current rules it would be the old bound. >> >> I thought that this should be the correct behavior, isn’t it? > > Yes, sorry, what I meant was "with the current rules it would be > the *new* bound”. struct foo x; x=… ; // set the whole structure 1 char *p = x->p; x=… ; // set the whole structure 2 In the above, when “set the whole structure 1”, x1, x1->n and x1->p are set at the same time; After *p = x->p; the pointer “p” is pointing to “x1->p”, it’s bound is “x1->n”; Then when “set the whole structure 2”, x2 is different than x1, x2->n and x2->p are set at the same time, the pointer ‘p’ still points to “x1->p”, therefore it’s bound should be “x1->n”. So, as long as the whole structure is set at the same time, should be fine. Do I miss anything here? Qing > (And I guess this is why you suggest to potentially > change it below). > > Martin > >> >>> >>> >>>> >>>> 2. This new requirement is ONLY for “counted_by” attribute that is >>>> attached to the pointer field, not needed >>>> for flexible array members. >>>> >>>> 3. Then there will be inconsistency for the “counted_by” attribute between >>>> FAM and pointer field. >>> >>> >>>> The major questions I have right now: >>>> >>>> 1. Shall we keep this inconsistency between FAM and pointer field? >>>> Or, >>>> >>>> 2. Shall we keep them consistent by adding this new requirement for the >>>> previous FAM? >>> >>> Or have a new attribute? I feel we double down on a bad design >>> which made sense for FAM addressing a very specific use case >>> (retrofitting checks to the Linux kernel) but is otherwise not >>> very strong. >> >> I do agree that the previous specific feature of the “counted_by" for the >> FAM, i.e: >> " >> One important feature of the attribute is, a reference to the >> flexible array member field uses the latest value assigned to the >> field that represents the number of the elements before that >> reference. For example, >> >> p->count = val1; >> p->array[20] = 0; // ref1 to p->array >> p->count = val2; >> p->array[30] = 0; // ref2 to p->array >> >> in the above, 'ref1' uses 'val1' as the number of the elements in >> 'p->array', and 'ref2' uses 'val2' as the number of elements in >> 'p->array’. >> “ >> >> Is a specific feature for Linux kernel, I am wondering whether the above >> feature >> really needed by the Linux kernel? >> >> If Not, I prefer to eliminate this specific feature from GCC before we >> officially release the “counted_by” >> attribute in GCC15. >> >> If Linux kernel does need this feature, Yes, maybe a new attribute for >> pointer is better. >> >> Kees, could you please also provide more comments and suggestion on this? >> > > > >> >>> >>> Martin >>> >>>> >>>> Kees, the following feature that you requested for the FAM: >>>> >>>> " >>>> One important feature of the attribute is, a reference to the >>>> flexible array member field uses the latest value assigned to the >>>> field that represents the number of the elements before that >>>> reference. For example, >>>> >>>> p->count = val1; >>>> p->array[20] = 0; // ref1 to p->array >>>> p->count = val2; >>>> p->array[30] = 0; // ref2 to p->array >>>> >>>> in the above, 'ref1' uses 'val1' as the number of the elements in >>>> 'p->array', and 'ref2' uses 'val2' as the number of elements in >>>> 'p->array’. >>>> " >>>> Has this feature been used by Linux kernel already? >>>> Is this feature really needed by Linux kernel? >>>> >>>> Thanks a lot for suggestions and comments. >>>> >>>> Qing >>>> >>>>> On Nov 20, 2024, at 14:23, Martin Uecker <uec...@tugraz.at> wrote: >>>>> >>>>> Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao: >>>>>> Hi, Martin, >>>>>> >>>>>> Thanks a lot for pointing this out. >>>>>> >>>>>> This does look like a problem we need avoid for the pointer arrays. >>>>>> >>>>>> Does the same problem exist in the language extension too if the n is >>>>>> allowed to be changed after initialization? >>>>>> >>>>>> If so, for the future language extension, is there any proposed solution >>>>>> to this problem? >>>>>> >>>>> >>>>> There is no specification yet and nothing formally proposed, so >>>>> it is entirely unclear at this point. >>>>> >>>>> My idea would be to give 'x->buf' the type '(char(*)[x->n])' >>>>> where 'x->n' is loaded at the same time 'x->buf' is accessed >>>>> and then the value is frozen (like in the simpler versions >>>>> of 'counted_by' you had implemented first). Of course, one >>>>> would then have to set x->n before setting the buffer (or >>>>> at the same time). This could be ensured by making the >>>>> member 'n' const, so that it can only be changed by >>>>> overwriting the whole struct. But I am still thinking >>>>> about this. >>>>> >>>>> In any case, I think for "counted_by" this is not an option >>>>> because it would be confusing if it works differently >>>>> than for the FAM case. >>>>> >>>>> Martin >>>>> >>>>> >>>>>> Qing >>>>>>> On Nov 20, 2024, at 10:52, Martin Uecker <uec...@tugraz.at> wrote: >>>>>>> >>>>>>> Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao: >>>>>>>> >>>>>>>>> On Nov 19, 2024, at 10:47, Marek Polacek <pola...@redhat.com> wrote: >>>>>>>>> >>>>>>>>> On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote: >>>>>>>>>> Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao: >>>>>>>>>>> Hi, >>>>>>>> >>>>>>> .. >>>>>>> >>>>>>> Hi Qing, >>>>>>> >>>>>>>> Per our discussion so far, if treating the following >>>>>>>> >>>>>>>> struct foo { >>>>>>>> int n; >>>>>>>> char *p __attribute__ ((counted_by (n))); >>>>>>>> }; >>>>>>>> >>>>>>>> as an array with upper-bounds being “n” is reasonable. >>>>>>> >>>>>>> There is one issue I want to point out, which I just realized during >>>>>>> a discussion in WG14. For "counted_by" we defined the semantics >>>>>>> in a way that later store to 'n' will be taken into account. >>>>>>> We did this to support the use case where 'n' is set after >>>>>>> the access to 'p'. >>>>>>> >>>>>>> struct foo *x = ; >>>>>>> >>>>>>> char *q = x->p; >>>>>>> x->n = 100; // this should apply >>>>>>> >>>>>>> >>>>>>> For FAMs this is fine, because it is a fixed part >>>>>>> of the struct. But for the new pointer case, I think this is >>>>>>> problematic. Consider this example: >>>>>>> >>>>>>> struct foo *x = allocate_buffer(100); >>>>>>> >>>>>>> where x->n is set to the right value in the allocation function. >>>>>>> >>>>>>> Now let's continue with >>>>>>> >>>>>>> char *q = x->p; >>>>>>> >>>>>>> x = allocate_buffer(50); >>>>>>> // x->n is set to 50. >>>>>>> >>>>>>> Now q refers to the old buffer, but x->n to the size of the new >>>>>>> buffer. That does not seem right and scares me a little bit. >>>>>>> >>>>>>> >>>>>>> >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Then, it’s reasonable to extend -fsanitize=bounds to instrument the >>>>>>>> corresponding reference for the pointer with >>>>>>>> Counted-by attribute. >>>>>>>> >>>>>>>> What do you think? >>>>>>>> >>>>>>>> Qing >>>>>>>> >>>>>>>>> >>>>>>>>>> I think the question is what -fsanitize=bounds is meant to be. >>>>>>>>>> >>>>>>>>>> I am a bit frustrated about the sanitizer. On the >>>>>>>>>> one hand, it is not doing enough to get spatial memory >>>>>>>>>> safety even where this would be easily possible, on the >>>>>>>>>> other hand, is pedantic about things which are technically >>>>>>>>>> UB but not problematic and then one is prevented from >>>>>>>>>> using it >>>>>>>>>> >>>>>>>>>> When used in default mode, where execution continues, it >>>>>>>>>> also does not mix well with many warning, creates more code, >>>>>>>>>> and pulls in a libary dependency (and the library also depends >>>>>>>>>> on upstream choices / progress which seems a limitation for >>>>>>>>>> extensions). >>>>>>>>>> >>>>>>>>>> What IMHO would be ideal is a protection mode for spatial >>>>>>>>>> memory safety that simply adds traps (which then requires >>>>>>>>>> no library, has no issues with other warnings, and could >>>>>>>>>> evolve independently from clang) >>>>>>>>>> >>>>>>>>>> So shouldn't we just add a -fboundscheck (which would >>>>>>>>>> be like -fsanitize=bounds -fsanitize-trap=bounds just with >>>>>>>>>> more checking) and make it really good? I think many people >>>>>>>>>> would be very happy about this. >>>>>>>>> >>>>>>>>> That's a separate concern. We already have the -fbounds-check option, >>>>>>>>> currently only used in Fortran (and D?), so perhaps we could make >>>>>>>>> that option a shorthand for -fsanitize=bounds -fsanitize-trap=bounds. >>>>>>>>> >>>>>>>>> Marek