Am Montag, dem 02.12.2024 um 22:58 +0000 schrieb Qing Zhao: > > > On Dec 2, 2024, at 16:13, Martin Uecker <uec...@tugraz.at> wrote: > > > > Am Montag, dem 02.12.2024 um 20:15 +0000 schrieb Qing Zhao: > > > > > > > On Nov 30, 2024, at 07:22, Martin Uecker <uec...@tugraz.at> wrote: > > > > > > > > Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao: > > > > > Think this over these days, I have another thought that need some > > > > > feedback: > > > > > > > > > > The major issue right now is: > > > > > > > > > > 1. For the following structure in which the “counted_by” attributes > > > > > is attached to the pointer field. > > > > > > > > > > struct foo { > > > > > int n; > > > > > char *p __attribute__ ((counted_by (n))); > > > > > } *x; > > > > > > > > BTW: I also do not like the syntax 'n' for a lookup of a member > > > > in the member namespace. I think it should be '.n'. For FAM this > > > > is less problematic because it always at the end, but here it is > > > > problematic. > > > During my implementation of extending this attribute to pointer fields of > > > structure. > > > I didn’t notice issue with the current syntax ’n’ for the pointer fields > > > so far, > > > even though when the field “n” is declared after the corresponding > > > pointer field, i.e: > > > > > > struct foo { > > > { > > > char *p __attribute__ ((counted_by (n))); > > > int n; > > > } > > > > > > So, could you please explain a little bit more on what’s the potential > > > issue here? > > > > My issue with it is that it is not consistent to how C's scoping > > of identifiers work. In > > > > constexpr int n = 3; > > struct foo { > > { > > char (*p)[n] __attribute__ ((counted_by (n)) > > int n; > > } > > > > the n in the type of p refers to the previous n in scope while > > the n in your attribute would refer to the member. > > Okay, I see your point here -:). > > Yes, I agree that if the compiler can accept the syntax “.n” in general, then > it will make the > “counted_by” attribute to allow more complicated expressions in general. > > We had this similar discussion before the design and implementation for the > “counted_by” attribute > on FAMs, and we agreed to delay the approach of accepting the syntax “.n” in > the future possible > language standard at that time. > > So, for the “counted_by attribute on FAMs, the implementation is, searching > the “n” in all the fields of the > containing structure and locating that specific field. > > Now, when extending “counted_by” attribute to pointer fields of structures, > the implementation is similar.
> > > > This is incoherent and confusing. It becomes worse should > > you ever want to allow more complicated expressions. > > You are right, it’s hard to allow more complicated expressions for > “counted_by” based on the current > design. > > If we agree to accept the “.n” syntax in GCC in general, that’s of course > better. > Then how about the current “counted_by” for FAMs? Shall we also change it to > accept “.n” syntax? My recommendation would be to change it. It is also not ideal for this case - only less problematic. > > > > It would be clearer if you the syntax ".n" which resembles > > the syntax for designated initializers that is already used > > in initializers to refer to struct members. > > > > constexpr int n; > > struct foo { > > { > > char (*p)[n] __attribute__ ((counted_by (.n)) > > int n; > > } > > > Yes, I agree. > > > > > > > > > > > > > > > > > > There is one important additional requirement: > > > > > > > > > > x->n, x->p can ONLY be changed by changing the whole structure at the > > > > > same time. > > > > > Otherwise, x->n might not be consistent with x->p. > > > > > > > > By itself, this would still not fix the issue I pointed out. > > > > > > > > struct foo x; > > > > x = .. ; // set the whole structure > > > > char *p = x->p; > > > > x = ... ; // set the whole structure > > > > > > > > What is the bound for 'p' ? > > > > > > Since p was set to the pointer field of the old structure, then the bound > > > of it should be the old bound. > > > > With current rules it would be the old bound. > > > > > > I thought that this should be the correct behavior, isn’t it? > > > > Yes, sorry, what I meant was "with the current rules it would be > > the *new* bound”. > > struct foo x; > x=… ; // set the whole structure 1 > char *p = x->p; > x=… ; // set the whole structure 2 > > In the above, when “set the whole structure 1”, x1, x1->n and x1->p are set > at the same time; > After *p = x->p; the pointer “p” is pointing to “x1->p”, it’s bound is > “x1->n”; I agree. > > Then when “set the whole structure 2”, x2 is different than x1, x2->n and > x2->p are set at the same time, the pointer > ‘p’ still points to “x1->p”, therefore it’s bound should be “x1->n”. > > So, as long as the whole structure is set at the same time, should be fine. > > Do I miss anything here? I was talking aout the pointer "p" which was obtained before setting the struct the second time in char *p = x->p; This pointer is still set to x1->p but the bound refers to x.n which is now set to x2->n. Martin > > Qing > > > (And I guess this is why you suggest to potentially > > change it below). > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > 2. This new requirement is ONLY for “counted_by” attribute that is > > > > > attached to the pointer field, not needed > > > > > for flexible array members. > > > > > > > > > > 3. Then there will be inconsistency for the “counted_by” attribute > > > > > between FAM and pointer field. > > > > > > > > > > > > > The major questions I have right now: > > > > > > > > > > 1. Shall we keep this inconsistency between FAM and pointer field? > > > > > Or, > > > > > > > > > > 2. Shall we keep them consistent by adding this new requirement for > > > > > the previous FAM? > > > > > > > > Or have a new attribute? I feel we double down on a bad design > > > > which made sense for FAM addressing a very specific use case > > > > (retrofitting checks to the Linux kernel) but is otherwise not > > > > very strong. > > > > > > I do agree that the previous specific feature of the “counted_by" for the > > > FAM, i.e: > > > " > > > One important feature of the attribute is, a reference to the > > > flexible array member field uses the latest value assigned to the > > > field that represents the number of the elements before that > > > reference. For example, > > > > > > p->count = val1; > > > p->array[20] = 0; // ref1 to p->array > > > p->count = val2; > > > p->array[30] = 0; // ref2 to p->array > > > > > > in the above, 'ref1' uses 'val1' as the number of the elements in > > > 'p->array', and 'ref2' uses 'val2' as the number of elements in > > > 'p->array’. > > > “ > > > > > > Is a specific feature for Linux kernel, I am wondering whether the above > > > feature > > > really needed by the Linux kernel? > > > > > > If Not, I prefer to eliminate this specific feature from GCC before we > > > officially release the “counted_by” > > > attribute in GCC15. > > > > > > If Linux kernel does need this feature, Yes, maybe a new attribute for > > > pointer is better. > > > > > > Kees, could you please also provide more comments and suggestion on this? > > > > > > > > > > > > > > > > > > > > Martin > > > > > > > > > > > > > > Kees, the following feature that you requested for the FAM: > > > > > > > > > > " > > > > > One important feature of the attribute is, a reference to the > > > > > flexible array member field uses the latest value assigned to the > > > > > field that represents the number of the elements before that > > > > > reference. For example, > > > > > > > > > > p->count = val1; > > > > > p->array[20] = 0; // ref1 to p->array > > > > > p->count = val2; > > > > > p->array[30] = 0; // ref2 to p->array > > > > > > > > > > in the above, 'ref1' uses 'val1' as the number of the elements in > > > > > 'p->array', and 'ref2' uses 'val2' as the number of elements in > > > > > 'p->array’. > > > > > " > > > > > Has this feature been used by Linux kernel already? > > > > > Is this feature really needed by Linux kernel? > > > > > > > > > > Thanks a lot for suggestions and comments. > > > > > > > > > > Qing > > > > > > > > > > > On Nov 20, 2024, at 14:23, Martin Uecker <uec...@tugraz.at> wrote: > > > > > > > > > > > > Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao: > > > > > > > Hi, Martin, > > > > > > > > > > > > > > Thanks a lot for pointing this out. > > > > > > > > > > > > > > This does look like a problem we need avoid for the pointer > > > > > > > arrays. > > > > > > > > > > > > > > Does the same problem exist in the language extension too if the > > > > > > > n is allowed to be changed after initialization? > > > > > > > > > > > > > > If so, for the future language extension, is there any proposed > > > > > > > solution to this problem? > > > > > > > > > > > > > > > > > > > There is no specification yet and nothing formally proposed, so > > > > > > it is entirely unclear at this point. > > > > > > > > > > > > My idea would be to give 'x->buf' the type '(char(*)[x->n])' > > > > > > where 'x->n' is loaded at the same time 'x->buf' is accessed > > > > > > and then the value is frozen (like in the simpler versions > > > > > > of 'counted_by' you had implemented first). Of course, one > > > > > > would then have to set x->n before setting the buffer (or > > > > > > at the same time). This could be ensured by making the > > > > > > member 'n' const, so that it can only be changed by > > > > > > overwriting the whole struct. But I am still thinking > > > > > > about this. > > > > > > > > > > > > In any case, I think for "counted_by" this is not an option > > > > > > because it would be confusing if it works differently > > > > > > than for the FAM case. > > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > Qing > > > > > > > > On Nov 20, 2024, at 10:52, Martin Uecker <uec...@tugraz.at> > > > > > > > > wrote: > > > > > > > > > > > > > > > > Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao: > > > > > > > > > > > > > > > > > > > On Nov 19, 2024, at 10:47, Marek Polacek > > > > > > > > > > <pola...@redhat.com> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker > > > > > > > > > > wrote: > > > > > > > > > > > Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing > > > > > > > > > > > Zhao: > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > .. > > > > > > > > > > > > > > > > Hi Qing, > > > > > > > > > > > > > > > > > Per our discussion so far, if treating the following > > > > > > > > > > > > > > > > > > struct foo { > > > > > > > > > int n; > > > > > > > > > char *p __attribute__ ((counted_by (n))); > > > > > > > > > }; > > > > > > > > > > > > > > > > > > as an array with upper-bounds being “n” is reasonable. > > > > > > > > > > > > > > > > There is one issue I want to point out, which I just realized > > > > > > > > during > > > > > > > > a discussion in WG14. For "counted_by" we defined the semantics > > > > > > > > in a way that later store to 'n' will be taken into account. > > > > > > > > We did this to support the use case where 'n' is set after > > > > > > > > the access to 'p'. > > > > > > > > > > > > > > > > struct foo *x = ; > > > > > > > > > > > > > > > > char *q = x->p; > > > > > > > > x->n = 100; // this should apply > > > > > > > > > > > > > > > > > > > > > > > > For FAMs this is fine, because it is a fixed part > > > > > > > > of the struct. But for the new pointer case, I think this is > > > > > > > > problematic. Consider this example: > > > > > > > > > > > > > > > > struct foo *x = allocate_buffer(100); > > > > > > > > > > > > > > > > where x->n is set to the right value in the allocation function. > > > > > > > > > > > > > > > > Now let's continue with > > > > > > > > > > > > > > > > char *q = x->p; > > > > > > > > > > > > > > > > x = allocate_buffer(50); > > > > > > > > // x->n is set to 50. > > > > > > > > > > > > > > > > Now q refers to the old buffer, but x->n to the size of the new > > > > > > > > buffer. That does not seem right and scares me a little bit. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Then, it’s reasonable to extend -fsanitize=bounds to > > > > > > > > > instrument the corresponding reference for the pointer with > > > > > > > > > Counted-by attribute. > > > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > > > > > Qing > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think the question is what -fsanitize=bounds is meant > > > > > > > > > > > to be. > > > > > > > > > > > > > > > > > > > > > > I am a bit frustrated about the sanitizer. On the > > > > > > > > > > > one hand, it is not doing enough to get spatial memory > > > > > > > > > > > safety even where this would be easily possible, on the > > > > > > > > > > > other hand, is pedantic about things which are technically > > > > > > > > > > > UB but not problematic and then one is prevented from > > > > > > > > > > > using it > > > > > > > > > > > > > > > > > > > > > > When used in default mode, where execution continues, it > > > > > > > > > > > also does not mix well with many warning, creates more > > > > > > > > > > > code, > > > > > > > > > > > and pulls in a libary dependency (and the library also > > > > > > > > > > > depends > > > > > > > > > > > on upstream choices / progress which seems a limitation > > > > > > > > > > > for > > > > > > > > > > > extensions). > > > > > > > > > > > > > > > > > > > > > > What IMHO would be ideal is a protection mode for spatial > > > > > > > > > > > memory safety that simply adds traps (which then requires > > > > > > > > > > > no library, has no issues with other warnings, and could > > > > > > > > > > > evolve independently from clang) > > > > > > > > > > > > > > > > > > > > > > So shouldn't we just add a -fboundscheck (which would > > > > > > > > > > > be like -fsanitize=bounds -fsanitize-trap=bounds just with > > > > > > > > > > > more checking) and make it really good? I think many > > > > > > > > > > > people > > > > > > > > > > > would be very happy about this. > > > > > > > > > > > > > > > > > > > > That's a separate concern. We already have the > > > > > > > > > > -fbounds-check option, > > > > > > > > > > currently only used in Fortran (and D?), so perhaps we > > > > > > > > > > could make > > > > > > > > > > that option a shorthand for -fsanitize=bounds > > > > > > > > > > -fsanitize-trap=bounds. > > > > > > > > > > > > > > > > > > > > Marek > >