Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao: > Think this over these days, I have another thought that need some feedback: > > The major issue right now is: > > 1. For the following structure in which the “counted_by” attributes is > attached to the pointer field. > > struct foo { > int n; > char *p __attribute__ ((counted_by (n))); > } *x;
BTW: I also do not like the syntax 'n' for a lookup of a member in the member namespace. I think it should be '.n'. For FAM this is less problematic because it always at the end, but here it is problematic. > > There is one important additional requirement: > > x->n, x->p can ONLY be changed by changing the whole structure at the same > time. > Otherwise, x->n might not be consistent with x->p. By itself, this would still not fix the issue I pointed out. struct foo x; x = .. ; // set the whole structure char *p = x->p; x = ... ; // set the whole structure What is the bound for 'p' ? With current rules it would be the old bound. > > 2. This new requirement is ONLY for “counted_by” attribute that is attached > to the pointer field, not needed > for flexible array members. > > 3. Then there will be inconsistency for the “counted_by” attribute between > FAM and pointer field. > The major questions I have right now: > > 1. Shall we keep this inconsistency between FAM and pointer field? > Or, > > 2. Shall we keep them consistent by adding this new requirement for the > previous FAM? Or have a new attribute? I feel we double down on a bad design which made sense for FAM addressing a very specific use case (retrofitting checks to the Linux kernel) but is otherwise not very strong. Martin > > Kees, the following feature that you requested for the FAM: > > " > One important feature of the attribute is, a reference to the > flexible array member field uses the latest value assigned to the > field that represents the number of the elements before that > reference. For example, > > p->count = val1; > p->array[20] = 0; // ref1 to p->array > p->count = val2; > p->array[30] = 0; // ref2 to p->array > > in the above, 'ref1' uses 'val1' as the number of the elements in > 'p->array', and 'ref2' uses 'val2' as the number of elements in > 'p->array’. > " > Has this feature been used by Linux kernel already? > Is this feature really needed by Linux kernel? > > Thanks a lot for suggestions and comments. > > Qing > > > On Nov 20, 2024, at 14:23, Martin Uecker <uec...@tugraz.at> wrote: > > > > Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao: > > > Hi, Martin, > > > > > > Thanks a lot for pointing this out. > > > > > > This does look like a problem we need avoid for the pointer arrays. > > > > > > Does the same problem exist in the language extension too if the n is > > > allowed to be changed after initialization? > > > > > > If so, for the future language extension, is there any proposed solution > > > to this problem? > > > > > > > There is no specification yet and nothing formally proposed, so > > it is entirely unclear at this point. > > > > My idea would be to give 'x->buf' the type '(char(*)[x->n])' > > where 'x->n' is loaded at the same time 'x->buf' is accessed > > and then the value is frozen (like in the simpler versions > > of 'counted_by' you had implemented first). Of course, one > > would then have to set x->n before setting the buffer (or > > at the same time). This could be ensured by making the > > member 'n' const, so that it can only be changed by > > overwriting the whole struct. But I am still thinking > > about this. > > > > In any case, I think for "counted_by" this is not an option > > because it would be confusing if it works differently > > than for the FAM case. > > > > Martin > > > > > > > Qing > > > > On Nov 20, 2024, at 10:52, Martin Uecker <uec...@tugraz.at> wrote: > > > > > > > > Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao: > > > > > > > > > > > On Nov 19, 2024, at 10:47, Marek Polacek <pola...@redhat.com> wrote: > > > > > > > > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote: > > > > > > > Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao: > > > > > > > > Hi, > > > > > > > > > .. > > > > > > > > Hi Qing, > > > > > > > > > Per our discussion so far, if treating the following > > > > > > > > > > struct foo { > > > > > int n; > > > > > char *p __attribute__ ((counted_by (n))); > > > > > }; > > > > > > > > > > as an array with upper-bounds being “n” is reasonable. > > > > > > > > There is one issue I want to point out, which I just realized during > > > > a discussion in WG14. For "counted_by" we defined the semantics > > > > in a way that later store to 'n' will be taken into account. > > > > We did this to support the use case where 'n' is set after > > > > the access to 'p'. > > > > > > > > struct foo *x = ; > > > > > > > > char *q = x->p; > > > > x->n = 100; // this should apply > > > > > > > > > > > > For FAMs this is fine, because it is a fixed part > > > > of the struct. But for the new pointer case, I think this is > > > > problematic. Consider this example: > > > > > > > > struct foo *x = allocate_buffer(100); > > > > > > > > where x->n is set to the right value in the allocation function. > > > > > > > > Now let's continue with > > > > > > > > char *q = x->p; > > > > > > > > x = allocate_buffer(50); > > > > // x->n is set to 50. > > > > > > > > Now q refers to the old buffer, but x->n to the size of the new > > > > buffer. That does not seem right and scares me a little bit. > > > > > > > > > > > > > > > > Martin > > > > > > > > > > > > > > > > > > Then, it’s reasonable to extend -fsanitize=bounds to instrument the > > > > > corresponding reference for the pointer with > > > > > Counted-by attribute. > > > > > > > > > > What do you think? > > > > > > > > > > Qing > > > > > > > > > > > > > > > > > > I think the question is what -fsanitize=bounds is meant to be. > > > > > > > > > > > > > > I am a bit frustrated about the sanitizer. On the > > > > > > > one hand, it is not doing enough to get spatial memory > > > > > > > safety even where this would be easily possible, on the > > > > > > > other hand, is pedantic about things which are technically > > > > > > > UB but not problematic and then one is prevented from > > > > > > > using it > > > > > > > > > > > > > > When used in default mode, where execution continues, it > > > > > > > also does not mix well with many warning, creates more code, > > > > > > > and pulls in a libary dependency (and the library also depends > > > > > > > on upstream choices / progress which seems a limitation for > > > > > > > extensions). > > > > > > > > > > > > > > What IMHO would be ideal is a protection mode for spatial > > > > > > > memory safety that simply adds traps (which then requires > > > > > > > no library, has no issues with other warnings, and could > > > > > > > evolve independently from clang) > > > > > > > > > > > > > > So shouldn't we just add a -fboundscheck (which would > > > > > > > be like -fsanitize=bounds -fsanitize-trap=bounds just with > > > > > > > more checking) and make it really good? I think many people > > > > > > > would be very happy about this. > > > > > > > > > > > > That's a separate concern. We already have the -fbounds-check > > > > > > option, > > > > > > currently only used in Fortran (and D?), so perhaps we could make > > > > > > that option a shorthand for -fsanitize=bounds > > > > > > -fsanitize-trap=bounds. > > > > > > > > > > > > Marek > > > > > > > > > > > > > > > > > > > > >