Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao:
> Think this over these days, I have another thought that need some feedback:
>
> The major issue right now is:
>
> 1. For the following structure in which the “counted_by” attributes is
> attached to the pointer field.
>
> struct foo {
> int n;
> char *p __attribute__ ((counted_by (n)));
> } *x;
BTW: I also do not like the syntax 'n' for a lookup of a member
in the member namespace. I think it should be '.n'. For FAM this
is less problematic because it always at the end, but here it is
problematic.
>
> There is one important additional requirement:
>
> x->n, x->p can ONLY be changed by changing the whole structure at the same
> time.
> Otherwise, x->n might not be consistent with x->p.
By itself, this would still not fix the issue I pointed out.
struct foo x;
x = .. ; // set the whole structure
char *p = x->p;
x = ... ; // set the whole structure
What is the bound for 'p' ?
With current rules it would be the old bound.
>
> 2. This new requirement is ONLY for “counted_by” attribute that is attached
> to the pointer field, not needed
> for flexible array members.
>
> 3. Then there will be inconsistency for the “counted_by” attribute between
> FAM and pointer field.
> The major questions I have right now:
>
> 1. Shall we keep this inconsistency between FAM and pointer field?
> Or,
>
> 2. Shall we keep them consistent by adding this new requirement for the
> previous FAM?
Or have a new attribute? I feel we double down on a bad design
which made sense for FAM addressing a very specific use case
(retrofitting checks to the Linux kernel) but is otherwise not
very strong.
Martin
>
> Kees, the following feature that you requested for the FAM:
>
> "
> One important feature of the attribute is, a reference to the
> flexible array member field uses the latest value assigned to the
> field that represents the number of the elements before that
> reference. For example,
>
> p->count = val1;
> p->array[20] = 0; // ref1 to p->array
> p->count = val2;
> p->array[30] = 0; // ref2 to p->array
>
> in the above, 'ref1' uses 'val1' as the number of the elements in
> 'p->array', and 'ref2' uses 'val2' as the number of elements in
> 'p->array’.
> "
> Has this feature been used by Linux kernel already?
> Is this feature really needed by Linux kernel?
>
> Thanks a lot for suggestions and comments.
>
> Qing
>
> > On Nov 20, 2024, at 14:23, Martin Uecker <[email protected]> wrote:
> >
> > Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao:
> > > Hi, Martin,
> > >
> > > Thanks a lot for pointing this out.
> > >
> > > This does look like a problem we need avoid for the pointer arrays.
> > >
> > > Does the same problem exist in the language extension too if the n is
> > > allowed to be changed after initialization?
> > >
> > > If so, for the future language extension, is there any proposed solution
> > > to this problem?
> > >
> >
> > There is no specification yet and nothing formally proposed, so
> > it is entirely unclear at this point.
> >
> > My idea would be to give 'x->buf' the type '(char(*)[x->n])'
> > where 'x->n' is loaded at the same time 'x->buf' is accessed
> > and then the value is frozen (like in the simpler versions
> > of 'counted_by' you had implemented first). Of course, one
> > would then have to set x->n before setting the buffer (or
> > at the same time). This could be ensured by making the
> > member 'n' const, so that it can only be changed by
> > overwriting the whole struct. But I am still thinking
> > about this.
> >
> > In any case, I think for "counted_by" this is not an option
> > because it would be confusing if it works differently
> > than for the FAM case.
> >
> > Martin
> >
> >
> > > Qing
> > > > On Nov 20, 2024, at 10:52, Martin Uecker <[email protected]> wrote:
> > > >
> > > > Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao:
> > > > >
> > > > > > On Nov 19, 2024, at 10:47, Marek Polacek <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> > > > > > > Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao:
> > > > > > > > Hi,
> > > > >
> > > > ..
> > > >
> > > > Hi Qing,
> > > >
> > > > > Per our discussion so far, if treating the following
> > > > >
> > > > > struct foo {
> > > > > int n;
> > > > > char *p __attribute__ ((counted_by (n)));
> > > > > };
> > > > >
> > > > > as an array with upper-bounds being “n” is reasonable.
> > > >
> > > > There is one issue I want to point out, which I just realized during
> > > > a discussion in WG14. For "counted_by" we defined the semantics
> > > > in a way that later store to 'n' will be taken into account.
> > > > We did this to support the use case where 'n' is set after
> > > > the access to 'p'.
> > > >
> > > > struct foo *x = ;
> > > >
> > > > char *q = x->p;
> > > > x->n = 100; // this should apply
> > > >
> > > >
> > > > For FAMs this is fine, because it is a fixed part
> > > > of the struct. But for the new pointer case, I think this is
> > > > problematic. Consider this example:
> > > >
> > > > struct foo *x = allocate_buffer(100);
> > > >
> > > > where x->n is set to the right value in the allocation function.
> > > >
> > > > Now let's continue with
> > > >
> > > > char *q = x->p;
> > > >
> > > > x = allocate_buffer(50);
> > > > // x->n is set to 50.
> > > >
> > > > Now q refers to the old buffer, but x->n to the size of the new
> > > > buffer. That does not seem right and scares me a little bit.
> > > >
> > > >
> > > >
> > > > Martin
> > > >
> > > >
> > > > >
> > > > > Then, it’s reasonable to extend -fsanitize=bounds to instrument the
> > > > > corresponding reference for the pointer with
> > > > > Counted-by attribute.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Qing
> > > > >
> > > > > >
> > > > > > > I think the question is what -fsanitize=bounds is meant to be.
> > > > > > >
> > > > > > > I am a bit frustrated about the sanitizer. On the
> > > > > > > one hand, it is not doing enough to get spatial memory
> > > > > > > safety even where this would be easily possible, on the
> > > > > > > other hand, is pedantic about things which are technically
> > > > > > > UB but not problematic and then one is prevented from
> > > > > > > using it
> > > > > > >
> > > > > > > When used in default mode, where execution continues, it
> > > > > > > also does not mix well with many warning, creates more code,
> > > > > > > and pulls in a libary dependency (and the library also depends
> > > > > > > on upstream choices / progress which seems a limitation for
> > > > > > > extensions).
> > > > > > >
> > > > > > > What IMHO would be ideal is a protection mode for spatial
> > > > > > > memory safety that simply adds traps (which then requires
> > > > > > > no library, has no issues with other warnings, and could
> > > > > > > evolve independently from clang)
> > > > > > >
> > > > > > > So shouldn't we just add a -fboundscheck (which would
> > > > > > > be like -fsanitize=bounds -fsanitize-trap=bounds just with
> > > > > > > more checking) and make it really good? I think many people
> > > > > > > would be very happy about this.
> > > > > >
> > > > > > That's a separate concern. We already have the -fbounds-check
> > > > > > option,
> > > > > > currently only used in Fortran (and D?), so perhaps we could make
> > > > > > that option a shorthand for -fsanitize=bounds
> > > > > > -fsanitize-trap=bounds.
> > > > > >
> > > > > > Marek
> > > > > >
> > > > >
> > > >
> > >
> >
>