Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

Martin Uecker Sat, 30 Nov 2024 04:23:09 -0800

Am Dienstag, dem 26.11.2024 um 20:59 +0000 schrieb Qing Zhao:
> Think this over these days, I have another thought that need some feedback:
> 
> The major issue right now is:
> 
> 1. For the following structure in which the “counted_by” attributes is 
> attached to the pointer field.
> 
> struct foo {
>   int n;
>   char *p __attribute__ ((counted_by (n)));
> } *x;


BTW: I also do not like the syntax 'n' for a lookup of a member 
in the member namespace. I think it should be '.n'.  For FAM this 
is less problematic because it always at the end, but here it is 
problematic.
> 
> There is one important additional requirement:
> 
> x->n, x->p can ONLY be changed by changing the whole structure at the same 
> time. 
> Otherwise, x->n might not be consistent with x->p.

By itself, this would still not fix the issue I pointed out.

struct foo x;
x = .. ; // set the whole structure
char *p = x->p;
x = ... ; // set the whole structure

What is the bound for 'p' ?  
With current rules it would be the old bound.


> 
> 2. This new requirement is ONLY for “counted_by” attribute that is attached 
> to the pointer field, not needed
> for flexible array members.
> 
> 3. Then there will be inconsistency for the “counted_by” attribute between 
> FAM and pointer field.


> The major questions I have right now:
> 
> 1. Shall we keep this inconsistency between FAM and pointer field? 
> Or, 
> 
> 2. Shall we keep them consistent by adding this new requirement for the 
> previous FAM? 

Or have a new attribute?  I feel we double down on a bad design
which made sense for FAM addressing a very specific use case
(retrofitting checks to the Linux kernel) but is otherwise not
very strong.

Martin

> 
> Kees, the following feature that you requested for the FAM:
> 
> "
>      One important feature of the attribute is, a reference to the
>      flexible array member field uses the latest value assigned to the
>      field that represents the number of the elements before that
>      reference.  For example,
> 
>             p->count = val1;
>             p->array[20] = 0;  // ref1 to p->array
>             p->count = val2;
>             p->array[30] = 0;  // ref2 to p->array
> 
>      in the above, 'ref1' uses 'val1' as the number of the elements in
>      'p->array', and 'ref2' uses 'val2' as the number of elements in
>      'p->array’.
> "
> Has this feature been used by Linux kernel already?
> Is this feature really needed by Linux kernel?
> 
> Thanks a lot for suggestions and comments.
> 
> Qing
> 
> > On Nov 20, 2024, at 14:23, Martin Uecker <uec...@tugraz.at> wrote:
> > 
> > Am Mittwoch, dem 20.11.2024 um 17:37 +0000 schrieb Qing Zhao:
> > > Hi, Martin,
> > > 
> > > Thanks a lot for pointing this out. 
> > > 
> > > This does look like a problem we need avoid for the pointer arrays.
> > > 
> > > Does  the same problem exist in the language extension too if the n is 
> > > allowed to be changed after initialization?
> > > 
> > > If so, for the future language extension, is there any proposed solution 
> > > to this problem? 
> > > 
> > 
> > There is no specification yet and nothing formally proposed, so
> > it is entirely unclear at this point.
> > 
> > My idea would be to give 'x->buf' the type '(char(*)[x->n])'
> > where 'x->n' is loaded at the same time 'x->buf' is accessed
> > and then the value is frozen (like in the simpler versions
> > of 'counted_by' you had implemented first).  Of course, one
> > would then have to set x->n before setting the buffer (or
> > at the same time). This could be ensured by making the
> > member 'n' const, so that it can only be changed by
> > overwriting the whole struct. But I am still thinking
> > about this.
> > 
> > In any case, I think for "counted_by" this is not an option
> > because it would be confusing if it works differently
> > than for the FAM case. 
> > 
> > Martin
> > 
> > 
> > > Qing
> > > > On Nov 20, 2024, at 10:52, Martin Uecker <uec...@tugraz.at> wrote:
> > > > 
> > > > Am Mittwoch, dem 20.11.2024 um 15:27 +0000 schrieb Qing Zhao:
> > > > > 
> > > > > > On Nov 19, 2024, at 10:47, Marek Polacek <pola...@redhat.com> wrote:
> > > > > > 
> > > > > > On Mon, Nov 18, 2024 at 07:10:35PM +0100, Martin Uecker wrote:
> > > > > > > Am Montag, dem 18.11.2024 um 17:55 +0000 schrieb Qing Zhao:
> > > > > > > > Hi,
> > > > > 
> > > > ..
> > > > 
> > > > Hi Qing,
> > > > 
> > > > > Per our discussion so far, if treating the following
> > > > > 
> > > > > struct foo {
> > > > > int n;
> > > > > char *p __attribute__ ((counted_by (n)));
> > > > > };
> > > > > 
> > > > > as an array with upper-bounds being “n” is reasonable.
> > > > 
> > > > There is one issue I want to point out, which I just realized during
> > > > a discussion in WG14.  For "counted_by" we defined the semantics
> > > > in a way that later store to 'n' will be taken into account. 
> > > > We did this to support the use case where 'n' is set after
> > > > the access to 'p'.
> > > > 
> > > > struct foo *x = ;
> > > > 
> > > > char *q = x->p;
> > > > x->n = 100; // this should apply
> > > > 
> > > > 
> > > > For FAMs this is fine, because it is a fixed part
> > > > of the struct.  But for the new pointer case, I think this is
> > > > problematic.  Consider this example:
> > > > 
> > > > struct foo *x = allocate_buffer(100);
> > > > 
> > > > where x->n is set to the right value in the allocation function.
> > > > 
> > > > Now let's continue with
> > > > 
> > > > char *q = x->p;
> > > > 
> > > > x = allocate_buffer(50);
> > > > // x->n is set to 50.
> > > > 
> > > > Now q refers to the old buffer, but x->n to the size of the new
> > > > buffer.  That does not seem right and scares me a little bit.
> > > > 
> > > > 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > > 
> > > > > Then, it’s reasonable to extend -fsanitize=bounds to instrument the 
> > > > > corresponding reference for the pointer with
> > > > > Counted-by attribute. 
> > > > > 
> > > > > What do you think?
> > > > > 
> > > > > Qing
> > > > > 
> > > > > > 
> > > > > > > I think the question is what -fsanitize=bounds is meant to be.
> > > > > > > 
> > > > > > > I am a bit frustrated about the sanitizer.  On the
> > > > > > > one hand, it is not doing enough to get spatial memory
> > > > > > > safety even where this would be easily possible, on the
> > > > > > > other hand, is pedantic about things which are technically
> > > > > > > UB but not problematic and then one is prevented from
> > > > > > > using it
> > > > > > > 
> > > > > > > When used in default mode, where execution continues, it
> > > > > > > also does not mix well with many warning, creates more code,
> > > > > > > and pulls in a libary dependency (and the library also depends
> > > > > > > on upstream choices / progress which seems a limitation for
> > > > > > > extensions).
> > > > > > > 
> > > > > > > What IMHO would be ideal is a protection mode for spatial
> > > > > > > memory safety that simply adds traps (which then requires
> > > > > > > no library, has no issues with other warnings, and could
> > > > > > > evolve independently from clang) 
> > > > > > > 
> > > > > > > So shouldn't we just add a -fboundscheck (which would 
> > > > > > > be like -fsanitize=bounds -fsanitize-trap=bounds just with
> > > > > > > more checking) and make it really good? I think many people
> > > > > > > would be very happy about this.
> > > > > > 
> > > > > > That's a separate concern.  We already have the -fbounds-check 
> > > > > > option,
> > > > > > currently only used in Fortran (and D?), so perhaps we could make
> > > > > > that option a shorthand for -fsanitize=bounds 
> > > > > > -fsanitize-trap=bounds.
> > > > > > 
> > > > > > Marek
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
>

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

Reply via email to