https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120980

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 8 Jul 2025, kristerw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120980
> 
> --- Comment #6 from Krister Walfridsson <kristerw at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #4)
> > (In reply to Krister Walfridsson from comment #3)
> > > (In reply to Richard Biener from comment #2)
> > > And there are more problems. For example, how does a fully out-of-bounds
> > > load interact with pointer provenance? The out-of-bounds bytes do not
> > > correspond to the provenance and may even belong to a different object, so
> > > the provenance check will still classify the load as UB, even if we relax
> > > the in-range check.
> > 
> > Isn't provenance determined by the restrictions on pointer arithmetic?
> > That is, a pointer adjustment (or implied address calculation from a
> > memory access) that gets us outside of the object the pointer base points
> > to would still have that original provenance.
> > 
> > But sure, for your testcase it's very difficult to distinguish the case
> > where the compiler did sth wrong from the case where correctness depends
> > on the actual input - with a2[2] = { 0, 0 } the original scalar code would
> > already access out-of-bound elements.
> > 
> > Maybe if we say that if there's an in-bound or partially in-bound access
> > then an access that's based on the same (or advanced) pointer is OK,
> > even if fully out-of-bounds, if it is within the same page (based on
> > alignment knowledge)?  That is, somehow treat the separate accesses as
> > one?
> 
> Provenance is more important for the GIMPLE semantics than it is for C.
> 
> It is UB in C to perform pointer arithmetic that moves a pointer outside of an
> object (with a special case for "one past"), so for
> 
>   char arr1[1000];
> 
>   char *p = arr1;
>   char *q = p + i;
> 
> it's UB in C if (i < 0 || i > 1000). That means if we see a memory access *q,
> we know it must refer to arr1 (or the "one past" case), even without
> considering provenance.
> 
> In GIMPLE, where pointers may point outside their original objects, it's the
> provenance restrictions that ensure *q stays within the intended object. In
> other words, *q may access _any_ valid memory unless the provenance check
> blocks it by making the access UB!
> 
> Assume we have a second array arr2 placed directly after arr1 in memory.
> Without provenance restrictions,
> 
>   char sum = *p + *q;
> 
> has defined semantics even for values such as i = 1100, where it would now be
> required to produce the same result as arr1[0] + arr2[100]. This would
> significantly constrain most memory optimizations...
> 
> But when allowing fully out-of-bounds loads, we cannot really distinguish
> whether *p + *q represents arr1[0] + arr1[i], where provenance should be
> checked to make it UB if *q would access another object, or arr1[0] +
> *out-of-bounds-access, where we must ignore provenance to allow the load.

Hmm, I don't quite follow the difference between GIMPLE and C here.
In both, to get from one object to another you have to go via
integer arithmetic, not pointer arithmetic.

There's an additional twist due to how pointer analysis works in GIMPLE.
If you have

 char arr1[100], arr2[100];

you cannot go from arr1 to &arr2[10] by doing just

 char *p = (char *)(uintptr_t)&arr1[100] + 10;

but you _can_ (also more reliably) by using

 intptr_t diff = &arr1[0] - &arr2[0];
 char *p = (char *)((uintptr_t)&arr1[0] - diff) + 10;

which is because pointer analysis tracks provenance across casts to/from
integers.  To pointer analysis 'diff' has provenance arr1 or arr2 and
thus &arr1[0] - diff has both of them as well, so it's now valid for
'p' to access either arr1 or arr2.
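
As a rough sketch of the second form (not taken from GCC; the helper
name reach_arr2_via_diff is made up for illustration), the following
does the subtraction on uintptr_t values rather than on the pointers
themselves, since subtracting pointers into different objects is
undefined in C.  The result of the pointer/integer conversions is
implementation-defined, so this assumes a flat address space as on
typical GCC targets.

```c
#include <assert.h>
#include <stdint.h>

char arr1[100], arr2[100];

/* Hypothetical helper: derive a pointer to arr2[10] starting from arr1,
   using integer arithmetic that involves the addresses of both arrays.  */
char *
reach_arr2_via_diff (void)
{
  /* Integer distance between the two arrays.  Both addresses feed into
     'diff', which is why pointer analysis can associate it with arr1
     and arr2.  */
  uintptr_t diff = (uintptr_t) &arr1[0] - (uintptr_t) &arr2[0];

  /* arr1 - (arr1 - arr2) == arr2 in modular arithmetic; then advance by 10.  */
  char *p = (char *) ((uintptr_t) &arr1[0] - diff) + 10;

  /* Sanity check on the address only; assumes a flat address space.  */
  assert ((uintptr_t) p == (uintptr_t) &arr2[10]);
  return p;
}
```

On such a target the returned pointer has the same address as
&arr2[10]; whether an access through it is considered valid is exactly
the provenance question discussed here.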

I realize this isn't how the provenance proposal for the C language
standard works.

That said, with your example above for both C and GIMPLE the
provenance of 'q' stays at 'arr1'.  Specifically a POINTER_PLUS_EXPR
never changes provenance.
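
To spell the arr1 example out as plain C (a sketch only; the function
name, the out-parameter and the bounds guard are hypothetical additions
and not part of the quoted example), the guard below rejects every
index for which forming or dereferencing p + i would be undefined in
C, so the access that remains can only refer to arr1.

```c
#include <assert.h>
#include <stddef.h>

char arr1[1000];

/* Hypothetical helper: store arr1[0] + arr1[i] in *sum and return 1,
   or return 0 without touching memory if i is out of range.  */
int
sum_first_and_ith (ptrdiff_t i, int *sum)
{
  assert (sum != NULL);

  /* Forming p + i is defined only for 0 <= i <= 1000 (1000 is the
     "one past" case), and dereferencing it only for i < 1000.  */
  if (i < 0 || i >= 1000)
    return 0;

  char *p = arr1;
  char *q = p + i;   /* a POINTER_PLUS_EXPR in GIMPLE */
  *sum = *p + *q;
  return 1;
}
```

A call with i = 1100, the value used in the quoted example, returns 0
here instead of reading whatever happens to follow arr1 in memory.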

That of course makes the vectorizer behavior bad again if the
pointer now hits an unrelated object (but with your verifier, how
do you "place" objects so that you can ever reach one from
another?)
