https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120980

--- Comment #6 from Krister Walfridsson <kristerw at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> (In reply to Krister Walfridsson from comment #3)
> > (In reply to Richard Biener from comment #2)
> > And there are more problems. For example, how does a fully out-of-bounds
> > load interact with pointer provenance? The out-of-bounds bytes do not
> > correspond to the provenance and may even belong to a different object, so
> > the provenance check will still classify the load as UB, even if we relax
> > the in-range check.
> 
> Isn't provenance determined by the restrictions on pointer arithmetic?  That
> is,
> a pointer adjustment (or implied address calculation from a memory
> access) that gets us outside of the object the pointer base points to would
> still have that original provenance.
> 
> But sure, for your testcase it's very difficult to distinguish the case
> where the compiler did sth wrong from the case where correctness depends
> on the actual input - with a2[2] = { 0, 0 } the original scalar code would
> already access out-of-bound elements.
> 
> Maybe if we say that if there's a in-bound or partial in-bound access
> then an access that's based on the same (or advanced) pointer is OK,
> even if fully out-of-bounds, if it is within the same page (based on
> alignment knowledge)?  That is, somehow treat the separate accesses as
> one?

Provenance is more important for the GIMPLE semantics than it is for C.

It is UB in C to perform pointer arithmetic that moves a pointer outside of an
object (with a special case for "one past"), so for

  char arr1[1000];

  char *p = arr1;
  char *q = p + i;

it's UB in C if (i < 0 || i > 1000). That means if we see a memory access *q,
we know it must refer to arr1 (or the "one past" case), even without
considering provenance.
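
As a rough sketch of the C rule above (the helper names are hypothetical,
not anything from GCC or the standard), the two conditions can be written
as predicates on the offset i and the array size:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is forming `p + i` defined in C when p points to the
   start of an array of `size` chars?  Offsets 0..size are allowed, where
   i == size is the "one past" case. */
static bool c_ptr_add_is_defined(long i, size_t size)
{
    return i >= 0 && (size_t)i <= size;
}

/* Hypothetical helper: is dereferencing `p + i` defined in C?  The
   "one past" pointer may be formed but must not be dereferenced. */
static bool c_deref_is_defined(long i, size_t size)
{
    return i >= 0 && (size_t)i < size;
}
```

For arr1 above, c_ptr_add_is_defined(1000, 1000) holds but
c_deref_is_defined(1000, 1000) does not, and both fail for i = 1100.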

In GIMPLE, where pointers may point outside their original objects, it's the
provenance restrictions that ensure *q stays within the intended object. In
other words, *q may access _any_ valid memory unless the provenance check
blocks it by making the access UB!

Assume we have a second array arr2 placed directly after arr1 in memory.
Without provenance restrictions,

  char sum = *p + *q;

has defined semantics even for values such as i = 1100, where it would now be
required to produce the same result as arr1[0] + arr2[100]. This would
significantly constrain most memory optimizations...
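
To make the difference concrete, here is a toy model of the two semantics.
It is only an illustration under my own assumptions: a flat byte array
stands in for memory, arr1 is at offset 0 and arr2 directly after it, and
all names (toy_memory, toy_ptr, toy_load_any, toy_load_checked) are made up
for this sketch. It is not how GIMPLE or any checker actually represents
provenance.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARR_SIZE 1000u
#define MEM_SIZE (2u * ARR_SIZE)

/* Toy flat memory: arr1 occupies [0, 1000), arr2 occupies [1000, 2000). */
struct toy_memory {
    unsigned char bytes[MEM_SIZE];
};

/* Toy pointer: an address plus the object it was derived from. */
struct toy_ptr {
    size_t addr;       /* absolute address in the toy memory */
    size_t obj_start;  /* provenance: first byte of the base object */
    size_t obj_size;   /* provenance: size of the base object */
};

/* Pointer adjustment keeps the original provenance, even when the
   resulting address lies outside the base object. */
static struct toy_ptr toy_ptr_add(struct toy_ptr p, size_t i)
{
    p.addr += i;
    return p;
}

/* Load without provenance restrictions: defined for any valid address.
   Returns false (modelling UB) only if the address is not valid memory. */
static bool toy_load_any(const struct toy_memory *m, struct toy_ptr p,
                         unsigned char *out)
{
    if (p.addr >= MEM_SIZE)
        return false;
    *out = m->bytes[p.addr];
    return true;
}

/* Load with a provenance check: additionally returns false (modelling UB)
   if the address is outside the object the pointer was derived from. */
static bool toy_load_checked(const struct toy_memory *m, struct toy_ptr p,
                             unsigned char *out)
{
    if (p.addr < p.obj_start || p.addr - p.obj_start >= p.obj_size)
        return false;
    return toy_load_any(m, p, out);
}
```

With p = {0, 0, ARR_SIZE} and q = toy_ptr_add(p, 1100), toy_load_any on q
succeeds and must return the byte at arr2[100], while toy_load_checked on q
reports UB.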

But once we allow fully out-of-bounds loads, we can no longer tell which of
two cases *p + *q represents: arr1[0] + arr1[i], where provenance must be
checked so that the load is UB if *q would access another object, or
arr1[0] + *out-of-bounds-access, where we must ignore provenance to allow
the load.
