https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120980
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Krister Walfridsson from comment #3)
> (In reply to Richard Biener from comment #2)
> > We do not want to introduce additional code generation overhead to avoid
> > a fully out-of-bounds but known not-trapping access because it's known to
> > fall into the same page as a previous partially in-bounds access.  But I
> > realize this altered constraint might be difficult to implement in the
> > analysis tool?
>
> It is easy to modify smtgcc's in-range check to treat loads as valid if the
> loaded bytes are within the same page as bytes of the object identified by
> the pointer's provenance.
>
> But this is far too permissive, and it's easy to construct examples that
> get miscompiled under that semantic.  So we must add additional
> restrictions on fully out-of-bounds loads.  But I have no idea what kind
> of restrictions would make sense...
>
> And there are more problems.  For example, how does a fully out-of-bounds
> load interact with pointer provenance?  The out-of-bounds bytes do not
> correspond to the provenance and may even belong to a different object, so
> the provenance check will still classify the load as UB, even if we relax
> the in-range check.

Isn't provenance determined by the restrictions on pointer arithmetic?  That
is, a pointer adjustment (or the implied address calculation of a memory
access) that takes us outside of the object the pointer base points to would
still have that original provenance.

But sure, for your testcase it's very difficult to distinguish the case where
the compiler did something wrong from the case where correctness depends on
the actual input - with a2[2] = { 0, 0 } the original scalar code would
already access out-of-bounds elements.

Maybe we could say that if there is an in-bounds or partially in-bounds
access, then an access based on the same (or an advanced) pointer is OK, even
if fully out-of-bounds, as long as it falls within the same page (based on
alignment knowledge)?  That is, somehow treat the separate accesses as one?

One possible solution (but with runtime cost) would be to duplicate the early
exit vector condition as we duplicate the loads (and for SMT make sure to
sink the duplicated loads after the first early exit).  Like

  vect__4.19_93 = MEM <vector(2) long int> [(long int *)vectp_p1.17_91];
  vect__6.15_85 = MEM <vector(2) long int> [(long int *)vectp_p2.13_83];
  mask_patt_7.21_96 = vect__6.15_85 != vect__4.19_93;
  if (mask_patt_7.21_96 != { 0, 0 })
    goto <bb 29>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 29>:
  vectp_p1.17_94 = vectp_p1.17_91 + 16;
  vect__4.20_95 = MEM <vector(2) long int> [(long int *)vectp_p1.17_94];
  vectp_p2.13_86 = vectp_p2.13_83 + 16;
  vect__6.16_87 = MEM <vector(2) long int> [(long int *)vectp_p2.13_86];
  mask_patt_7.21_97 = vect__6.16_87 != vect__4.20_95;
  if (mask_patt_7.21_97 != { 0, 0 })
    goto <bb 28>; [5.50%]
  else
    goto <bb 5>; [94.50%]

but as said, this comes at nontrivial cost.

Now, the testcase shows a missed optimization - we are unnecessarily using a
large VF because of

  t.c:2:21: note: ==> examining phi: ivtmp_21 = PHI <ivtmp_20(7), 8(2)>
  t.c:2:21: note: get vectype for scalar type: unsigned int
  t.c:2:21: note: vectype: vector(8) unsigned int

and this causes the duplication in the first place.  If we solve this, the
problem will appear less often (but nothing prevents it in principle - we
just need a "real" reason to have a smaller data type involved).  I believe
we have a bug report for this already (using vector inductions to compute the
scalar on-exit IVs).
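
For illustration, a loop of roughly the shape the dump output suggests (two
long int arrays compared with an early exit, driven by an unsigned int
counter) could look like the following.  This is only a hypothetical sketch -
the function name, the parameter names and the iteration count are guesses,
not the testcase from this PR:

  /* Hypothetical sketch, not the testcase from this PR.  Per the dump above,
     the unsigned int counter gives the induction a vector(8) unsigned int
     vectype, pushing the VF to 8, while the long int compares only fill
     vector(2) long int vectors.  The vectorizer therefore emits several
     vector(2) loads per vector iteration, and the loads past the point where
     the scalar loop would have taken the early exit can be fully
     out-of-bounds for some inputs.  */
  int
  cmp8 (long *p1, long *p2)
  {
    for (unsigned int i = 0; i < 8; i++)
      if (p1[i] != p2[i])
        return 1;
    return 0;
  }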
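
To make the provenance reading discussed further up concrete, here is a
minimal sketch (hypothetical variables, not from the PR): the arithmetic
leaves the object, but the pointer keeps the provenance of its base, so a
fully out-of-bounds load through it stays UB under the provenance check even
if the in-range check were relaxed.

  long a[2];
  long b[2];

  long
  read_oob (void)
  {
    long *p = &a[0] + 3;  /* pointer arithmetic leaves 'a', but under the
                             reading above p keeps the provenance of 'a' */
    return *p;            /* fully out-of-bounds for 'a': even if the address
                             happens to fall into 'b' at runtime, the
                             provenance check classifies the load as UB */
  }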
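
The "same page, based on alignment knowledge" relaxation proposed above could
be expressed as a check along these lines.  This is a hypothetical helper to
illustrate the idea, not smtgcc code, and the PAGE_SIZE value is an
assumption:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define PAGE_SIZE 4096u

  static bool
  same_page (uintptr_t a, uintptr_t b)
  {
    return a / PAGE_SIZE == b / PAGE_SIZE;
  }

  /* in_bounds_addr: an address known to touch the object;
     oob_addr/oob_size: the fully out-of-bounds access derived from the same
     (or an advanced) pointer.  Assumes oob_size > 0.  */
  static bool
  oob_load_ok (uintptr_t in_bounds_addr, uintptr_t oob_addr, size_t oob_size)
  {
    return same_page (in_bounds_addr, oob_addr)
           && same_page (in_bounds_addr, oob_addr + oob_size - 1);
  }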