https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120980
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Krister Walfridsson from comment #3)
> (In reply to Richard Biener from comment #2)
> > We do not want to introduce additional code generation overhead to avoid
> > a fully out-of-bounds but known not-trapping access because it's known to
> > fall into the same page as a previous partially in-bounds access.  But I
> > realize this altered constraint might be difficult to implement in the
> > analysis tool?
>
> It is easy to modify smtgcc's in-range check to treat loads as valid if the
> loaded bytes are within the same page as bytes of the object identified by
> the pointer's provenance.
>
> But this is far too permissive, and it's easy to construct examples that
> get miscompiled under that semantic.  So we must add additional
> restrictions on fully out-of-bounds loads.  But I have no idea what kind
> of restrictions would make sense...
>
> And there are more problems.  For example, how does a fully out-of-bounds
> load interact with pointer provenance?  The out-of-bounds bytes do not
> correspond to the provenance and may even belong to a different object, so
> the provenance check will still classify the load as UB, even if we relax
> the in-range check.

Isn't provenance determined by the restrictions on pointer arithmetic?  That
is, a pointer adjustment (or the implied address calculation of a memory
access) that takes us outside of the object the pointer base points to would
still have that original provenance.

But sure, for your testcase it's very difficult to distinguish the case where
the compiler did something wrong from the case where correctness depends on
the actual input - with a2[2] = { 0, 0 } the original scalar code would
already access out-of-bounds elements.

Maybe we could say that if there is an in-bounds or partially in-bounds
access, then an access based on the same (or an advanced) pointer is OK, even
if fully out-of-bounds, as long as it falls within the same page (based on
alignment knowledge)?  That is, somehow treat the separate accesses as one?

One possible solution (but with runtime cost) would be to duplicate the early
exit vector condition as we duplicate the loads (and for SMT make sure to
sink the duplicated loads after the first early exit).  Like

  vect__4.19_93 = MEM <vector(2) long int> [(long int *)vectp_p1.17_91];
  vect__6.15_85 = MEM <vector(2) long int> [(long int *)vectp_p2.13_83];
  mask_patt_7.21_96 = vect__6.15_85 != vect__4.19_93;
  if (mask_patt_7.21_96 != { 0, 0 })
    goto <bb 29>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 29>:
  vectp_p1.17_94 = vectp_p1.17_91 + 16;
  vect__4.20_95 = MEM <vector(2) long int> [(long int *)vectp_p1.17_94];
  vectp_p2.13_86 = vectp_p2.13_83 + 16;
  vect__6.16_87 = MEM <vector(2) long int> [(long int *)vectp_p2.13_86];
  mask_patt_7.21_97 = vect__6.16_87 != vect__4.20_95;
  if (mask_patt_7.21_97 != { 0, 0 })
    goto <bb 28>; [5.50%]
  else
    goto <bb 5>; [94.50%]

but as said, this comes at nontrivial cost.

Now, the testcase shows a missed optimization - we are unnecessarily using a
large VF because of

  t.c:2:21: note: ==> examining phi: ivtmp_21 = PHI <ivtmp_20(7), 8(2)>
  t.c:2:21: note: get vectype for scalar type: unsigned int
  t.c:2:21: note: vectype: vector(8) unsigned int

and this causes the duplication in the first place.  If we solve this, the
problem will appear less often (but nothing prevents it in principle - we
just need a "real" reason to have a smaller data type involved).  I believe
we have a bug report for this already (using vector inductions to compute the
scalar on-exit IVs).
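
For illustration, a loop of roughly the shape the dump output suggests (two
long int arrays compared with an early exit, driven by an unsigned int
counter) could look like the following.  This is only a hypothetical sketch -
the function name, the parameter names and the iteration count are guesses,
not the testcase from this PR:

  /* Hypothetical sketch, not the testcase from this PR.  Per the dump above,
     the unsigned int counter gives the induction a vector(8) unsigned int
     vectype, pushing the VF to 8, while the long int compares only fill
     vector(2) long int vectors.  The vectorizer therefore emits several
     vector(2) loads per vector iteration, and the loads past the point where
     the scalar loop would have taken the early exit can be fully
     out-of-bounds for some inputs.  */
  int
  cmp8 (long *p1, long *p2)
  {
    for (unsigned int i = 0; i < 8; i++)
      if (p1[i] != p2[i])
        return 1;
    return 0;
  }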
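
To make the provenance reading discussed further up concrete, here is a
minimal sketch (hypothetical variables, not from the PR): the arithmetic
leaves the object, but the pointer keeps the provenance of its base, so a
fully out-of-bounds load through it stays UB under the provenance check even
if the in-range check were relaxed.

  long a[2];
  long b[2];

  long
  read_oob (void)
  {
    long *p = &a[0] + 3;  /* pointer arithmetic leaves 'a', but under the
                             reading above p keeps the provenance of 'a' */
    return *p;            /* fully out-of-bounds for 'a': even if the address
                             happens to fall into 'b' at runtime, the
                             provenance check classifies the load as UB */
  }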
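
The "same page, based on alignment knowledge" relaxation proposed above could
be expressed as a check along these lines.  This is a hypothetical helper to
illustrate the idea, not smtgcc code, and the PAGE_SIZE value is an
assumption:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define PAGE_SIZE 4096u

  static bool
  same_page (uintptr_t a, uintptr_t b)
  {
    return a / PAGE_SIZE == b / PAGE_SIZE;
  }

  /* in_bounds_addr: an address known to touch the object;
     oob_addr/oob_size: the fully out-of-bounds access derived from the same
     (or an advanced) pointer.  Assumes oob_size > 0.  */
  static bool
  oob_load_ok (uintptr_t in_bounds_addr, uintptr_t oob_addr, size_t oob_size)
  {
    return same_page (in_bounds_addr, oob_addr)
           && same_page (in_bounds_addr, oob_addr + oob_size - 1);
  }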