https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120980
Bug ID: 120980
Summary: Vectorizer introduces out-of-bounds memory access
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kristerw at gcc dot gnu.org
Target Milestone: ---

The vectorizer introduces out-of-bounds memory access for the function below
when compiled for AArch64 with "-O3 -march=armv9.5-a -fno-strict-aliasing".

  void foo(long *p1, long *p2)
  {
    for (int i = 0; i < 8; i++)
      if (p1[i] != p2[i])
        __builtin_exit(0);
  }

The vectorized code checks if p1 and p2 are 32-byte aligned, and in that case
processes 4 elements at a time:

  vect__4.19_93 = MEM <vector(2) long int> [(long int *)vectp_p1.17_91];
  vectp_p1.17_94 = vectp_p1.17_91 + 16;
  vect__4.20_95 = MEM <vector(2) long int> [(long int *)vectp_p1.17_94];
  vect__6.15_85 = MEM <vector(2) long int> [(long int *)vectp_p2.13_83];
  vectp_p2.13_86 = vectp_p2.13_83 + 16;
  vect__6.16_87 = MEM <vector(2) long int> [(long int *)vectp_p2.13_86];
  mask_patt_7.21_96 = vect__6.15_85 != vect__4.19_93;
  mask_patt_7.21_97 = vect__6.16_87 != vect__4.20_95;
  vexit_reduc_98 = mask_patt_7.21_97 | mask_patt_7.21_96;
  if (vexit_reduc_98 != { 0, 0 })
    goto <bb 28>; [5.50%]
  else
    goto <bb 5>; [94.50%]

The problem occurs if the input is smaller, such as:

  long a1[8] = {0, 0, 0, 0, 0, 0, 0, 0};
  long a2[2] = {0, 1};
  foo(a1, a2);

The scalar loop is valid for this input: it exits at i = 1, where a1[1] != a2[1],
and never reads a2[2]. In the vectorized code, however, vect__6.16_87 is loaded
from outside the array. This is harmless at run time, as the bytes are within
the same page, but it is invalid to do fully out-of-bounds accesses according
to the discussion in
https://gcc.gnu.org/pipermail/gcc/2025-April/245873.html

The IR would have been valid if the loads were done as one 4-element load
instead of two 2-element loads, as partly out-of-bounds accesses are valid.
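
To make the distinction between partly and fully out-of-bounds accesses concrete, here is a minimal sketch that classifies a load by its byte range relative to the underlying object. The names access_kind and classify_access are hypothetical and are not part of GCC. The sketch only does arithmetic on sizes and offsets and performs no memory access itself. It assumes sizeof(long) == 8, as on AArch64 LP64, so a2 above is a 16-byte object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification, for illustration only (not a GCC API). */
enum access_kind {
    ACCESS_IN_BOUNDS,   /* every byte of the access lies inside the object */
    ACCESS_PARTLY_OOB,  /* starts inside the object, extends past its end */
    ACCESS_FULLY_OOB    /* no byte of the access lies inside the object */
};

/* Classify an access of `size` bytes that starts `offset` bytes into an
   object of `object_size` bytes.  Only sizes are compared; nothing is
   dereferenced. */
enum access_kind classify_access(size_t object_size, size_t offset, size_t size)
{
    assert(size > 0);
    if (offset >= object_size)
        return ACCESS_FULLY_OOB;
    if (size > object_size - offset)
        return ACCESS_PARTLY_OOB;
    return ACCESS_IN_BOUNDS;
}
```

Under that assumption, the two 16-byte loads from p2 in the IR above correspond to classify_access(16, 0, 16) and classify_access(16, 16, 16): the first is in bounds and the second is fully out of bounds. A single 32-byte load, classify_access(16, 0, 32), would be only partly out of bounds, which is the form the report describes as valid.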