https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118558
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we have

  <bb 2> [local count: 29527901]:

  <bb 3> [local count: 118111600]:
  # g_1168_24 = PHI <g_1168_14(9), 3(2)>
  # ivtmp_13 = PHI <ivtmp_28(9), 4(2)>
  _1 = g_270[g_1168_24][0];
  g_1168_14 = g_1168_24 + -1;
  ivtmp_28 = ivtmp_13 - 1;
  if (ivtmp_28 != 0)
    goto <bb 9>; [75.00%]
  else
    goto <bb 6>; [25.00%]

  <bb 9> [local count: 88583699]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 29527901]:
  # _8 = PHI <_1(3)>

where we have an extract-last reduction of a negative-step DR.  After
vectorization this is

  <bb 2> [local count: 29527901]:

  <bb 3> [local count: 59055800]:
  # g_1168_24 = PHI <g_1168_14(9), 3(2)>
  # ivtmp_13 = PHI <ivtmp_28(9), 4(2)>
  # vectp_g_270.9_2 = PHI <vectp_g_270.9_3(9), &MEM <long unsigned int[5][2]> [(void *)&g_270 + 40B](2)>
  # ivtmp_29 = PHI <ivtmp_31(9), 0(2)>
  vect__1.11_4 = MEM <vector(2) long unsigned int> [(long unsigned int *)vectp_g_270.9_2];
  vect__1.12_6 = VEC_PERM_EXPR <vect__1.11_4, vect__1.11_4, { 1, 0 }>;
  vectp_g_270.9_16 = vectp_g_270.9_2 + 18446744073709551600;
  vect__1.13_7 = MEM <vector(2) long unsigned int> [(long unsigned int *)vectp_g_270.9_16];
  vect__1.14_21 = VEC_PERM_EXPR <vect__1.13_7, vect__1.13_7, { 1, 0 }>;
  vect__1.15_27 = VEC_PERM_EXPR <vect__1.12_6, vect__1.14_21, { 0, 2 }>;
  _1 = g_270[g_1168_24][0];
  g_1168_14 = g_1168_24 + -1;
  ivtmp_28 = ivtmp_13 - 1;
  vectp_g_270.9_3 = vectp_g_270.9_16 + 18446744073709551600;
  ivtmp_31 = ivtmp_29 + 1;
  if (ivtmp_31 < 2)
    goto <bb 9>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 9> [local count: 29527899]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 29527901]:
  # vect__1.15_25 = PHI <vect__1.15_27(3)>
  _22 = BIT_FIELD_REF <vect__1.15_25, 64, 64>;

That does not look completely broken, but the initial value of
vectp_g_270.9_2 looks suspicious: that's &g_270[2][1], getting us
{ g_270[2][1], g_270[3][0] } and { g_270[1][1], g_270[2][0] }, which we
then reverse and remove gaps from to get { g_270[3][0], g_270[2][0] } in
the first iteration and { g_270[1][0], g_270[0][0] } in the second, which we
should then appropriately get the last value of.

Now - we eventually fold this up to

  vect__1.13_7 = MEM <vector(2) long unsigned int> [(long unsigned int *)&g_270 + -8B];
  vect__1.14_21 = VEC_PERM_EXPR <vect__1.13_7, vect__1.13_7, { 1, 0 }>;
  vect__1.15_27 = BIT_INSERT_EXPR <vect__1.13_7, 0, 0 (64 bits)>;
  _22 = BIT_FIELD_REF <vect__1.15_27, 64, 64>;

where we can also see that we access memory before g_270, which might trap.
The ability to handle grouped accesses with a negative stride is new and
unique to SLP IIRC (indeed GCC 14 doesn't support this).  In the end FRE5
then bails on the UB:

  Value numbering stmt = vect__1.13_7 = MEM <vector(2) long unsigned int> [(long unsigned int *)&g_270 + -8B];
  Setting value number of vect__1.13_7 to { 0, 0 } (changed)

likely interpreting the negative offset as a very large positive one.  So
what we're missing is that we need to apply peeling for gaps here, but the
most conservative fix might be to disallow any gaps with a negative step.
I'll try to fix up the missing args to dr_misalignment.