https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #23 from Tianyang Chou <tianyang.chou at gmail dot com> ---
(In reply to Tianyang Chou from comment #22)
> (In reply to Tamar Christina from comment #21)
> > Thus finally fixed
>
> Hi Tamar,
>
> Is there any other prerequisite patches for this patch: "perform affine fold
> to unsigned on non address expressions." ?
>
> I apply your patch to gcc-13.2.0 official and got 10% performance increase
> on aarch64, but applying the patch to gcc-12.3.0 official got 0 performance
> boost.
>
> So I wonder if there are some related patches between gcc-12.3.0 and
> gcc-13.2.0 which bring the difference.

With the patch applied, official gcc-12.3.0 generates the following asm for
the test case:

        ldp     w7, w6, [x0, 216]
        ldp     w5, w4, [x0, 252]
        ldr     w3, [x0, 288]
        ldr     w2, [x0, 292]
        sub     w7, w7, #1
        sub     w6, w6, #1
        sub     w5, w5, #1
        sub     w4, w4, #1
        sub     w3, w3, #1
        stp     w7, w6, [x0, 216]
        stp     w5, w4, [x0, 252]
        add     x0, x0, 324
        sub     w2, w2, #1
        str     w3, [x0, -36]
        str     w2, [x0, -32]

while official gcc-13.2.0 with the patch generates:

        mvni    v3.2s, 0
        ldr     d2, [x0, 216]
        ldr     d1, [x0, 252]
        ldr     d0, [x0, 288]
        add     v2.2s, v2.2s, v3.2s
        add     x0, x0, 324
        add     v1.2s, v1.2s, v3.2s
        add     v0.2s, v0.2s, v3.2s
        str     d2, [x0, -108]
        str     d1, [x0, -72]
        str     d0, [x0, -36]

So the code does not seem to get vectorized with gcc-12.3.0: it uses six
scalar subtractions, where gcc-13.2.0 uses three 2-lane vector adds of -1.
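
To make the two listings easier to compare, here is a rough C sketch of the access pattern the offsets suggest. It is inferred only from the asm above and is not the actual test case from this bug: the 9x9 int layout, the names dec_patch and dec_all, and the loop bounds are all assumptions. Byte offsets 216/220, 252/256 and 288/292 correspond to int indices 54/55, 63/64 and 72/73 (rows 6 to 8, columns 0 to 1 of a 9-wide row), and the 324-byte pointer step is one 81-int block.

```c
#include <stddef.h>

enum { ROW = 9, BLOCK = ROW * ROW };   /* assumed 9x9 block of ints (324 bytes) */

/* Hypothetical kernel: decrement a 3x2 patch (rows 6..8, columns 0..1)
   of one block. This touches byte offsets 216, 220, 252, 256, 288, 292. */
static void dec_patch(int *block)
{
    for (int r = 6; r < 9; r++)
        for (int c = 0; c < 2; c++)
            block[r * ROW + c] -= 1;
}

/* Hypothetical driver: apply the patch to consecutive blocks, advancing
   the base pointer by one block (324 bytes) per iteration. */
static void dec_all(int *base, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++)
        dec_patch(base + i * BLOCK);
}
```

In a shape like this, each row contributes two adjacent ints, which is what allows a 2-lane (.2s) vector add per row in place of two scalar subs. Whether a given GCC release vectorizes it depends on the release and on the options used.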