https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113196
Bug ID: 113196
Summary: [14 Regression] Failure to use ushll{,2}
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*-*-*

For this testcase, adapted from the one for PR110625:

    int test(unsigned array[4][4]);

    int foo(unsigned short *a, unsigned long n)
    {
      unsigned array[4][4];

      for (unsigned i = 0; i < 4; i++, a += 4)
        {
          array[i][0] = a[0] << 6;
          array[i][1] = a[1] << 6;
          array[i][2] = a[2] << 6;
          array[i][3] = a[3] << 6;
        }

      return test(array);
    }

GCC now uses:

        mov     x1, x0
        stp     x29, x30, [sp, -80]!
        movi    v30.4s, 0
        mov     x29, sp
        ldp     q0, q29, [x1]
        add     x0, sp, 16
        zip1    v1.8h, v0.8h, v30.8h
        zip1    v31.8h, v29.8h, v30.8h
        zip2    v0.8h, v0.8h, v30.8h
        zip2    v29.8h, v29.8h, v30.8h
        shl     v1.4s, v1.4s, 6
        shl     v31.4s, v31.4s, 6
        shl     v0.4s, v0.4s, 6
        shl     v29.4s, v29.4s, 6
        stp     q1, q0, [sp, 16]
        stp     q31, q29, [sp, 48]
        bl      test(unsigned int (*) [4])
        ldp     x29, x30, [sp], 80
        ret

whereas previously it used USHLL{,2}:

        mov     x1, x0
        stp     x29, x30, [sp, -80]!
        mov     x29, sp
        ldp     q1, q0, [x1]
        add     x0, sp, 16
        ushll   v3.4s, v1.4h, 6
        ushll   v2.4s, v0.4h, 6
        ushll2  v1.4s, v1.8h, 6
        ushll2  v0.4s, v0.8h, 6
        stp     q3, q1, [sp, 16]
        stp     q2, q0, [sp, 48]
        bl      test(unsigned int (*) [4])
        ldp     x29, x30, [sp], 80
        ret

This changed with g:f26f92b534f9, which expanded zero-extensions to ZIPs.
The patch included *ADDW patterns for the new representation, but it looks
like there are several more that should be included for full coverage.

AIUI, the point of lowering to ZIPs during expand was to allow the zero
to be hoisted. An alternative might be to lower during split, but
forcibly hoist the zero by inserting around the FUNCTION_BEG note.
We could then cache the insn that does that for manual CSE.

Godbolt link: https://godbolt.org/z/vzfnebMhb
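As a rough illustration of why the two sequences quoted above compute the same values, here is a scalar C model of the lane semantics. It is only a sketch, not GCC code: the helper names (model_ushll, model_zip1, model_shl_4s and so on) are hypothetical, and it assumes the usual little-endian lane numbering, where 16-bit lane 2*i forms the low half of 32-bit lane i. Zipping with a zero vector places each halfword in the low half of a word with zeros above it, so a following SHL on .4s lanes gives the same result as a single USHLL or USHLL2.

```c
#include <assert.h>
#include <stdint.h>

/* USHLL Vd.4S, Vn.4H, #shift: zero-extend lanes 0..3, then shift left. */
static void model_ushll(uint32_t out[4], const uint16_t in[8], unsigned shift)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)in[i] << shift;
}

/* USHLL2 Vd.4S, Vn.8H, #shift: same, but on lanes 4..7. */
static void model_ushll2(uint32_t out[4], const uint16_t in[8], unsigned shift)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint32_t)in[i + 4] << shift;
}

/* ZIP1 Vd.8H, Vn.8H, Vm.8H: interleave the low halves of n and m. */
static void model_zip1(uint16_t out[8], const uint16_t n[8], const uint16_t m[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i] = n[i];
        out[2 * i + 1] = m[i];
    }
}

/* ZIP2 Vd.8H, Vn.8H, Vm.8H: interleave the high halves of n and m. */
static void model_zip2(uint16_t out[8], const uint16_t n[8], const uint16_t m[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i] = n[i + 4];
        out[2 * i + 1] = m[i + 4];
    }
}

/* SHL Vd.4S, Vn.4S, #shift, viewing eight 16-bit lanes as four 32-bit lanes.
   Assumption: 16-bit lane 2*i is the low half of 32-bit lane i. */
static void model_shl_4s(uint32_t out[4], const uint16_t in[8], unsigned shift)
{
    for (int i = 0; i < 4; i++) {
        uint32_t lane = (uint32_t)in[2 * i] | ((uint32_t)in[2 * i + 1] << 16);
        out[i] = lane << shift;
    }
}

/* Return 1 if ZIP{1,2}-with-zero followed by SHL matches USHLL{,2}. */
static int zip_matches_ushll(const uint16_t in[8], unsigned shift)
{
    const uint16_t zero[8] = { 0 };
    uint16_t zipped[8];
    uint32_t via_zip[4], via_ushll[4];

    model_zip1(zipped, in, zero);
    model_shl_4s(via_zip, zipped, shift);
    model_ushll(via_ushll, in, shift);
    for (int i = 0; i < 4; i++)
        if (via_zip[i] != via_ushll[i])
            return 0;

    model_zip2(zipped, in, zero);
    model_shl_4s(via_zip, zipped, shift);
    model_ushll2(via_ushll, in, shift);
    for (int i = 0; i < 4; i++)
        if (via_zip[i] != via_ushll[i])
            return 0;

    return 1;
}

/* Check every shift amount that USHLL accepts for 16-bit source lanes. */
static void check_all_shifts(const uint16_t in[8])
{
    for (unsigned shift = 0; shift < 16; shift++)
        assert(zip_matches_ushll(in, shift));
}
```

The model only shows that the results agree. It says nothing about cost: the ZIP form needs a zero register and two instructions per widened vector, which is the regression described above.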