https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109697
Bug ID: 109697
Summary: arm: lack of MVE instruction costing causing worse codegen on a vec_duplicate
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: stammark at gcc dot gnu.org
Target Milestone: ---

Hi all,

In the arm backend, for MVE targets we previously had this bug on the vcmp patterns:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107987

The fix is fine, but it resulted in some failing tests:
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c

(after Andrea improved these tests in GCC13)

The testcases that are failing are the ones that compare against a scalar immediate (e.g. "vcmpeqq (a, 1.1)"), because the compiler now prefers to do:
```
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3
```
whereas previously it would much more simply do:
```
movs r3, #1
vcmp.u16 cs, q0, r3
```

The underlying reason for this change is a known deficiency of the MVE implementation: the lack of proper instruction costing. The compiler falls back to calculating costs based on the operands, and the new vec_duplicate in the patterns (mve_vcmp<mve_cmp_op>q_n_<mode>, etc.) is given a cost of 32, when instead the compiler should know that the vec_duplicate is free and that the whole pattern is just one instruction. As a result, the "literal load + vector-vector compare" wins out against the "put the immediate in a GP reg + vector-scalar compare".

For now, I plan on simply XFAIL-ing the tests.