https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122827
--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-15 branch has been updated by Kyrylo Tkachov <[email protected]>:

https://gcc.gnu.org/g:15bf9d5228b4de83fd5505124ab0f19f831ae725

commit r15-11238-g15bf9d5228b4de83fd5505124ab0f19f831ae725
Author: Kyrylo Tkachov <[email protected]>
Date:   Wed May 27 13:47:48 2026 -0700

PR target/122827: aarch64: Treat SVE modes as full-Z for callee-save accounting

PR122827 is a miscompile of 526.blender_r in SPEC2017 with
-Ofast -flto=auto -mcpu=neoverse-v2 -msve-vector-bits=128
--param aarch64-autovec-preference=sve-only.  Bisection landed on
g:b191e8bdecf (the IRA cost-model rework for PR117477), and the
miscompile disappears with -mearly-ra=none, -fno-caller-saves, or
-fno-early-ra, all of which steer register allocation away from V8-V15.

The bug is in shade_material_loop in shadeinput.c, where a broadcast of
a float scalar to a VNx2SF value is held in V14 across a call:

    mov     z14.s, s15                  ; broadcast s15 -> all 4 .s lanes of z14
    ...
    bl      shade_lamp_loop             ; clobbers V14[127:64]
    ...
    fmla    z20.s, p7/m, z14.s, z21.s   ; reads all 4 .s lanes of z14

Under AAPCS64 only the low 64 bits of V8-V15 (D8-D15) are callee-saved.
The prologue dutifully saves d14/d15, but the broadcast and FMA both
touch the full Z register, and the upper half of v14 gets corrupted by
the call -- producing wrong floating-point results.

VNx2SF has GET_MODE_SIZE == 8, so the previous part-clobber check
treated V14 as fully preserved for that mode.  The mode size is correct
for the *data* (two 32-bit lanes), but the underlying SVE instructions
are not packed -- the two .s lanes live at byte offsets 0-3 and 8-11 of
the Z register (the .d-strided layout used for partial SVE modes), so
byte 8 onwards is outside the AAPCS64-preserved range.  Any SVE mode in
V8-V15 across a call is therefore partially clobbered, regardless of
GET_MODE_SIZE.

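The lane-layout argument above can be checked with a little arithmetic. The following is only an illustrative sketch, not GCC code: the helper names and the PRESERVED_BYTES constant are hypothetical, and it assumes the -msve-vector-bits=128 case from the report, where each 32-bit lane of a VNx2SF value sits in its own 64-bit container.

```c
/* Illustrative model only; none of these names exist in GCC.  */

/* AAPCS64 preserves only the low 64 bits (8 bytes) of V8-V15.  */
enum { PRESERVED_BYTES = 8 };

/* Byte offset of lane LANE within the register when every element
   occupies its own container of CONTAINER_BYTES bytes.  */
static unsigned
lane_offset (unsigned lane, unsigned container_bytes)
{
  return lane * container_bytes;
}

/* Number of lanes with at least one byte outside the preserved
   low 8 bytes of the register.  */
static unsigned
lanes_outside_preserved (unsigned nlanes, unsigned container_bytes,
                         unsigned elem_bytes)
{
  unsigned count = 0;
  for (unsigned i = 0; i < nlanes; i++)
    if (lane_offset (i, container_bytes) + elem_bytes > PRESERVED_BYTES)
      count++;
  return count;
}
```

Under these assumptions, lanes_outside_preserved (2, 8, 4) models VNx2SF (two 4-byte floats in 8-byte containers) and finds one lane, the one at byte offset 8, outside the preserved range. A packed 64-bit Advanced SIMD value such as V2SF corresponds to lanes_outside_preserved (2, 4, 4), which finds none, even though both modes hold 8 bytes of data.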
The change in g:b191e8bdecf made the cost of allocating a callee-saved
register low enough that IRA started picking V14 for these pseudos,
exposing the latent miscompile.

Two places need fixing:

* aarch64_hard_regno_call_part_clobbered uses GET_MODE_SIZE to decide
  whether the callee-preserved low 64 bits are enough for the value.
  For SVE modes, compare against BYTES_PER_SVE_VECTOR instead: any SVE
  mode physically uses the full Z register.

* early-RA classifies allocno groups into FPR_D/FPR_Q/FPR_Z by mode bit
  size, and partial_fpr_clobbers then maps FPR_D to V8QImode for the
  ABI check.  That synthetic V8QImode hides the fact that the pseudo is
  actually an SVE mode, so V8-V15 are not excluded from the candidate
  set.  Classify any SVE mode as FPR_Z so that partial_fpr_clobbers
  uses VNx16QImode and correctly removes V8-V15.

A new test is added in gcc.target/aarch64/sve/pr122827.c.  Before the
fix the execution test aborts (V14 is allocated to the broadcast across
the call, and the explicit clobber inside the callee corrupts its upper
lanes).  After the fix it passes.  Explicitly clobbering V14 with inline
assembly in the test may be a bit fragile, as it depends on RA picking
V14 for the pre-call broadcast to trigger the bug, but I couldn't find a
good way of generalising it without perturbing RA in ways that made the
bug not trigger.

With this patch blender passes again, and bootstrap on
aarch64-none-linux-gnu shows no regressions.

Signed-off-by: Kyrylo Tkachov <[email protected]>

gcc/ChangeLog:

        PR target/122827
        * config/aarch64/aarch64.cc (aarch64_hard_regno_call_part_clobbered):
        For SVE modes use BYTES_PER_SVE_VECTOR for the per-register size
        rather than GET_MODE_SIZE.
        * config/aarch64/aarch64-early-ra.cc (early_ra::get_allocno_subgroup):
        Classify any SVE mode as FPR_Z.

gcc/testsuite/ChangeLog:

        PR target/122827
        * gcc.target/aarch64/sve/pr122827.c: New test.

(cherry picked from commit 5cb0842b8b86f8495a110018fb8af3c4c3316605)
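For readers who want the first fix in concrete terms, here is a minimal, simplified model of the part-clobber decision before and after the change. It is a sketch under stated assumptions, not the code in config/aarch64/aarch64.cc: struct mode_model, both functions and PRESERVED_BYTES are hypothetical names, data_bytes stands in for GET_MODE_SIZE, and sve_vector_bytes stands in for BYTES_PER_SVE_VECTOR as a plain integer (16 with -msve-vector-bits=128).

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the actual GCC implementation.  */

/* AAPCS64 preserves only the low 64 bits (8 bytes) of V8-V15.  */
enum { PRESERVED_BYTES = 8 };

struct mode_model
{
  unsigned data_bytes;  /* stands in for GET_MODE_SIZE (mode) */
  bool is_sve;          /* true for any SVE mode, full or partial */
};

/* Old behaviour as described in the commit message: only the data
   size is compared against the preserved low 8 bytes.  */
static bool
part_clobbered_before (struct mode_model m)
{
  return m.data_bytes > PRESERVED_BYTES;
}

/* New behaviour as described: an SVE mode is treated as occupying the
   whole Z register, so the vector size is compared instead.  */
static bool
part_clobbered_after (struct mode_model m, unsigned sve_vector_bytes)
{
  unsigned per_register_bytes = m.is_sve ? sve_vector_bytes : m.data_bytes;
  return per_register_bytes > PRESERVED_BYTES;
}
```

In this model a VNx2SF value is { 8, true }: part_clobbered_before reports it as fully preserved, while part_clobbered_after with a 16-byte vector reports it as partially clobbered. A 64-bit non-SVE value such as { 8, false } is unaffected by the change.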
