https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122827

--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-16 branch has been updated by Kyrylo Tkachov
<[email protected]>:

https://gcc.gnu.org/g:11a33695161eca150500fe6467c62e4615244816

commit r16-9047-g11a33695161eca150500fe6467c62e4615244816
Author: Kyrylo Tkachov <[email protected]>
Date:   Wed May 27 13:47:48 2026 -0700

    PR target/122827: aarch64: Treat SVE modes as full-Z for callee-save accounting

    PR122827 is a miscompile of 526.blender_r in SPEC2017 with
    -Ofast -flto=auto -mcpu=neoverse-v2 -msve-vector-bits=128
    --param aarch64-autovec-preference=sve-only.
    Bisection landed on g:b191e8bdecf (the IRA cost-model rework for
    PR117477), and the miscompile disappears with -mearly-ra=none,
    -fno-caller-saves, or -fno-early-ra, all of which steer register
    allocation away from V8-V15.

    The bug is in shade_material_loop in shadeinput.c, where a broadcast of
    a float scalar to a VNx2SF value is held in V14 across a call:

        mov     z14.s, s15            ; broadcast s15 -> all 4 .s lanes of z14
        ...
        bl      shade_lamp_loop       ; clobbers V14[127:64]
        ...
        fmla    z20.s, p7/m, z14.s, z21.s   ; reads all 4 .s lanes of z14

    Under AAPCS64 only the low 64 bits of V8-V15 (D8-D15) are callee-saved.
    The prologue dutifully saves d14/d15, but the broadcast and FMA both
    touch the full Z register, and the upper half of V14 gets corrupted by
    the call -- producing wrong floating-point results.

    VNx2SF has GET_MODE_SIZE == 8, so the previous part-clobber check
    treated V14 as fully preserved for that mode.  The mode size is
    correct for the *data* (two 32-bit lanes), but the underlying SVE
    layout is not packed -- the two .s lanes live at byte offsets
    0-3 and 8-11 of the Z register (the .d-strided layout used for partial
    SVE modes), so byte 8 onwards is outside the AAPCS64-preserved range.
    Any SVE mode in V8-V15 across a call is therefore partially clobbered,
    regardless of GET_MODE_SIZE.
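
    The layout argument above can be illustrated with a small standalone
    model.  This is only a sketch, not GCC code: footprint_bytes,
    data_bytes, survives_call and PRESERVED_BYTES are hypothetical names
    introduced here, and the model assumes each element sits at the bottom
    of a fixed-width container, as described for partial SVE modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC code: the number of low bytes of V8-V15
   that AAPCS64 requires a callee to preserve (the D8-D15 view).  */
enum { PRESERVED_BYTES = 8 };

/* Packed size of the data, the analogue of GET_MODE_SIZE.  */
static size_t
data_bytes (size_t nlanes, size_t elt_bytes)
{
  return nlanes * elt_bytes;
}

/* One past the highest register byte touched when NLANES elements of
   ELT_BYTES each sit at the bottom of CONTAINER_BYTES-wide containers.
   For a packed layout CONTAINER_BYTES == ELT_BYTES.  */
static size_t
footprint_bytes (size_t nlanes, size_t elt_bytes, size_t container_bytes)
{
  assert (container_bytes >= elt_bytes);
  if (nlanes == 0)
    return 0;
  return (nlanes - 1) * container_bytes + elt_bytes;
}

/* True if a value occupying bytes [0, EXTENT) of V8-V15 is fully covered
   by the callee-preserved low bytes.  */
static bool
survives_call (size_t extent)
{
  return extent <= PRESERVED_BYTES;
}
```

    For VNx2SF at 128 bits (two 4-byte lanes in 8-byte containers) the
    packed data size is 8 bytes, which passes the check, while the
    footprint extends to byte 12, which does not.  A packed two-lane
    float vector (container equal to element size) has a footprint of
    8 bytes and is fully preserved.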

    The change in g:b191e8bdecf made the cost of allocating a callee-saved
    register low enough that IRA started picking V14 for these pseudos,
    exposing the latent miscompile.

    Two places need fixing:

      * aarch64_hard_regno_call_part_clobbered uses GET_MODE_SIZE to decide
        whether the callee-preserved low 64 bits are enough for the value.
        For SVE modes, compare against BYTES_PER_SVE_VECTOR instead: any
        SVE mode physically uses the full Z register.

      * early-RA classifies allocno groups into FPR_D/FPR_Q/FPR_Z by mode
        bit size, and partial_fpr_clobbers then maps FPR_D to V8QImode for
        the ABI check.  That synthetic V8QImode hides the fact that the
        pseudo is actually an SVE mode, so V8-V15 are not excluded from the
        candidate set.  Classify any SVE mode as FPR_Z so that
        partial_fpr_clobbers uses VNx16QImode and correctly removes V8-V15.
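
    The two changes listed above can be sketched with a simplified model.
    This is not the GCC source: struct mode_model, the *_before/*_after
    functions and the size thresholds in classify_before are assumptions
    made for illustration, the vector length is fixed at 16 bytes to match
    -msve-vector-bits=128, and multi-register modes and alternative ABIs
    are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the two checks; not GCC source.  */
enum { LOW_PRESERVED_BYTES = 8, SVE_VECTOR_BYTES = 16 };

struct mode_model
{
  bool is_sve;          /* true for any SVE mode, full or partial */
  unsigned data_bytes;  /* stands in for GET_MODE_SIZE */
};

static bool
is_v8_to_v15 (unsigned vreg)
{
  assert (vreg < 32);
  return vreg >= 8 && vreg <= 15;
}

/* Old behaviour as described: only the data size is compared.  */
static bool
part_clobbered_before (unsigned vreg, struct mode_model m)
{
  return is_v8_to_v15 (vreg) && m.data_bytes > LOW_PRESERVED_BYTES;
}

/* New behaviour as described: SVE modes count as a full vector.  */
static bool
part_clobbered_after (unsigned vreg, struct mode_model m)
{
  unsigned per_reg = m.is_sve ? SVE_VECTOR_BYTES : m.data_bytes;
  return is_v8_to_v15 (vreg) && per_reg > LOW_PRESERVED_BYTES;
}

enum fpr_size { FPR_D, FPR_Q, FPR_Z };

/* Size-only classification; the exact thresholds are an assumption.  */
static enum fpr_size
classify_before (struct mode_model m)
{
  if (m.data_bytes <= 8)
    return FPR_D;
  if (m.data_bytes <= 16)
    return FPR_Q;
  return FPR_Z;
}

/* After the fix, any SVE mode is classified as FPR_Z.  */
static enum fpr_size
classify_after (struct mode_model m)
{
  return m.is_sve ? FPR_Z : classify_before (m);
}
```

    In this model an 8-byte SVE mode such as VNx2SF in V14 is reported as
    preserved and classified as FPR_D before the change, and as partially
    clobbered and FPR_Z after it.  An 8-byte non-SVE mode is unaffected.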

    A new test is added in gcc.target/aarch64/sve/pr122827.c.
    Before the fix the execution test aborts (V14 is allocated to the
    broadcast across the call, and the explicit clobber inside the callee
    corrupts its upper lanes).  After the fix it passes.

    Explicitly clobbering V14 with inline assembly in the test may be a bit
    fragile, as it depends on RA picking V14 for the pre-call broadcast to
    trigger the bug, but I couldn't find a good way of generalising it
    without perturbing RA in ways that made the bug not trigger.

    With this patch blender passes again, and there are no regressions on
    aarch64-none-linux-gnu bootstrap.

    Signed-off-by: Kyrylo Tkachov <[email protected]>

    gcc/ChangeLog:

            PR target/122827
            * config/aarch64/aarch64.cc
            (aarch64_hard_regno_call_part_clobbered): For SVE modes use
            BYTES_PER_SVE_VECTOR for the per-register size rather than
            GET_MODE_SIZE.
            * config/aarch64/aarch64-early-ra.cc
            (early_ra::get_allocno_subgroup): Classify any SVE mode as FPR_Z.

    gcc/testsuite/ChangeLog:

            PR target/122827
            * gcc.target/aarch64/sve/pr122827.c: New test.

    (cherry picked from commit 5cb0842b8b86f8495a110018fb8af3c4c3316605)
