https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104612

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:7e691189ca9c04fdba71ceada1faba62afbc1463

commit r12-7323-g7e691189ca9c04fdba71ceada1faba62afbc1463
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Tue Feb 22 10:38:37 2022 +0100

    i386: Fix up copysign/xorsign expansion [PR104612]

    We ICE on the following testcase for -m32 since r12-3435. because
    operands[2] is (subreg:SF (reg:DI ...) 0) and
    lowpart_subreg (V4SFmode, operands[2], SFmode)
    returns NULL, and that is what we use in AND etc. insns we emit.

    My earlier version of the patch fixes that by calling force_reg for the
    input operands, to make sure they are really REGs and so lowpart_subreg
    will succeed on them - even for theoretical MEMs using REGs there seems
    desirable, we don't want to read following memory slots for the paradoxical
    subreg.  For the outputs, I thought we'd get better code by always
computing
    result into a new pseudo and them move lowpart of that pseudo into dest.

    Unfortunately it regressed
    FAIL: gcc.target/i386/pr89984-2.c scan-assembler-not vmovaps
    on which the patch changes:
            vandps  .LC0(%rip), %xmm1, %xmm1
    -       vxorps  %xmm0, %xmm1, %xmm0
    +       vxorps  %xmm0, %xmm1, %xmm1
    +       vmovaps %xmm1, %xmm0
            ret
    The RA sees:
    (insn 8 4 9 2 (set (reg:V4SF 85)
            (and:V4SF (subreg:V4SF (reg:SF 90) 0)
                (mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S16
A128]))) "pr89984-2.c":7:12 2838 {*andv4sf3}
         (expr_list:REG_DEAD (reg:SF 90)
            (nil)))
    (insn 9 8 10 2 (set (reg:V4SF 87)
            (xor:V4SF (reg:V4SF 85)
                (subreg:V4SF (reg:SF 89) 0))) "pr89984-2.c":7:12 2842
{*xorv4sf3}
         (expr_list:REG_DEAD (reg:SF 89)
            (expr_list:REG_DEAD (reg:V4SF 85)
                (nil))))
    (insn 10 9 14 2 (set (reg:SF 82 [ <retval> ])
            (subreg:SF (reg:V4SF 87) 0)) "pr89984-2.c":7:12 142
{*movsf_internal}
         (expr_list:REG_DEAD (reg:V4SF 87)
            (nil)))
    (insn 14 10 15 2 (set (reg/i:SF 20 xmm0)
            (reg:SF 82 [ <retval> ])) "pr89984-2.c":8:1 142 {*movsf_internal}
         (expr_list:REG_DEAD (reg:SF 82 [ <retval> ])
            (nil)))
    (insn 15 14 0 2 (use (reg/i:SF 20 xmm0)) "pr89984-2.c":8:1 -1
         (nil))
    and doesn't know that if it would use xmm0 not just for pseudo 82
    but also for pseudo 87, it could create a noop move in insn 10 and
    so could avoid an extra register copy and nothing later on is able
    to figure that out either.  I don't know how the RA should know
    that though.

    So that we don't regress, this version of the patch
    will do this stuff (i.e. use fresh vector pseudo as destination and
    then move lowpart of that to dest) over what it used before (i.e.
    use paradoxical subreg of the dest) only if lowpart_subreg returns NULL.

    2022-02-22  Jakub Jelinek  <ja...@redhat.com>

            PR target/104612
            * config/i386/i386-expand.cc (ix86_expand_copysign): Call force_reg
            on input operands before calling lowpart_subreg on it.  For output
            operand, use a vmode pseudo as destination and then move its
lowpart
            subreg into operands[0] if lowpart_subreg fails on dest.
            (ix86_expand_xorsign): Likewise.

            * gcc.dg/pr104612.c: New test.

Reply via email to