On Tue, 5 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase, we have
> (insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
>         (rotate:TI (reg/v:TI 106 [ h ])
>             (const_int 64 [0x40]))) "pr114211.c":8:5 1042 
> {rotl64ti2_doubleword}
>      (nil))
> before subreg1 and the pass decides to use
> (reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
> register pair instead of (reg/v:TI 106 [ h ]).
> resolve_operand_for_swap_move_operator implements it by pretending it is
> an assignment from
> (concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
> to
> (concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
> The problem is that if the rotate argument is the same as destination or
> if there is even an overlap between the first half of the destination with
> second half of the source we emit incorrect code, because the store to
> (reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
> move.  THe following patch detects that case and uses a temporary pseudo
> to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

I wonder if we need to care about extra temporaries on RTL before
RA, thus whether always using a temporary would be OK?

Thanks,
Richard.

> 2024-03-05  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR rtl-optimization/114211
>       * lower-subreg.cc (resolve_simple_move): For double-word
>       rotates by BITS_PER_WORD if there is overlap between source
>       and destination use a temporary.
> 
>       * gcc.dg/pr114211.c: New test.
> 
> --- gcc/lower-subreg.cc.jj    2024-01-03 11:51:33.713700906 +0100
> +++ gcc/lower-subreg.cc       2024-03-04 20:29:13.911428988 +0100
> @@ -927,6 +927,21 @@ resolve_simple_move (rtx set, rtx_insn *
>            SRC's operator.  */
>         dest = resolve_operand_for_swap_move_operator (dest);
>         src = src_op;
> +       if (resolve_reg_p (src))
> +         {
> +           gcc_assert (GET_CODE (src) == CONCATN);
> +           if (reg_overlap_mentioned_p (XVECEXP (dest, 0, 0),
> +                                        XVECEXP (src, 0, 1)))
> +             {
> +               /* If there is overlap betwee the first half of the
> +                  destination and what will be stored to the second one,
> +                  use a temporary pseudo.  See PR114211.  */
> +               rtx tem = gen_reg_rtx (GET_MODE (XVECEXP (src, 0, 1)));
> +               emit_move_insn (tem, XVECEXP (src, 0, 1));
> +               src = copy_rtx (src);
> +               XVECEXP (src, 0, 1) = tem;
> +             }
> +         }
>       }
>        else if (resolve_reg_p (src_op))
>       {
> --- gcc/testsuite/gcc.dg/pr114211.c.jj        2024-03-04 20:37:58.735339443 
> +0100
> +++ gcc/testsuite/gcc.dg/pr114211.c   2024-03-04 20:37:33.666678077 +0100
> @@ -0,0 +1,23 @@
> +/* PR rtl-optimization/114211 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O -fno-tree-coalesce-vars -Wno-psabi" } */
> +
> +typedef unsigned __int128 V __attribute__((__vector_size__ (16)));
> +unsigned int u;
> +V v;
> +
> +V
> +foo (unsigned __int128 h)
> +{
> +  h = h << 64 | h >> 64;
> +  h *= ~u;
> +  return h + v;
> +}
> +
> +int
> +main ()
> +{
> +  V x = foo (1);
> +  if (x[0] != (unsigned __int128) 0xffffffff << 64)
> +    __builtin_abort ();
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to