On Tue, 5 Mar 2024, Jakub Jelinek wrote: > Hi! > > On the following testcase, we have > (insn 10 7 11 2 (set (reg/v:TI 106 [ h ]) > (rotate:TI (reg/v:TI 106 [ h ]) > (const_int 64 [0x40]))) "pr114211.c":8:5 1042 > {rotl64ti2_doubleword} > (nil)) > before subreg1 and the pass decides to use > (reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ]) > register pair instead of (reg/v:TI 106 [ h ]). > resolve_operand_for_swap_move_operator implements it by pretending it is > an assignment from > (concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ])) > to > (concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ])) > The problem is that if the rotate argument is the same as destination or > if there is even an overlap between the first half of the destination with > second half of the source we emit incorrect code, because the store to > (reg:DI 128 [ h+8 ]) overwrites what we need for source of the second > move. THe following patch detects that case and uses a temporary pseudo > to hold the original (reg:DI 128 [ h+8 ]) value across the first store. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK. I wonder if we need to care about extra temporaries on RTL before RA, thus whether always using a temporary would be OK? Thanks, Richard. > 2024-03-05 Jakub Jelinek <ja...@redhat.com> > > PR rtl-optimization/114211 > * lower-subreg.cc (resolve_simple_move): For double-word > rotates by BITS_PER_WORD if there is overlap between source > and destination use a temporary. > > * gcc.dg/pr114211.c: New test. > > --- gcc/lower-subreg.cc.jj 2024-01-03 11:51:33.713700906 +0100 > +++ gcc/lower-subreg.cc 2024-03-04 20:29:13.911428988 +0100 > @@ -927,6 +927,21 @@ resolve_simple_move (rtx set, rtx_insn * > SRC's operator. */ > dest = resolve_operand_for_swap_move_operator (dest); > src = src_op; > + if (resolve_reg_p (src)) > + { > + gcc_assert (GET_CODE (src) == CONCATN); > + if (reg_overlap_mentioned_p (XVECEXP (dest, 0, 0), > + XVECEXP (src, 0, 1))) > + { > + /* If there is overlap betwee the first half of the > + destination and what will be stored to the second one, > + use a temporary pseudo. See PR114211. */ > + rtx tem = gen_reg_rtx (GET_MODE (XVECEXP (src, 0, 1))); > + emit_move_insn (tem, XVECEXP (src, 0, 1)); > + src = copy_rtx (src); > + XVECEXP (src, 0, 1) = tem; > + } > + } > } > else if (resolve_reg_p (src_op)) > { > --- gcc/testsuite/gcc.dg/pr114211.c.jj 2024-03-04 20:37:58.735339443 > +0100 > +++ gcc/testsuite/gcc.dg/pr114211.c 2024-03-04 20:37:33.666678077 +0100 > @@ -0,0 +1,23 @@ > +/* PR rtl-optimization/114211 */ > +/* { dg-do run { target int128 } } */ > +/* { dg-options "-O -fno-tree-coalesce-vars -Wno-psabi" } */ > + > +typedef unsigned __int128 V __attribute__((__vector_size__ (16))); > +unsigned int u; > +V v; > + > +V > +foo (unsigned __int128 h) > +{ > + h = h << 64 | h >> 64; > + h *= ~u; > + return h + v; > +} > + > +int > +main () > +{ > + V x = foo (1); > + if (x[0] != (unsigned __int128) 0xffffffff << 64) > + __builtin_abort (); > +} > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)