Hi Jakub, On Fri, Feb 09, 2018 at 07:43:16AM +0100, Jakub Jelinek wrote: > BSWAP is documented as: > > @item (bswap:@var{m} @var{x}) > Represents the value @var{x} with the order of bytes reversed, carried out > in mode @var{m}, which must be a fixed-point machine mode. > The mode of @var{x} must be @var{m} or @code{VOIDmode}. > > The fixed-point is something used widely in rtl.texi and is very confusing > now that we have FIXED_POINT_TYPE floats, I assume it talks about integral > modes or, because it is also used with vector modes, about INTEGRAL_MODE_P > modes. My understanding is that bswap on a vector integral mode is meant to > be bswap of each element individually.
Right. The documentation could use some improvement ;-) > The rs6000 backend uses bswap not just on scalar integral modes and > vector integral modes, but also on V4SF and V2DF, but ICEs in simplify-rtx.c > where we don't expect bswap to be used on SF/DFmode (vector bswap is handled > as bswap of each element). > > The following patch adjusts the rs6000 backend to use well defined bswaps > on corresponding integral modes instead (and also does what we've done in > i386 backend years ago, avoid subreg on the lhs because it breaks combine > attempts to optimize it). > Bootstrapped/regtested on powerpc64{,le}-linux, on powerpc64-linux including > -m64/-m32, ok for trunk? > > 2018-02-09 Jakub Jelinek <ja...@redhat.com> > > PR target/84226 > * config/rs6000/vsx.md (p9_xxbrq_v16qi): Change input operand > constraint from =wa to wa. Avoid a subreg on the output operand, > instead use a pseudo and subreg it in a move. > (p9_xxbrd_<mode>): Changed to ... > (p9_xxbrd_v2di): ... this insn, without VSX_D iterator. > (p9_xxbrd_v2df): New expander. > (p9_xxbrw_<mode>): Changed to ... > (p9_xxbrw_v4si): ... this insn, without VSX_W iterator. > (p9_xxbrw_v4sf): New expander. > > * gcc.target/powerpc/pr84226.c: New test. > +(define_expand "p9_xxbrd_v2df" > + [(use (match_operand:V2DF 0 "vsx_register_operand" "=wa")) > + (use (match_operand:V2DF 1 "vsx_register_operand" "wa"))] > + "TARGET_P9_VECTOR" > +{ > + rtx op0 = gen_reg_rtx (V2DImode); > + rtx op1 = gen_lowpart (V2DImode, operands[1]); > + emit_insn (gen_p9_xxbrd_v2di (op0, op1)); > + emit_move_insn (operands[0], gen_lowpart (V2DFmode, op0)); > + DONE; > +}) Okay for trunk with or without such an improvement. Thanks! Segher