On Wed, 2020-02-26 at 16:02 -0800, H.J. Lu wrote:
> On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <l...@redhat.com> wrote:
> > On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> > > 
> > >    0: c5 f9 6f d1             vmovdqa %xmm1,%xmm2
> > >    4: 62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> > > 
> > > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > > indicate the target's preferred vector move encoding.  For vector register
> > > to vector register moves, we can use 512-bit vector move instructions to
> > > move 128-bit/256-bit vectors if AVX512VL isn't available.  With AVX512F
> > > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > > if the upper 16 vector registers aren't used.  This patch adds a function,
> > > ix86_output_ssemov, to generate vector moves:
> > > 
> > > 1. If zmm registers are used, use EVEX encoding.
> > > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > > will be generated.
> > > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > >    a. With AVX512VL, AVX512VL vector moves will be generated.
> > >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > >       move will be done with zmm register move.
> > > 
> > > 
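The three cases above reduce to a small decision tree.  Here is a minimal,
standalone sketch of that selection logic; it is illustrative only, the
function and parameter names are hypothetical and are not part of the patch,
which works on RTL operands and target macros such as TARGET_AVX512VL and
EXT_REX_SSE_REG_P.

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical helper mirroring rules 1-3 above; not part of the patch.  */
static const char *
choose_vector_move_encoding (bool uses_zmm, bool uses_xmm16_to_31,
                             bool has_avx512vl, bool reg_to_reg)
{
  if (uses_zmm)
    return "EVEX (e.g. vmovdqa64 with zmm operands)";         /* rule 1 */
  if (!uses_xmm16_to_31)
    return "SSE or VEX (e.g. vmovdqa, shorter encoding)";     /* rule 2 */
  if (has_avx512vl)
    return "EVEX 128/256-bit (e.g. vmovdqa64 with xmm/ymm)";  /* rule 3a */
  if (reg_to_reg)
    return "EVEX 512-bit, operands widened to zmm";           /* rule 3b */
  return "not covered by the cases above";
}

int
main (void)
{
  /* xmm16-xmm31 used, no AVX512VL, register-to-register move.  */
  printf ("%s\n", choose_vector_move_encoding (false, true, false, true));
  return 0;
}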
> > [ ... ]
> > 
> > > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > > +   TARGET_AVX512VL or it is a register to register move which can
> > > +   be done with zmm register move. */
> > > +
> > > +static const char *
> > > +ix86_get_ssemov (rtx *operands, unsigned size,
> > > +              enum attr_mode insn_mode, machine_mode mode)
> > > +{
> > > +  char buf[128];
> > > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > > +                    || misaligned_operand (operands[1], mode));
> > > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > > +                  || EXT_REX_SSE_REG_P (operands[1]));
> > > +  machine_mode scalar_mode;
> > > +
> > [ ... ]
> > > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > > +    {
> > > +      switch (scalar_mode)
> > > +     {
> > > +     case E_QImode:
> > > +       if (size == 64)
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu8"
> > > +                      : "vmovdqu64")
> > > +                   : "vmovdqa64");
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = (misaligned_p
> > > +                       ? (TARGET_AVX512BW
> > > +                          ? "vmovdqu8"
> > > +                          : "vmovdqu64")
> > > +                       : "vmovdqa64");
> > > +         }
> > > +       else
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu8"
> > > +                      : "%vmovdqu")
> > > +                   : "%vmovdqa");
> > > +       break;
> > > +     case E_HImode:
> > > +       if (size == 64)
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu16"
> > > +                      : "vmovdqu64")
> > > +                   : "vmovdqa64");
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = (misaligned_p
> > > +                       ? (TARGET_AVX512BW
> > > +                          ? "vmovdqu16"
> > > +                          : "vmovdqu64")
> > > +                       : "vmovdqa64");
> > > +         }
> > > +       else
> > > +         opcode = (misaligned_p
> > > +                   ? (TARGET_AVX512BW
> > > +                      ? "vmovdqu16"
> > > +                      : "%vmovdqu")
> > > +                   : "%vmovdqa");
> > > +       break;
> > > +     case E_SImode:
> > > +       if (size == 64)
> > > +         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +         }
> > > +       else
> > > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +       break;
> > > +     case E_DImode:
> > > +     case E_TImode:
> > > +     case E_OImode:
> > > +       if (size == 64)
> > > +         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +       else if (evex_reg_p)
> > > +         {
> > > +           if (TARGET_AVX512VL)
> > > +             opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +         }
> > > +       else
> > > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +       break;
> > > +     case E_XImode:
> > > +       opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +       break;
> > > +     default:
> > > +       gcc_unreachable ();
> > > +     }
> > > +    }
> > > +  else
> > > +    gcc_unreachable ();
> > > +
> > > +  if (!opcode)
> > > +    {
> > > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > > +      registers without AVX512VL by using zmm register move.  */
> > So the overall flow control in here is rather convoluted.  I hate the way you
> > don't set OPCODE above and then do it down here.  I would suggest breaking
> > the !opcode bits into its own little function.  Then above in those places
> > where you do
> > 
> > if (TARGET_AVX512VL)
> >    opcode = <whatever>;
> > 
> > 
> > Instead change those to something like
> > 
> > if (TARGET_AVX512VL)
> >    opcode = <whatever>;
> > else
> >    opcode = new_function (...)
> > 
> > That way opcode is set on every path through the major if-else in this
> > function.
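Concretely, that suggestion amounts to something like the sketch below for the
E_SImode case.  This is only a sketch of the suggested control flow;
ix86_get_ssemov_no_avx512vl is a hypothetical name for the split-out fallback
(the current !opcode block that rewrites an xmm16-xmm31/ymm16-ymm31 move as a
zmm register move).

/* Hypothetical helper: the xmm16-xmm31/ymm16-ymm31 fallback when
   AVX512VL is not available, done as a zmm register move.  */
static const char *
ix86_get_ssemov_no_avx512vl (rtx *operands, machine_mode scalar_mode);

      case E_SImode:
        if (size == 64)
          opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
        else if (evex_reg_p)
          {
            if (TARGET_AVX512VL)
              opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
            else
              /* OPCODE is now assigned on this path as well.  */
              opcode = ix86_get_ssemov_no_avx512vl (operands, scalar_mode);
          }
        else
          opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
        break;

With that shape the zmm-fallback logic lives in one place, every arm of the
switch assigns opcode, and the rest of the function only has to emit the
chosen mnemonic.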
> > 
> > Second, when I suggested you break the patch up on a per-pattern basis, I
> > probably should have also said that I would start with the minimal support in
> > ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
> > converted.  That way the mapping from current code to new code is more
> > obvious.
> 
> I will do these.  On x86, different instructions can move vector
> registers.  They all do the same thing, but some are preferred over
> others, depending on tuning options.
I know.

> 
> > As it stands, the breaking into separate patches didn't really help much
> > because we still have all the complexity in ix86_get_ssemov and
> > ix86_output_ssemov in patch #1, and that's the code I'm most worried about
> > verifying we get right, particularly at this stage.  I literally can't take
> > any patch and map from the old code to the new code without having to
> > understand all of patch #1.
> 
> The old code is very convoluted and wrong in some cases.  I am trying to
> clean it up.  I will update my patches based on your feedback.
Thanks.  I was going to try and break those two functions down on my own, but
you're more likely to get it right than I am :-)

jeff
> 
