On Wed, 2020-02-26 at 16:02 -0800, H.J. Lu wrote:
> On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <l...@redhat.com> wrote:
> > On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> > >
> > >    0:	c5 f9 6f d1		vmovdqa %xmm1,%xmm2
> > >    4:	62 f1 fd 08 6f d1	vmovdqa64 %xmm1,%xmm2
> > >
> > > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > > only supports 512-bit vector moves.  AVX512F + AVX512VL supports
> > > 128-bit and 256-bit vector moves.  Mode attributes on x86 vector move
> > > patterns indicate target preferences of vector move encoding.  For
> > > vector register to vector register move, we can use 512-bit vector
> > > move instructions to move 128-bit/256-bit vector if AVX512VL isn't
> > > available.  With AVX512F and AVX512VL, we should use VEX encoding for
> > > 128-bit/256-bit vector moves if upper 16 vector registers aren't used.
> > > This patch adds a function, ix86_output_ssemov, to generate vector
> > > moves:
> > >
> > > 1. If zmm registers are used, use EVEX encoding.
> > > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX
> > >    encoding will be generated.
> > > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> > >    a. With AVX512VL, AVX512VL vector moves will be generated.
> > >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> > >       move will be done with zmm register move.
> > >
> >
> > [ ... ]
> >
> > > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > > +   TARGET_AVX512VL or it is a register to register move which can
> > > +   be done with zmm register move.  */
> > > +
> > > +static const char *
> > > +ix86_get_ssemov (rtx *operands, unsigned size,
> > > +                 enum attr_mode insn_mode, machine_mode mode)
> > > +{
> > > +  char buf[128];
> > > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > > +                       || misaligned_operand (operands[1], mode));
> > > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > > +                     || EXT_REX_SSE_REG_P (operands[1]));
> > > +  machine_mode scalar_mode;
> > > +
> > > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > > +    {
> > > +      switch (scalar_mode)
> > > +        {
> > > +        case E_QImode:
> > > +          if (size == 64)
> > > +            opcode = (misaligned_p
> > > +                      ? (TARGET_AVX512BW
> > > +                         ? "vmovdqu8"
> > > +                         : "vmovdqu64")
> > > +                      : "vmovdqa64");
> > > +          else if (evex_reg_p)
> > > +            {
> > > +              if (TARGET_AVX512VL)
> > > +                opcode = (misaligned_p
> > > +                          ? (TARGET_AVX512BW
> > > +                             ? "vmovdqu8"
> > > +                             : "vmovdqu64")
> > > +                          : "vmovdqa64");
> > > +            }
> > > +          else
> > > +            opcode = (misaligned_p
> > > +                      ? (TARGET_AVX512BW
> > > +                         ? "vmovdqu8"
> > > +                         : "%vmovdqu")
> > > +                      : "%vmovdqa");
> > > +          break;
> > > +        case E_HImode:
> > > +          if (size == 64)
> > > +            opcode = (misaligned_p
> > > +                      ? (TARGET_AVX512BW
> > > +                         ? "vmovdqu16"
> > > +                         : "vmovdqu64")
> > > +                      : "vmovdqa64");
> > > +          else if (evex_reg_p)
> > > +            {
> > > +              if (TARGET_AVX512VL)
> > > +                opcode = (misaligned_p
> > > +                          ? (TARGET_AVX512BW
> > > +                             ? "vmovdqu16"
> > > +                             : "vmovdqu64")
> > > +                          : "vmovdqa64");
> > > +            }
> > > +          else
> > > +            opcode = (misaligned_p
> > > +                      ? (TARGET_AVX512BW
> > > +                         ? "vmovdqu16"
> > > +                         : "%vmovdqu")
> > > +                      : "%vmovdqa");
> > > +          break;
> > > +        case E_SImode:
> > > +          if (size == 64)
> > > +            opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +          else if (evex_reg_p)
> > > +            {
> > > +              if (TARGET_AVX512VL)
> > > +                opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > > +            }
> > > +          else
> > > +            opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +          break;
> > > +        case E_DImode:
> > > +        case E_TImode:
> > > +        case E_OImode:
> > > +          if (size == 64)
> > > +            opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +          else if (evex_reg_p)
> > > +            {
> > > +              if (TARGET_AVX512VL)
> > > +                opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +            }
> > > +          else
> > > +            opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > > +          break;
> > > +        case E_XImode:
> > > +          opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > > +          break;
> > > +        default:
> > > +          gcc_unreachable ();
> > > +        }
> > > +    }
> > > +  else
> > > +    gcc_unreachable ();
> > > +
> > > +  if (!opcode)
> > > +    {
> > > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > > +         registers without AVX512VL by using zmm register move.  */
> > So the overall flow control in here is rather convoluted.  I hate the
> > way you don't set OPCODE above and then do it down here.  I would
> > suggest breaking the !opcode bits into its own little function.  Then
> > above in those places where you do
> >
> >   if (TARGET_AVX512VL)
> >     opcode = <whatever>;
> >
> > Instead change those to something like
> >
> >   if (TARGET_AVX512VL)
> >     opcode = <whatever>;
> >   else
> >     opcode = new_function (...);
> >
> > That way opcode is set on every path through the major if-else in this
> > function.
> >
> > Second, when I suggested you break the patch up on a per-pattern basis,
> > I probably should have also said that I would start with the minimal
> > support in ix86_get_ssemov and ix86_output_ssemov to support the pattern
> > you just converted.  That way the mapping from current code to new code
> > is more obvious.
>
> I will do these.  On x86, different instructions can move vector
> registers.  They all do the same thing.  But some are preferred over
> others, depending on tuning options.

I know.
> > As it stands the breaking into separate patches didn't really help much
> > because we still have all the complexity in ix86_get_ssemov and
> > ix86_output_ssemov in patch #1, and that's the code I'm most worried
> > about verifying we get right, particularly at this stage.  I literally
> > can't take any patch and map from the old code to the new code without
> > having to understand all of patch #1.
>
> The old code is very convoluted and wrong in some cases.  I am trying to
> clean it up.  I will update my patches based on your feedback.

Thanks.  I was going to try and break those two functions down on my own,
but you're more likely to get it right than I am :-)

jeff