On Mon, Feb 11, 2019 at 02:15:18PM +0100, Uros Bizjak wrote:
> > Here is the updated patch to remove ext_sse_reg_operand check with a
> > testcase.
> >
> > OK for trunk?
>
> No. As said, please correctly set mode to XImode in mode attribute
> calculation.
The instructions in question are
vmovdqu32 mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqu32 %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
vmovdqa32 mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqa32 %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
vmovdqa32 %{x,y}mm{[0-9],[12][0-9],3[01]}, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqa32 %{x,y}mm{1[6-9],2[0-9],3[01]}, %{x,y}mm{[0-9],[12][0-9],3[01]}
Why should those instructions be XImode?  They have 16- or 32-byte operands,
never 64-byte ones.
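Just to spell out the operand sizes (this is only meant to illustrate the point):
	vmovdqu32	%xmm16, (%rax)	# stores 16 bytes, EVEX.128, needs AVX512VL
	vmovdqu32	%ymm16, (%rax)	# stores 32 bytes, EVEX.256, needs AVX512VL
	vmovdqu32	%zmm16, (%rax)	# stores 64 bytes, EVEX.512, plain AVX512F
so pretending a 16- or 32-byte access is XImode means describing a different
instruction than the one actually emitted.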
Checking EXT_REX_SSE_REG_P is what *movdi_internal does too:
    case MODE_TI:
      /* Handle AVX512 registers set.  */
      if (EXT_REX_SSE_REG_P (operands[0])
          || EXT_REX_SSE_REG_P (operands[1]))
        return "vmovdqa64\t{%1, %0|%0, %1}";
      return "%vmovdqa\t{%1, %0|%0, %1}";
Sure, in that case it is MODE_TI rather than MODE_DI, because the move is
actually 128-bit rather than 64-bit, but we do not claim it is 512-bit.
*movsi_internal is incorrect (and inefficient):
    case MODE_TI:
      return "%vmovdqa\t{%1, %0|%0, %1}";
    case MODE_XI:
      return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
...
      (eq_attr "alternative" "8,9")
        (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                    (match_operand 1 "ext_sse_reg_operand"))
                 (const_string "XI")
               (ior (not (match_test "TARGET_SSE2"))
                    (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
                 (const_string "V4SF")
               (match_test "TARGET_AVX")
                 (const_string "TI")
               (match_test "optimize_function_for_size_p (cfun)")
                 (const_string "V4SF")
              ]
              (const_string "TI"))
In my reading, for (set (reg:SI xmm16) (reg:SI xmm17)) the first arm matches
even with -mavx512vl, because both registers are ext_sse_reg_operand, so the
mode is XI and the MODE_XI case above prints the operands with %g, i.e. emits
vmovdqa32 %zmm17, %zmm16, which looks wrong.
So, I'd suggest
               (and (not (match_test "TARGET_AVX512VL"))
                    (ior (match_operand 0 "ext_sse_reg_operand")
                         (match_operand 1 "ext_sse_reg_operand")))
                 (const_string "XI")
I see other weird stuff, e.g. in *movdf_internal:
      /* movaps is one byte shorter for non-AVX targets.  */
      (eq_attr "alternative" "13,17")
        (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
                         (not (match_test "TARGET_AVX512VL")))
                    (ior (match_operand 0 "ext_sse_reg_operand")
                         (match_operand 1 "ext_sse_reg_operand")))
                 (const_string "V8DF")
If !TARGET_AVX512VL and one or both of the operands are ext_sse_reg_operand,
obviously MODE_V8DF needs to be used.  But doesn't the above force the use of
vmovapd %zmm16, %zmm17 even with just -mavx512vl -mprefer-vector-width=512?
I don't see any reason not to use vmovapd %xmm16, %xmm17 in that case.
-mprefer-vector-width=512 doesn't mean ZMM must be used all the time, just
that it is fine to use EVEX instructions even with 512-bit width.  Ditto for
*movsf_internal.
Jakub