https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214
--- Comment #19 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org>
---
OK. The .optimized dumps seem to be the same for both -mavx2 and
-march=skylake-avx512. Things only diverge during expand.
It looks like it might be a bug in:
(define_insn "<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>"
[(set (match_operand:VI8F_256 0 "register_operand" "=v")
(vec_select:VI8F_256
(vec_concat:<ssedoublemode>
(match_operand:VI8F_256 1 "register_operand" "v")
(match_operand:VI8F_256 2 "nonimmediate_operand" "vm"))
(parallel [(match_operand 3 "const_0_to_3_operand")
(match_operand 4 "const_0_to_3_operand")
(match_operand 5 "const_4_to_7_operand")
(match_operand 6 "const_4_to_7_operand")])))]
"TARGET_AVX512VL
&& (INTVAL (operands[3]) == (INTVAL (operands[4]) - 1)
&& INTVAL (operands[5]) == (INTVAL (operands[6]) - 1))"
{
int mask;
mask = INTVAL (operands[3]) / 2;
mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
operands[3] = GEN_INT (mask);
return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0<mask_operand7>|%0<mask_operand7>, %1, %2, %3}";
}
[(set_attr "type" "sselog")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
which AFAICT requires, without checking, that operands 3 and 5 are even (0 or 2,
and 4 or 6, respectively). In this case we're using it to match:
(insn 40 39 41 6 (set (reg:V4DI 101 [ vect__5.17 ])
(vec_select:V4DI (vec_concat:V8DI (reg:V4DI 98 [ vect__5.14 ])
(reg:V4DI 140 [ vect__5.15 ]))
(parallel [
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 5 [0x5])
(const_int 6 [0x6])
]))) "/tmp/foo.c":8:22 4069 {*avx512dq_shuf_i64x2_1}
(nil))
and so end up treating the permute mask {2, 3, 5, 6} as {2, 3, 4, 5} instead:
the (5 - 4) / 2 in the output code truncates to 0.
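
To make the arithmetic concrete, here is a small standalone C sketch of the
insn condition and the immediate computation quoted above. It is not GCC code:
the function names are made up for illustration, and the stricter condition is
only an assumption about what a check for even operands could look like, not
the actual fix.

```c
#include <stdbool.h>

/* Immediate computed by the pattern's output code from operands 3 and 5
   (same arithmetic as in the define_insn quoted above).  */
static int shuf64x2_imm(int op3, int op5)
{
    int mask = op3 / 2;
    mask |= (op5 - 4) / 2 << 1;   /* integer division drops an odd offset */
    return mask;
}

/* The insn condition as quoted, minus TARGET_AVX512VL: it only checks
   that each pair of selector elements is consecutive.  */
static bool cond_as_written(int op3, int op4, int op5, int op6)
{
    return op3 == op4 - 1 && op5 == op6 - 1;
}

/* Hypothetical stricter condition (an assumption, not the real patch):
   additionally require each pair to start on an even element, i.e. on a
   128-bit lane boundary.  */
static bool cond_with_even_check(int op3, int op4, int op5, int op6)
{
    return cond_as_written(op3, op4, op5, op6)
           && (op3 & 1) == 0 && (op5 & 1) == 0;
}

/* Selector that the 256-bit vshuf*64x2 instruction actually performs for a
   given immediate: bit 0 picks a 128-bit half of the first source, bit 1
   picks a 128-bit half of the second source.  */
static void decode_imm(int imm, int sel[4])
{
    sel[0] = (imm & 1) * 2;
    sel[1] = sel[0] + 1;
    sel[2] = 4 + ((imm >> 1) & 1) * 2;
    sel[3] = sel[2] + 1;
}
```

For the insn above, cond_as_written (2, 3, 5, 6) holds, shuf64x2_imm (2, 5)
gives the same immediate as shuf64x2_imm (2, 4), and decoding that immediate
yields {2, 3, 4, 5}. With the extra evenness test the pattern would not match
{2, 3, 5, 6} in the first place.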