https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671
--- Comment #2 from ktkachov at gcc dot gnu.org ---
Looking at the RTL dumps, before the patch we had in cse1:
(insn 27 26 28 2 (set (reg:V16QI 138)
(const_vector:V16QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])) {*movv16qi_internal}
(nil))
(insn 30 29 31 2 (set (reg:V16QI 136)
(vec_concat:V16QI (vec_merge:V4QI (truncate:V4QI (reg:V4SI 137 [ x.4_10 ]))
(vec_select:V4QI (const_vector:V16QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
]))
(subreg:QI (reg:SI 139 [ m.5_11 ]) 0))
(const_vector:V12QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
]))) {avx512vl_truncatev4siv4qi2_mask}
(expr_list:REG_DEAD (reg:SI 140 [ m.5_11 ])
(expr_list:REG_DEAD (reg:V16QI 138)
(expr_list:REG_DEAD (reg:V4SI 137 [ x.4_10 ])
(nil)))))
but after the patch the inline const_vector in insn 30 is CSE'd into a use of reg 138:
(insn 27 26 28 2 (set (reg:V16QI 138)
(const_vector:V16QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
]))
/work/kyrtka01/local-checkouts/build-x86/lib/gcc/x86_64-pc-linux-gnu/6.0.0/include/avx512vlintrin.h:1491 1214 {*movv16qi_internal}
(nil))
(insn 28 27 29 2 (set (reg:SI 139 [ m.5_11 ])
(zero_extend:SI (reg:QI 94 [ m.5_11 ]))) avx.c:21 136
{*zero_extendqisi2}
(expr_list:REG_DEAD (reg:QI 94 [ m.5_11 ])
(nil)))
(insn 29 28 30 2 (set (reg:SI 140 [ m.5_11 ])
(reg:SI 139 [ m.5_11 ]))
/work/kyrtka01/local-checkouts/build-x86/lib/gcc/x86_64-pc-linux-gnu/6.0.0/include/avx512vlintrin.h:1491 86 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 139 [ m.5_11 ])
(nil)))
(insn 30 29 31 2 (set (reg:V16QI 136)
(vec_concat:V16QI (vec_merge:V4QI (truncate:V4QI (reg:V4SI 137 [ x.4_10 ]))
(vec_select:V4QI (reg:V16QI 138)
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
]))
(subreg:QI (reg:SI 139 [ m.5_11 ]) 0))
(const_vector:V12QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
]))) {avx512vl_truncatev4siv4qi2_mask}
(expr_list:REG_DEAD (reg:SI 140 [ m.5_11 ])
(expr_list:REG_DEAD (reg:V16QI 138)
(expr_list:REG_DEAD (reg:V4SI 137 [ x.4_10 ])
(nil)))))
which is a legitimate CSE move as the result is simpler from an RTL structure
point of view.
However, from what I can tell, the first form matches down to a special AVX512
broadcast instruction.
I'm not familiar with the AVX512 patterns, but is there a particular reason why
the original pattern contains:
(vec_select:V4QI (const_vector:V16QI [
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
(const_int 0 [0])
])
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
]))
?
Surely that's always a constant expression, which should rather be written as:
(const_vector:V4QI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])
?
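
To illustrate why the vec_select above looks foldable, here is a toy model in
Python. It is only a sketch of the idea, not GCC's simplification code: the
function name fold_vec_select and the list representation of vectors are
hypothetical and chosen for illustration. Selecting lanes 0..3 of an all-zero
16-element constant always gives a 4-element all-zero constant.

```python
def fold_vec_select(const_vector, selector):
    """Toy model of folding (vec_select (const_vector ...) (parallel ...)).

    const_vector: list of ints standing in for the const_int elements.
    selector:     list of lane indices standing in for the parallel.
    Returns the elements of the equivalent narrower const_vector.
    """
    return [const_vector[lane] for lane in selector]


# The V16QI all-zero constant and the (parallel [0 1 2 3]) from the dump.
v16qi_zero = [0] * 16
lanes = [0, 1, 2, 3]

# Stands in for (const_vector:V4QI [0 0 0 0]).
folded = fold_vec_select(v16qi_zero, lanes)
print(folded)
```

Since both operands are compile-time constants, nothing in the expression
depends on a register, which is the reason for asking why the pattern keeps the
vec_select form.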