https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Looking at the RTL dumps before the patch in cse1 we had:
(insn 27 26 28 2 (set (reg:V16QI 138)
        (const_vector:V16QI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) {*movv16qi_internal}
     (nil))

(insn 30 29 31 2 (set (reg:V16QI 136)
        (vec_concat:V16QI (vec_merge:V4QI (truncate:V4QI (reg:V4SI 137 [ x.4_10 ]))
                (vec_select:V4QI (const_vector:V16QI [
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                        ])
                    (parallel [
                            (const_int 0 [0])
                            (const_int 1 [0x1])
                            (const_int 2 [0x2])
                            (const_int 3 [0x3])
                        ]))
                (subreg:QI (reg:SI 139 [ m.5_11 ]) 0))
            (const_vector:V12QI [
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                ]))) {avx512vl_truncatev4siv4qi2_mask}
     (expr_list:REG_DEAD (reg:SI 140 [ m.5_11 ])
        (expr_list:REG_DEAD (reg:V16QI 138)
            (expr_list:REG_DEAD (reg:V4SI 137 [ x.4_10 ])
                (nil)))))

but after the patch the const_vector is CSE'd into:
(insn 27 26 28 2 (set (reg:V16QI 138)
        (const_vector:V16QI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) /work/kyrtka01/local-checkouts/build-x86/lib/gcc/x86_64-pc-linux-gnu/6.0.0/include/avx512vlintrin.h:1491 1214 {*movv16qi_internal}
     (nil))
(insn 28 27 29 2 (set (reg:SI 139 [ m.5_11 ])
        (zero_extend:SI (reg:QI 94 [ m.5_11 ]))) avx.c:21 136 {*zero_extendqisi2}
     (expr_list:REG_DEAD (reg:QI 94 [ m.5_11 ])
        (nil)))
(insn 29 28 30 2 (set (reg:SI 140 [ m.5_11 ])
        (reg:SI 139 [ m.5_11 ])) /work/kyrtka01/local-checkouts/build-x86/lib/gcc/x86_64-pc-linux-gnu/6.0.0/include/avx512vlintrin.h:1491 86 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 139 [ m.5_11 ])
        (nil)))
(insn 30 29 31 2 (set (reg:V16QI 136)
        (vec_concat:V16QI (vec_merge:V4QI (truncate:V4QI (reg:V4SI 137 [ x.4_10 ]))
                (vec_select:V4QI (reg:V16QI 138)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 1 [0x1])
                            (const_int 2 [0x2])
                            (const_int 3 [0x3])
                        ]))
                (subreg:QI (reg:SI 139 [ m.5_11 ]) 0))
            (const_vector:V12QI [
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                ]))) {avx512vl_truncatev4siv4qi2_mask}
     (expr_list:REG_DEAD (reg:SI 140 [ m.5_11 ])
        (expr_list:REG_DEAD (reg:V16QI 138)
            (expr_list:REG_DEAD (reg:V4SI 137 [ x.4_10 ])
                (nil)))))

which is a legitimate CSE move as the result is simpler from an RTL structure
point of view.

However, from what I can tell, only the first form matches a special AVX512
broadcast instruction.

I'm not familiar with the AVX512 patterns, but is there a particular reason why
the original pattern contains:
                (vec_select:V4QI (const_vector:V16QI [
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                            (const_int 0 [0])
                        ])
                    (parallel [
                            (const_int 0 [0])
                            (const_int 1 [0x1])
                            (const_int 2 [0x2])
                            (const_int 3 [0x3])
                        ]))
?

Surely that's an always-constant expression that should rather be written as:
(const_vector:V4QI [(const_int 0) (const_int 0) (const_int 0) (const_int 0)])
?
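
To illustrate why that vec_select looks foldable, here is a minimal sketch in
plain C. It is a toy model only, not GCC's simplify-rtx code, and every name in
it (toy_vec_select, toy_all_zero, toy_low4_of_zero_v16qi_is_zero) is
hypothetical. It assumes vec_select simply copies the source elements named by
the parallel selector, so selecting elements 0..3 of an all-zero V16QI constant
yields an all-zero V4QI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of (vec_select (const_vector ...) (parallel [...])):
   out[i] = src[sel[i]] for each selector entry.  Hypothetical helper,
   not a GCC function.  */
static void toy_vec_select(const int8_t *src, size_t nsrc,
                           const size_t *sel, size_t nsel,
                           int8_t *out)
{
    for (size_t i = 0; i < nsel; i++) {
        assert(sel[i] < nsrc);  /* selector must index into the source */
        out[i] = src[sel[i]];
    }
}

/* True if every element of v is zero.  */
static bool toy_all_zero(const int8_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (v[i] != 0)
            return false;
    return true;
}

/* The case from this comment: select elements 0..3 of a zero V16QI.  */
static bool toy_low4_of_zero_v16qi_is_zero(void)
{
    const int8_t zero_v16qi[16] = {0};      /* models const_vector:V16QI of zeros */
    const size_t sel[4] = {0, 1, 2, 3};     /* models the parallel selector */
    int8_t out[4] = {1, 1, 1, 1};           /* non-zero fill, to show it is overwritten */

    toy_vec_select(zero_v16qi, 16, sel, 4, out);
    return toy_all_zero(out, 4);
}
```

Under this model the result depends only on constants, which is the reasoning
behind suggesting a const_vector:V4QI of zeros directly in the pattern. Whether
the machine description can actually be written that way is the open question
above.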
