https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671
--- Comment #22 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 37722
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37722&action=edit
gcc6-pr69671.patch

Actually, on a closer look, I believe the only problem is the patterns that
use a vector_move_operand "0C" inside of a vec_select with only constants as
the parallel's operands. fwprop is able to propagate constants into
instructions (thus undoing the CSE effect), but it doesn't do anything on
these, because it also simplifies them while propagating, and the simplified
result no longer matches those patterns. So instead of the expected, say,

(vec_select:V4QI (const_vector:V16QI [
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
            (const_int 0 [0])
        ])
    (parallel [
            (const_int 0 [0])
            (const_int 1 [0x1])
            (const_int 2 [0x2])
            (const_int 3 [0x3])
        ]))

we get the simplified form in there:

(const_vector:V4QI [
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
        (const_int 0 [0])
    ])

So, by adding extra patterns for that simplified form, fwprop is able to do
its job even if CSE did a better job.
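
To illustrate the folding described above, here is a toy model in Python. It is only a sketch under stated assumptions and not GCC's actual code: the classes `ConstVector`, `Reg`, `VecSelect` and the functions `simplify` and `propagate` are hypothetical names invented for this example, and real RTL simplification handles many more cases. It shows that once a constant vector is substituted into a vec_select whose parallel holds only constant lane indices, the whole expression folds to a narrower const_vector, so a pattern written in terms of vec_select can no longer match it.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical stand-ins for RTL expressions; not GCC data structures.
@dataclass(frozen=True)
class ConstVector:
    mode: str
    elems: Tuple[int, ...]


@dataclass(frozen=True)
class Reg:
    mode: str
    name: str


@dataclass(frozen=True)
class VecSelect:
    mode: str
    operand: Union[ConstVector, Reg]
    lanes: Tuple[int, ...]  # the constant indices of the parallel


def simplify(expr):
    """Fold a vec_select of a constant vector into a smaller const_vector."""
    if isinstance(expr, VecSelect) and isinstance(expr.operand, ConstVector):
        picked = tuple(expr.operand.elems[i] for i in expr.lanes)
        return ConstVector(expr.mode, picked)
    # Anything else is left untouched in this toy model.
    return expr


def propagate(expr, reg, value):
    """Substitute `value` for `reg` in a vec_select, then simplify the result."""
    if isinstance(expr, VecSelect) and expr.operand == reg:
        expr = VecSelect(expr.mode, value, expr.lanes)
    return simplify(expr)


# A register that CSE made hold the all-zero V16QI constant (assumed setup).
zero_v16qi = ConstVector("V16QI", (0,) * 16)
src = Reg("V16QI", "r100")

# The instruction selects the low four lanes of that register.
insn = VecSelect("V4QI", src, (0, 1, 2, 3))

# Propagating the constant yields a plain const_vector, not a vec_select.
result = propagate(insn, src, zero_v16qi)
print(result)
```

In this model the propagated result is a `ConstVector` in mode V4QI, which corresponds to the situation the extra patterns in the patch are meant to cover.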