https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83203

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, the above mentioned change optimizes during cse1:
(insn 8 7 9 2 (set (reg:V2DI 91)
        (vec_merge:V2DI (vec_duplicate:V2DI (reg/v:DI 88 [ x ]))
            (reg:V2DI 91)
            (const_int 1 [0x1]))) "pr83203.c":6 3655 {sse4_1_pinsrq}
     (expr_list:REG_DEAD (reg/v:DI 88 [ x ])
        (nil)))
to:
(insn 8 7 9 2 (set (reg:V2DI 91)
        (vec_concat:V2DI (reg/v:DI 88 [ x ])
            (const_int 0 [0]))) "pr83203.c":6 3738 {vec_concatv2di}
     (expr_list:REG_DEAD (reg/v:DI 88 [ x ])
        (nil)))
as pseudo 91 contains all zeros.  Now, because this is generic tuning, we
force that through the stack.

Though I must repeat for the nth time that this is very confusing.  For some
AMD chips (is it really that bad in contemporary ones?) vmovd is way too
expensive, but then there are two possibilities:

1) vpinsrq is also too expensive.  In that case we should be happy with what
   we emit now on the trunk, but then <sse2p4_1>_pinsr<ssemodesuffix> should
   use Yi instead of x or v in the alternatives with r input, and
   vec_concatv2di should similarly use Yi in the vpinsrq and pinsrq
   alternatives.

2) vmovd is expensive, but vpinsrq is not.  Then we should just use vpinsrq
   for the vec_concatv2di pattern, i.e. add an alternative for =x,r,C which
   will split into clearing the destination plus vpinsrq.

Another thing is that with -O2 -mavx2 -mtune=intel we emit:
        vmovq   %rdi, %xmm0
        vmovdqa %xmm0, %xmm0
        ret
when we could just emit
        vmovq   %rdi, %xmm0
I think.

I guess we'd need a pattern for combine that would match what the combiner
is trying:
(set (reg:V4DI 90)
    (vec_concat:V4DI (vec_concat:V2DI (reg/v:DI 88 [ x ])
            (const_int 0 [0]))
        (const_vector:V2DI [
                (const_int 0 [0])
                (const_int 0 [0])
            ])))
and perhaps simplify that into something different, e.g. a vec_select from
all zeros and a vec_duplicate, so that we don't need to list all the weird
cases?  Though perhaps the r254548 change goes in the wrong direction here.
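
To make the two RTL forms above easier to compare, here is a minimal scalar sketch in C.  It is not GCC code: the struct types and every function name (v2di, v4di, vec_merge_v2di, insn8_before, combine_candidate, and so on) are hypothetical helpers invented for illustration.  They only model the documented lane semantics of vec_duplicate, vec_merge and vec_concat, with lane 0 as the lowest element.  The sketch shows why the cse1 rewrite of insn 8 is valid only when pseudo 91 is known to be all zeros, and what value the nested vec_concat that combine tries would produce.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-ins for the RTL modes V2DI and V4DI (hypothetical types). */
typedef struct { int64_t lane[2]; } v2di;
typedef struct { int64_t lane[4]; } v4di;

/* (vec_duplicate:V2DI x): every lane holds x. */
v2di vec_duplicate_v2di(int64_t x)
{
    v2di r = {{x, x}};
    return r;
}

/* (vec_merge:V2DI a b mask): a set bit i takes lane i from a,
   a clear bit takes it from b. */
v2di vec_merge_v2di(v2di a, v2di b, unsigned mask)
{
    v2di r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = ((mask >> i) & 1u) ? a.lane[i] : b.lane[i];
    return r;
}

/* (vec_concat:V2DI lo hi): lo becomes lane 0, hi becomes lane 1. */
v2di vec_concat_v2di(int64_t lo, int64_t hi)
{
    v2di r = {{lo, hi}};
    return r;
}

/* (vec_concat:V4DI lo hi): lo fills lanes 0-1, hi fills lanes 2-3. */
v4di vec_concat_v4di(v2di lo, v2di hi)
{
    v4di r = {{lo.lane[0], lo.lane[1], hi.lane[0], hi.lane[1]}};
    return r;
}

int v2di_equal(v2di a, v2di b)
{
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}

/* insn 8 before cse1: merge x into lane 0 of the old value of reg 91. */
v2di insn8_before(int64_t x, v2di reg91)
{
    return vec_merge_v2di(vec_duplicate_v2di(x), reg91, 1);
}

/* insn 8 after cse1: concatenate x with a constant zero. */
v2di insn8_after(int64_t x)
{
    return vec_concat_v2di(x, 0);
}

/* The expression combine tries to match for reg 90. */
v4di combine_candidate(int64_t x)
{
    v2di zero = {{0, 0}};
    return vec_concat_v4di(vec_concat_v2di(x, 0), zero);
}

/* The two forms agree when reg 91 is zero and differ otherwise. */
void self_check(void)
{
    v2di zero = {{0, 0}};
    v2di other = {{5, 9}};
    assert(v2di_equal(insn8_before(42, zero), insn8_after(42)));
    assert(!v2di_equal(insn8_before(42, other), insn8_after(42)));
}
```

Calling self_check() from a small main exercises both cases.  In this model combine_candidate(x) has x in lane 0 and zeros in the remaining three lanes, which is the value a single zero-extending move of x into the vector register would also produce.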