https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We produce:
Trying 5, 7 -> 11:
    5: r86:V4SF=[`*.LC0']
      REG_EQUAL const_vector
    7: r85:V4SF=vec_select(vec_concat(r86:V4SF,r86:V4SF),parallel)
      REG_DEAD r86:V4SF
      REG_EQUAL const_vector
   11: r88:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel)
      REG_DEAD r85:V4SF
      REG_EQUAL const_vector
Failed to match this instruction:
(set (reg:V4SF 88)
    (const_vector:V4SF [
            (const_double:SF 2.0e+0 [0x0.8p+2])
            (const_double:SF 1.0e+0 [0x0.8p+1])
            (const_double:SF 4.0e+0 [0x0.8p+3])
            (const_double:SF 3.0e+0 [0x0.cp+2])
        ]))

Which means the vec_select are merging at the rtl level just fine.
Anyways if the target expands __builtin_ia32_shufps to VEC_PERM_EXPR we would
have gotten this optimized at the gimple level.
So this is a target issue.
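
To illustrate why a generic permute representation makes this easy to fold, here is a minimal sketch in plain C of how two back-to-back single-input permutes collapse into one. It only models the idea; it is not GCC's implementation. The helper names (perm4, decode_imm, compose) are hypothetical, and decode_imm assumes a shufps-style 8-bit immediate applied with both source operands being the same vector, as in the vec_concat(r, r) patterns above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a one-input, 4-lane permute: out[i] = in[sel[i]].
   This is the shape of a permute whose two inputs are the same vector. */
static void perm4(const float in[4], const unsigned sel[4], float out[4])
{
    for (size_t i = 0; i < 4; i++) {
        assert(sel[i] < 4);
        out[i] = in[sel[i]];
    }
}

/* Decode a shufps-style 8-bit immediate into four lane indices,
   two bits per lane, lowest lane first. Assumes both source operands
   of the shuffle are the same vector. */
static void decode_imm(unsigned imm, unsigned sel[4])
{
    for (size_t i = 0; i < 4; i++)
        sel[i] = (imm >> (2 * i)) & 3u;
}

/* Fold two consecutive permutes into one selector, so that
   perm4(perm4(v, first), second) equals perm4(v, fused). */
static void compose(const unsigned first[4], const unsigned second[4],
                    unsigned fused[4])
{
    for (size_t i = 0; i < 4; i++) {
        assert(second[i] < 4);
        fused[i] = first[second[i]];
    }
}
```

With a constant input vector the fused permute can then be evaluated at compile time, which is the constant that combine computed in the dump above but could not match against an instruction pattern.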