https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48609
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #6)
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Andrew Pinski from comment #2)
> > > Confirmed. In this case, it is a middle-end issue; I suspect if we used
> > > V2SFmode for the incoming argument, it might work better. Right now we
> > Yes, under TARGET_SSE2 and TARGET_64BIT, we support movv2sf, I think it's
> > reasonable to use V2SFmode instead of DImode as incoming argument mode for
> > SCmode.
>
> Doesn't help here
>
> foo:
> .LFB0:
>         .cfi_startproc
>         movlps  %xmm0, -8(%rsp)     # 3   [c=4 l=5]  *movv2sf_internal/14
>         movss   -8(%rsp), %xmm0     # 16  [c=8 l=6]  *movsf_internal/7
>         movss   %xmm0, bar(%rip)    # 11  [c=4 l=8]  *movsf_internal/8
>         movss   -4(%rsp), %xmm0     # 17  [c=8 l=6]  *movsf_internal/7
>         movss   %xmm0, bar+4(%rip)  # 12  [c=4 l=8]  *movsf_internal/8
>         ret                         # 21  [c=0 l=1]  simple_return_internal

You have to do a little bit more, such as changing how the two parts are
extracted for the concat.
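
For readers without the original attachment at hand, here is a minimal sketch of a function that would plausibly produce the assembly quoted above. It is reconstructed only from the `foo` and `bar` symbols and the two 4-byte stores in the listing, so the exact signature and the type of `bar` are assumptions, not the test case from the report.

```c
#include <complex.h>

/* Hypothetical reconstruction: a global complex float, matching the
   `bar` and `bar+4` stores in the quoted listing. */
_Complex float bar;

/* Assumed signature: on x86-64 the SCmode argument arrives packed in
   the low 64 bits of %xmm0. The quoted code spills it to the stack
   and reloads each SFmode half separately before storing it. */
void foo(_Complex float x)
{
    bar = x;
}
```

Compiling something like this with `gcc -O2 -S` on x86-64 should show whether the two halves still go through the stack slot at `-8(%rsp)`; the exact output depends on the GCC version and any local patches.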