https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88189
            Bug ID: 88189
           Summary: ix86_expand_sse_movcc and blend for scalars
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

double f(double a,double b){ return (a<0)?a:b; }
typedef double vec __attribute__((vector_size(16)));
vec g(vec a,vec b){ return (a<0)?a:b; }

I am compiling with -O3, and the most interesting pass is ce1 with
noce_try_cmove.

Using -msse2, both functions generate similar code

        andpd   %xmm2, %xmm0
        andnpd  %xmm1, %xmm2
        orpd    %xmm2, %xmm0

With -mxop they also generate similar code

        vpcmov  %xmm2, %xmm1, %xmm0, %xmm0

However, with -msse4, they differ, the vector version gets

        blendvpd        %xmm0, %xmm2, %xmm1

while the scalar version is stuck with the SSE2 and+andn+or.

Is there a particular reason for this inconsistency?
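
For reference, here is a minimal portable C sketch of the two select idioms being compared. It only models the semantics on a single double and is not how GCC implements either sequence. The helper names (to_bits, from_bits, select_and_andn_or, select_blend, check_select) are hypothetical and introduced here for illustration. The sketch assumes the comparison yields an all-ones or all-zeros 64-bit mask, as the SSE compare instructions do, and that a blendvpd-style select looks only at the sign bit of each mask element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit integer, and back. */
static uint64_t to_bits(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double from_bits(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Mask a compare would produce for (a < 0): all ones if true, else zero. */
static uint64_t lt_zero_mask(double a) { return (a < 0) ? ~UINT64_C(0) : 0; }

/* Model of the SSE2 sequence: (a & mask) | (b & ~mask). */
static double select_and_andn_or(double a, double b)
{
    uint64_t mask = lt_zero_mask(a);
    return from_bits((to_bits(a) & mask) | (to_bits(b) & ~mask));
}

/* Model of a blendvpd-style select: only the mask's top bit is consulted. */
static double select_blend(double a, double b)
{
    uint64_t mask = lt_zero_mask(a);
    return (mask >> 63) ? a : b;
}

/* Check that both idioms agree bit-for-bit with the source expression. */
static void check_select(double a, double b)
{
    double ref = (a < 0) ? a : b;
    assert(to_bits(select_and_andn_or(a, b)) == to_bits(ref));
    assert(to_bits(select_blend(a, b)) == to_bits(ref));
}
```

Both models return the same bits as (a<0)?a:b for any input, including -0.0 and NaN (the comparison is false for both, so b is selected), which is why a single blend could in principle stand in for the three-instruction sequence in the scalar case as well.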