https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70721
            Bug ID: 70721
           Summary: Suboptimal code generated when using _mm_min_sd
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: kirill.yukhin at intel dot com
  Target Milestone: ---

_mm_min_sd is implemented as

(define_insn "<sse>_vm<code><mode>3<round_saeonly_name>"
  [(set (match_operand:VF_128 0 "register_operand" "=x,v")
        (vec_merge:VF_128
          (smaxmin:VF_128
            (match_operand:VF_128 1 "register_operand" "0,v")
            (match_operand:VF_128 2 "vector_operand" "xBm,<round_saeonly_constraint>"))
          (match_dup 1)
          (const_int 1)))]

The problem is that smaxmin is applied to the full 128-bit operands.  Can we
change it to apply only to the low 64 bits of the operands, so that we can
remove 2 xmm moves in

---
#include <emmintrin.h>

double
__attribute ((noinline, noclone))
foo (double a, double b)
{
  __m128d x = _mm_set_sd(a);
  __m128d y = _mm_set_sd(b);
  return _mm_cvtsd_f64(_mm_min_sd(x, y));
}
---
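
For illustration only, here is a minimal scalar model in plain C of the two formulations being compared. It is a sketch, not GCC code: v2df_model, min_lane, min_sd_full_width and min_sd_low_lane are hypothetical names introduced here, and the model assumes the minsd convention that the second operand is returned unless the first compares strictly less. It shows that taking the minimum over both lanes and then merging with mask 1 yields the same value as taking the minimum of lane 0 alone and copying lane 1 from operand 1.

```c
#include <stddef.h>

/* Hypothetical two-lane stand-in for a __m128d value (not a GCC type). */
typedef struct { double lane[2]; } v2df_model;

/* minsd-style minimum: yields b unless a compares strictly less than b,
   so b is also the result when either input is a NaN. */
static double min_lane(double a, double b)
{
    return a < b ? a : b;
}

/* Models the current pattern: smin over the full vector, then
   vec_merge with (const_int 1), i.e. lane 0 from the min result
   and lane 1 from operand 1. */
static v2df_model min_sd_full_width(v2df_model a, v2df_model b)
{
    v2df_model m, r;
    for (size_t i = 0; i < 2; i++)
        m.lane[i] = min_lane(a.lane[i], b.lane[i]);
    r.lane[0] = m.lane[0];
    r.lane[1] = a.lane[1];
    return r;
}

/* Models the suggested narrowing: the minimum is computed on lane 0
   only; lane 1 is passed through from operand 1 untouched. */
static v2df_model min_sd_low_lane(v2df_model a, v2df_model b)
{
    v2df_model r = a;
    r.lane[0] = min_lane(a.lane[0], b.lane[0]);
    return r;
}
```

The upper-lane minimum computed in min_sd_full_width is discarded by the merge, which is the redundancy the report points at. Whether narrowing the RTL actually removes the extra xmm moves is a question for the backend and is not something this model can show.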