https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70998
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
The sse2_cvtsd2ss<round_name> pattern is wrong.

This pattern is written as:

(define_insn "sse2_cvtsd2ss<round_name>"
  [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
        (vec_merge:V4SF
          (vec_duplicate:V4SF
            (float_truncate:V2SF
              (match_operand:V2DF 2 "nonimmediate_operand" "x,m,<round_constraint>")))
          (match_operand:V4SF 1 "register_operand" "0,0,v")
          (const_int 1)))]

This implies a V2DF load from memory, which is not the case: with a memory
operand, the instruction reads only a single scalar DF value (64 bits).

The pattern should be similar to e.g. the cvtsi2ss pattern:

(define_insn "sse_cvtsi2ss<round_name>"
  [(set (match_operand:V4SF 0 "register_operand" "=x,x,v")
        (vec_merge:V4SF
          (vec_duplicate:V4SF
            (float:SF
              (match_operand:SI 2 "<round_nimm_scalar_predicate>" "r,m,<round_constraint3>")))
          (match_operand:V4SF 1 "register_operand" "0,0,v")
          (const_int 1)))]

This is the correct scalar memory load.
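To illustrate the difference in memory access width, here is a minimal portable C model of what the memory form of cvtsd2ss does. It is only a sketch: v4sf_model, cvtsd2ss_mem_model, BYTES_DF and BYTES_V2DF are hypothetical names invented for this illustration, not GCC internals or intrinsics, and rounding-mode control is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a V4SF register: four single-precision lanes. */
typedef struct { float f[4]; } v4sf_model;

/* Width of the memory operand as described by each RTL mode:
   V2DF claims two doubles, DF claims one. */
enum { BYTES_V2DF = 2 * sizeof(double), BYTES_DF = sizeof(double) };

/* Model of "cvtsd2ss xmm, m64": read exactly one double from memory,
   narrow it to float, write it to lane 0, and keep lanes 1..3 of the
   old destination (the vec_merge with mask (const_int 1)). */
static v4sf_model cvtsd2ss_mem_model(v4sf_model old, const void *mem)
{
    double d;

    assert(mem != NULL);
    memcpy(&d, mem, BYTES_DF);   /* scalar load: BYTES_DF, not BYTES_V2DF */
    old.f[0] = (float)d;         /* float_truncate of the single DF value */
    return old;
}
```

Passing the address of a lone double is valid for this model, because it touches only BYTES_DF bytes. A description with a V2DF memory operand would instead claim BYTES_V2DF bytes at that address, which is the mismatch the comment points out.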