https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|2016-04-29 00:00:00         |2016-05-03
     Ever confirmed|0                           |1

--- Comment #16 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to H.J. Lu from comment #14)
>
> > We need to disable
> >
> > define_split
> >   [(set (match_operand 0 "any_fp_register_operand")
> >         (float_extend (match_operand 1 "memory_operand")))]
> >   "reload_completed
> >    && (GET_MODE (operands[0]) == TFmode
> >        || GET_MODE (operands[0]) == XFmode
> >        || GET_MODE (operands[0]) == DFmode)"
> >   [(set (match_dup 0) (match_dup 2))]
> > {
> >
> > for SF->DF.
> Why? This splitter will eventually result in a move of 0.0 to a SSE register.

This splitter is placed before the one we want.  We have quite a few
similar splitters far apart and we lose track of them.  This patch:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 940dc20..dc46b16 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3615,6 +3615,35 @@
   FAIL;
 })
 
+;; Break partial reg stall for cvtss2sd.
+
+(define_split
+  [(set (match_operand:DF 0 "register_operand")
+        (float_extend:DF
+          (match_operand:SF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && SSE_REG_P (operands[0])
+   && (!SSE_REG_P (operands[1])
+       || REGNO (operands[0]) != REGNO (operands[1]))
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
+  [(set (match_dup 0)
+        (vec_merge:V2DF
+          (float_extend:V2DF
+            (vec_select:V2SF
+              (match_dup 1)
+              (parallel [(const_int 0) (const_int 1)])))
+          (match_dup 0)
+          (const_int 1)))]
+{
+  operands[0] = lowpart_subreg (V2DFmode, operands[0], DFmode);
+  operands[1] = lowpart_subreg (V4SFmode, operands[1], SFmode);
+  emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
+})
+
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
         (float_extend (match_operand 1 "memory_operand")))]

works for me.
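
For readers less familiar with RTL, the following is a rough scalar model in plain C of what the new split pattern in the patch computes. It is only an illustrative sketch, not GCC code: the types `v2df` and `v4sf` and the function `model_split` are hypothetical names introduced here, and the model assumes the usual vec_merge semantics, where bit i of the mask selects element i from the first operand and a clear bit selects it from the second.

```c
/* Hypothetical scalar model of the split's RTL (not GCC code).
   v2df, v4sf and model_split are illustrative names only.  */
typedef struct { double d[2]; } v2df;
typedef struct { float f[4]; } v4sf;

static v2df
model_split (v4sf src)
{
  /* Models emit_move_insn (operands[0], CONST0_RTX (V2DFmode)):
     the destination is fully zeroed first, so the following merge
     does not depend on its previous contents.  */
  v2df dst = { { 0.0, 0.0 } };

  /* Models (float_extend:V2DF (vec_select:V2SF src [0 1])).  */
  double ext[2] = { (double) src.f[0], (double) src.f[1] };

  /* Models (vec_merge:V2DF ext dst (const_int 1)): a set mask bit
     takes the element from ext, a clear bit takes it from dst.  */
  const unsigned mask = 1;
  v2df out;
  for (int i = 0; i < 2; i++)
    out.d[i] = ((mask >> i) & 1) ? ext[i] : dst.d[i];
  return out;
}
```

In this model the low lane holds the SF value extended to DF and the high lane holds the zero written beforehand, which is how the pattern avoids carrying over the old upper half of the destination register.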