https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed|2016-04-29 00:00:00 |2016-05-03
Ever confirmed|0 |1
--- Comment #16 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to H.J. Lu from comment #14)
>
> > We need to disable
> >
> > define_split
> > [(set (match_operand 0 "any_fp_register_operand")
> > (float_extend (match_operand 1 "memory_operand")))]
> > "reload_completed
> > && (GET_MODE (operands[0]) == TFmode
> > || GET_MODE (operands[0]) == XFmode
> > || GET_MODE (operands[0]) == DFmode)"
> > [(set (match_dup 0) (match_dup 2))]
> > {
> >
> > for SF->DF.
> Why? This splitter will eventually result in a move of 0.0 to a SSE register.
This splitter is placed before the one we want, so it matches first.
We have quite a few similar splitters that sit far apart in i386.md,
and it is easy to lose track of them. This patch:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 940dc20..dc46b16 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3615,6 +3615,35 @@
FAIL;
})
+;; Break partial reg stall for cvtss2sd.
+
+(define_split
+ [(set (match_operand:DF 0 "register_operand")
+ (float_extend:DF
+ (match_operand:SF 1 "nonimmediate_operand")))]
+ "TARGET_SSE2 && TARGET_SSE_MATH
+ && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+ && epilogue_completed
+ && optimize_function_for_speed_p (cfun)
+ && SSE_REG_P (operands[0])
+ && (!SSE_REG_P (operands[1])
+ || REGNO (operands[0]) != REGNO (operands[1]))
+ && (!EXT_REX_SSE_REG_P (operands[0])
+ || TARGET_AVX512VL)"
+ [(set (match_dup 0)
+ (vec_merge:V2DF
+ (float_extend:V2DF
+ (vec_select:V2SF
+ (match_dup 1)
+ (parallel [(const_int 0) (const_int 1)])))
+ (match_dup 0)
+ (const_int 1)))]
+{
+ operands[0] = lowpart_subreg (V2DFmode, operands[0], DFmode);
+ operands[1] = lowpart_subreg (V4SFmode, operands[1], SFmode);
+ emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
+})
+
(define_split
[(set (match_operand 0 "any_fp_register_operand")
(float_extend (match_operand 1 "memory_operand")))]
works for me.
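
For reference, a minimal C sketch of the kind of source that exercises the SF->DF extension discussed above. The function names are hypothetical and are not taken from the bug report. On x86-64 with SSE math, GCC typically implements the conversion with cvtss2sd, which writes only the low 64 bits of the destination register and so depends on that register's previous contents. Whether the new splitter actually fires (zeroing the destination first) depends on the tuning flags and on register allocation, so treat this as an illustration rather than a reproducer:

```c
#include <stddef.h>

/* Widen one float loaded from memory to double.  With SSE math this is
   usually a single cvtss2sd with a memory source operand.  */
double
widen_from_memory (const float *p)
{
  return *p;
}

/* Accumulate floats in double precision.  Each iteration needs an
   SF->DF extension, so a false dependency on the conversion's
   destination register can serialize the loop.  */
double
sum_as_double (const float *v, size_t n)
{
  double total = 0.0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}
```

Compiling this with -O2 -S before and after the patch and comparing the instructions around cvtss2sd should show whether the destination register is cleared first.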