https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70873

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|2016-04-29 00:00:00         |2016-05-03
     Ever confirmed|0                           |1

--- Comment #16 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to H.J. Lu from comment #14)
> 
> > We need to disable
> > 
> > define_split
> >   [(set (match_operand 0 "any_fp_register_operand")
> >         (float_extend (match_operand 1 "memory_operand")))]
> >   "reload_completed
> >    && (GET_MODE (operands[0]) == TFmode
> >        || GET_MODE (operands[0]) == XFmode
> >        || GET_MODE (operands[0]) == DFmode)"
> >   [(set (match_dup 0) (match_dup 2))]
> > {
> > 
> > for SF->DF.
> Why? This splitter will eventually result in a move of 0.0 to a SSE register.

This splitter is placed before the one we want, so it matches first.
We have quite a few similar splitters scattered far apart in i386.md,
and it is easy to lose track of them.  This patch:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 940dc20..dc46b16 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3615,6 +3615,35 @@
     FAIL;
 })

+;; Break partial reg stall for cvtss2sd.
+
+(define_split
+  [(set (match_operand:DF 0 "register_operand")
+        (float_extend:DF
+          (match_operand:SF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && SSE_REG_P (operands[0])
+   && (!SSE_REG_P (operands[1])
+       || REGNO (operands[0]) != REGNO (operands[1]))
+   && (!EXT_REX_SSE_REG_P (operands[0])
+       || TARGET_AVX512VL)"
+  [(set (match_dup 0)
+        (vec_merge:V2DF
+          (float_extend:V2DF
+            (vec_select:V2SF
+              (match_dup 1)
+              (parallel [(const_int 0) (const_int 1)])))
+          (match_dup 0)
+          (const_int 1)))]
+{
+  operands[0] = lowpart_subreg (V2DFmode, operands[0], DFmode);
+  operands[1] = lowpart_subreg (V4SFmode, operands[1], SFmode);
+  emit_move_insn (operands[0], CONST0_RTX (V2DFmode));
+})
+
 (define_split
   [(set (match_operand 0 "any_fp_register_operand")
        (float_extend (match_operand 1 "memory_operand")))]

works for me.
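
For readers less familiar with RTL, here is a rough C sketch of what
the new splitter is meant to produce for SF->DF: zero the destination
vector first, then do the scalar convert into its low element, so the
result no longer depends on the old contents of the destination
register.  The helper name widen_sf_to_df is hypothetical and is not
part of the patch.  The intrinsics are the standard SSE2 ones from
<emmintrin.h>, and the non-SSE2 fallback is only there so the sketch
builds elsewhere.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, for illustration only; not part of the patch.  */
static double
widen_sf_to_df (float x)
{
#if defined(__SSE2__)
  /* Corresponds to emit_move_insn (operands[0], CONST0_RTX (V2DFmode)):
     clear the destination so the convert does not wait on its old value.  */
  __m128d dst = _mm_setzero_pd ();
  __m128 src = _mm_set_ss (x);

  /* cvtss2sd: the low element is the converted float, the upper
     element is taken from dst (the vec_merge with mask 1).  */
  dst = _mm_cvtss_sd (dst, src);
  return _mm_cvtsd_f64 (dst);
#else
  /* Fallback for targets without SSE2: a plain conversion.  */
  return (double) x;
#endif
}
```

Calling widen_sf_to_df (1.5f) gives the same value as a plain
(double) cast.  Only the dependency on the destination register
differs, which is the point of the splitter.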
