https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104674

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, when emitting the __divmoddi4 call, expand_DIVMOD ->
ix86_expand_divmod_libfunc calls
assign_386_stack_local (E_DImode, SLOT_TEMP)
to obtain a temporary stack slot for the remainder.
(mem:DI (plus:SI (frame) (const_int -8)))
is what is returned, and the IL looks reasonable, e.g. in vregs:
(insn 12 6 13 2 (parallel [
            (set (reg:SI 97)
                (plus:SI (reg/f:SI 19 frame)
                    (const_int -8 [0xfffffffffffffff8])))
            (clobber (reg:CC 17 flags))
        ]) 229 {*addsi_1}
     (nil))
...
(insn 19 18 20 2 (set (reg:DI 89 [ divmod_tmp_15 ])
        (reg:DI 0 ax)) 80 {*movdi_internal}
     (nil))
(insn 20 19 21 2 (set (reg:DI 90 [ divmod_tmp_15+8 ])
        (mem/c:DI (plus:SI (reg/f:SI 19 frame)
                (const_int -8 [0xfffffffffffffff8])) [0  S8 A64])) 80 {*movdi_internal}
     (nil))
...
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
        (float:DF (reg:DI 89 [ divmod_tmp_15 ]))) "pr104674.c":8:10 214 {*floatdidf2_i387}
     (nil))
...
(insn 30 29 31 2 (set (reg:DF 98)
        (float:DF (reg:SI 104 [ divmod_tmp_15+8 ]))) "pr104674.c":9:14 207 {*floatsidf2}
     (expr_list:REG_DEAD (reg:SI 104 [ divmod_tmp_15+8 ])
        (nil)))
i.e. it first loads from the temporary slot and only afterwards does some
further operations on the results.
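The libcall contract behind insns 19 and 20 is that the quotient comes back in the return register while the remainder is written through a pointer into the stack temporary. A minimal C model of that shape, assuming the same (u, v, *rp) argument order as __divmoddi4 (divmod_model is a hypothetical name, not libgcc code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a divmod libcall shaped like __divmoddi4:
   the quotient is returned, the remainder is stored through 'rp'
   (on ia32 'rp' points at the SLOT_TEMP stack temporary). */
int64_t divmod_model(int64_t u, int64_t v, int64_t *rp)
{
    assert(v != 0);      /* division by zero is undefined */
    *rp = u % v;         /* C99: remainder takes the sign of u */
    return u / v;        /* C99: quotient truncates toward zero */
}
```

The caller then reads *rp back from memory, which is what insn 20 (later insn 67) does.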
Later on that insn 20 becomes
(insn 67 19 21 2 (set (reg:SI 104 [ divmod_tmp_15+8 ])
        (mem/c:SI (plus:SI (reg/f:SI 19 frame)
                (const_int -8 [0xfffffffffffffff8])) [0  S4 A64])) 81 {*movsi_internal}
     (nil))
but it is still ok.  Combine propagates that memory load into a later insn
though, so we have:
...
(insn 70 18 19 2 (set (reg:DI 106)
        (reg:DI 0 ax)) -1
     (expr_list:REG_DEAD (reg:DI 0 ax)
        (nil)))
...
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
        (float:DF (reg:DI 106))) "pr104674.c":8:10 214 {*floatdidf2_i387}
     (expr_list:REG_DEAD (reg:DI 106)
        (nil)))
...
(insn 30 29 31 2 (set (reg:DF 98)
        (float:DF (mem/c:SI (plus:SI (reg/f:SI 19 frame)
                    (const_int -8 [0xfffffffffffffff8])) [0  S4 A64]))) "pr104674.c":9:14 207 {*floatsidf2}
     (nil))
i.e. effectively it extended the lifetime of the DImode SLOT_TEMP (well, the low
SImode part of it) across insn 25.
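In C terms, folding the load into insn 30 moves a read of the slot later, which is only equivalent as long as nothing writes the slot in between. A minimal sketch, with a static variable standing in for the frame slot and a hypothetical 'reuse' write standing in for whatever later touches the same slot:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 8-byte SLOT_TEMP at frame-8. */
static int64_t slot;

/* Replays the remainder's trip through the stack slot.  'reuse' models a
   later write to the same slot.  With sink_load == 0 the low half is loaded
   right after the call (like insn 67); with sink_load == 1 the load happens
   after the write (like the combined insn 30 once insn 25 writes the slot). */
int32_t remainder_low(int64_t remainder, int64_t reuse, int sink_load)
{
    assert(sink_load == 0 || sink_load == 1);
    int32_t lo = 0;
    slot = remainder;               /* the libcall stores the remainder */
    if (!sink_load)
        lo = (int32_t) slot;        /* early load into a pseudo */
    slot = reuse;                   /* later write to the same slot */
    if (sink_load)
        lo = (int32_t) slot;        /* late load sees the new contents */
    return lo;
}
```

With small values, remainder_low(7, 42, 0) yields 7 while remainder_low(7, 42, 1) yields 42.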
But then the split1 pass splits the:
(insn 25 24 26 2 (set (reg/v:DF 87 [ s ])
        (float:DF (reg:DI 106))) "pr104674.c":8:10 214 {*floatdidf2_i387}
     (expr_list:REG_DEAD (reg:DI 106)
        (nil)))
insn into:
(insn 72 24 26 2 (parallel [
            (set (reg/v:DF 87 [ s ])
                (float:DF (reg:DI 106)))
            (clobber (mem/c:DI (plus:SI (reg/f:SI 19 frame)
                        (const_int -8 [0xfffffffffffffff8])) [0  S8 A64]))
            (clobber (scratch:V4SI))
            (clobber (scratch:V4SI))
        ]) "pr104674.c":8:10 -1
     (nil))
and that split calls assign_386_stack_local (E_DImode, SLOT_TEMP), which returns
the same temporary slot, one that is unfortunately live across that instruction:
;; Avoid store forwarding (partial memory) stall penalty
;; by passing DImode value through XMM registers.  */

(define_split
  [(set (match_operand:X87MODEF 0 "register_operand")
        (float:X87MODEF
          (match_operand:DI 1 "register_operand")))]
  "!TARGET_64BIT && TARGET_INTER_UNIT_MOVES_TO_VEC
   && TARGET_80387 && X87_ENABLE_FLOAT (<X87MODEF:MODE>mode, DImode)
   && TARGET_SSE2 && optimize_function_for_speed_p (cfun)
   && can_create_pseudo_p ()"
  [(const_int 0)]
{
  emit_insn (gen_floatdi<mode>2_i387_with_xmm
             (operands[0], operands[1],
              assign_386_stack_local (DImode, SLOT_TEMP)));
  DONE;
})
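The collision comes from the stack locals being cached per (mode, slot kind): every request for DImode SLOT_TEMP in a function gets the same frame location. A simplified model of such a cache (hypothetical names, ignoring alignment; not GCC's implementation):

```c
#include <assert.h>

/* Hypothetical, simplified model of a per-function stack-local cache
   keyed by (mode, slot kind). */
enum model_mode { MODEL_SI, MODEL_DI, MODEL_NUM_MODES };
enum model_kind { MODEL_SLOT_TEMP, MODEL_SLOT_OTHER, MODEL_NUM_KINDS };

struct model_entry { int used; int offset; };

static struct model_entry cache[MODEL_NUM_MODES][MODEL_NUM_KINDS];
static int frame_size;

/* Returns the frame offset for (mode, kind): allocated on first use,
   then the same offset on every later request. */
int assign_slot(enum model_mode m, enum model_kind k)
{
    assert(m < MODEL_NUM_MODES && k < MODEL_NUM_KINDS);
    struct model_entry *e = &cache[m][k];
    if (!e->used) {
        frame_size += (m == MODEL_DI) ? 8 : 4;
        e->offset = -frame_size;
        e->used = 1;
    }
    return e->offset;
}
```

In a fresh function both the divmod expansion and the split asking for (MODEL_DI, MODEL_SLOT_TEMP) get offset -8, while a different kind gets its own location.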

From what I can see, SLOT_TEMP is used in:
i386.md:              assign_386_stack_local (DImode, SLOT_TEMP)));
i386.md:                   assign_386_stack_local (DImode, SLOT_TEMP)));
sync.md:                assign_386_stack_local (DImode, SLOT_TEMP)));
sync.md:                  assign_386_stack_local (DImode, SLOT_TEMP)));
i386-expand.cc:      target = assign_386_stack_local (SImode, SLOT_TEMP);
i386-expand.cc:      target = assign_386_stack_local (SImode, SLOT_TEMP);
i386-expand.cc:  rtx rem = assign_386_stack_local (mode, SLOT_TEMP);
Except for this define_split, all the other uses are either in some builtin's
expansion or in a define_expand, and those look fine.  In this define_split,
however, I think we can't guarantee that SLOT_TEMP isn't live across the insn
being split, so we need to use a different SLOT_* kind there.
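The effect of giving the split its own slot kind can be sketched by replaying the post-combine order: the remainder is stored, the split conversion spills its DImode input (the quotient) through its scratch slot, and only then is the remainder's low half loaded. MODEL_SPLIT_TEMP is an illustrative name for "a different SLOT_* kind", not an existing GCC enumerator:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot kinds, one 8-byte frame slot each. */
enum model_kind { MODEL_SLOT_TEMP, MODEL_SPLIT_TEMP, MODEL_NUM_KINDS };

static int64_t frame[MODEL_NUM_KINDS];

/* Replays the post-combine order and returns what the final load
   (insn 30) reads for the remainder's low half. */
int32_t replay(int64_t remainder, int64_t quotient, enum model_kind split_kind)
{
    assert(split_kind < MODEL_NUM_KINDS);
    frame[MODEL_SLOT_TEMP] = remainder;   /* libcall writes the remainder */
    frame[split_kind] = quotient;         /* split spills its DImode input */
    return (int32_t) frame[MODEL_SLOT_TEMP];
}
```

With the shared kind, replay(7, 42, MODEL_SLOT_TEMP) returns the quotient's low half (42); with a separate kind, replay(7, 42, MODEL_SPLIT_TEMP) returns the remainder (7).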
