On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote:
> FLD from memory in SFmode and DFmode is considered a conversion, and
> converts sNaN to qNaN (and raises the #IA exception). But sNaN handling is
> already busted in the compiler, as RA is free to spill the register in
> non-XFmode. IMO, the peephole2 pattern is no worse than the current
> situation.

Ok.

> At least for x86, there are no SUBREGs after reload, otherwise other
> parts of the compiler would break.

The new patch would really handle even a SUBREG there...

> > I don't see how, that would mean I'd have to write two peephole2s instead of
> > one.  It tries to deal with two different cases: one is where the temporary
> > reg is dead, in which case we can optimize away both the load and the store;
> > the second is where the temporary reg isn't dead, in which case we can
> > optimize away the store, but not the load.  With the optimizing away of both
> > load and store I was just trying to do a cheap DCE there.
> 
> I didn't realize this is an optimization, a comment would be welcome here.

Ugh, except that it doesn't work.  peep2_reg_dead_p (1, operands[0])
is not what I meant; that is always false, because the register must be live
between the first and second instruction.  I meant
peep2_reg_dead_p (2, operands[0]), i.e. the register is dead at the end of
the second instruction.  But we don't really support
define_split/define_peephole2 splitting into zero instructions: DONE; in
that case returns NULL, just like FAIL; does.  So, let's just wait for DCE to
finish it up.
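
As a rough illustration of the liveness argument above, here is a toy
backward-liveness model in C.  It is not GCC's implementation: toy_insn,
toy_liveness and toy_reg_dead_p are made-up names, and toy_reg_dead_p only
imitates the "dead before position ofs" reading of peep2_reg_dead_p
described above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not GCC code: each insn has a bitmask of the registers it
   defines and a bitmask of the registers it uses.  */
struct toy_insn {
  uint32_t def;
  uint32_t use;
};

/* Hypothetical register names, one bit each.  */
enum { R_TMP = 1u << 0, R_BASE = 1u << 1, R_INDEX = 1u << 2 };

/* Fill live_before[0..n]: live_before[i] is the set of registers live
   just before insn i, and live_before[n] is the set live after the last
   insn (i.e. live_out).  */
static void toy_liveness (const struct toy_insn *insns, int n,
                          uint32_t live_out, uint32_t *live_before)
{
  live_before[n] = live_out;
  for (int i = n - 1; i >= 0; i--)
    live_before[i] = insns[i].use | (live_before[i + 1] & ~insns[i].def);
}

/* Assumed reading: true if REG is not live at position OFS.  */
static int toy_reg_dead_p (const uint32_t *live_before, int ofs, uint32_t reg)
{
  return (live_before[ofs] & reg) == 0;
}

/* reg = mem; mem = reg, with the temporary dead afterwards.  */
void toy_liveness_demo (void)
{
  struct toy_insn seq[2] = {
    { R_TMP, R_BASE | R_INDEX },      /* tmp = mem[base + index] */
    { 0, R_TMP | R_BASE | R_INDEX }   /* mem[base + index] = tmp */
  };
  uint32_t live_before[3];

  toy_liveness (seq, 2, 0, live_before);
  /* The store uses tmp, so tmp is never dead at position 1.  */
  assert (!toy_reg_dead_p (live_before, 1, R_TMP));
  /* At position 2 it is dead only because live_out does not contain it.  */
  assert (toy_reg_dead_p (live_before, 2, R_TMP));
}
```

In this model the answer at position 1 does not depend on live_out at all,
which is why only the query at position 2 can tell the two cases apart.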

Here is what I'll bootstrap/regtest then.  I've also added a
reg_overlap_mentioned_p check, in case there is e.g.
  movl (%eax,%edx), %eax
  movl %eax, (%eax,%edx)
or similar, and, as I said earlier, explicit match_operands so that I can
check MEM_VOLATILE_P on both MEMs.
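
To make the overlap case concrete, here is a small C sketch that simulates
the two-instruction sequence on a toy machine.  Nothing in it is GCC code:
toy_machine, toy_load, toy_store and toy_store_is_redundant are hypothetical
names, and addresses are simply wrapped modulo a tiny memory.  It only shows
that dropping the store is safe when the loaded register is not part of the
address, and unsafe when it is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not GCC code: three registers and a tiny word-addressed
   memory.  */
enum { EAX, EDX, ECX, NREGS };
enum { MEM_WORDS = 16 };

struct toy_machine {
  uint32_t reg[NREGS];
  uint32_t mem[MEM_WORDS];
};

/* Address of (%base,%index), wrapped into the toy memory.  */
static uint32_t toy_addr (const struct toy_machine *m, int base, int index)
{
  return (m->reg[base] + m->reg[index]) % MEM_WORDS;
}

/* movl (%base,%index), %dst */
static void toy_load (struct toy_machine *m, int dst, int base, int index)
{
  m->reg[dst] = m->mem[toy_addr (m, base, index)];
}

/* movl %src, (%base,%index) */
static void toy_store (struct toy_machine *m, int src, int base, int index)
{
  m->mem[toy_addr (m, base, index)] = m->reg[src];
}

/* Return 1 if load followed by store leaves exactly the same state as
   the load alone, i.e. if the store could be dropped.  */
int toy_store_is_redundant (const struct toy_machine *init,
                            int tmp, int base, int index)
{
  struct toy_machine both = *init, load_only = *init;

  toy_load (&both, tmp, base, index);
  toy_store (&both, tmp, base, index);
  toy_load (&load_only, tmp, base, index);
  return memcmp (&both, &load_only, sizeof both) == 0;
}

/* Arbitrary starting state: %eax = 2, %edx = 3, mem[i] = 100 + i.  */
struct toy_machine toy_initial (void)
{
  struct toy_machine m;

  memset (&m, 0, sizeof m);
  m.reg[EAX] = 2;
  m.reg[EDX] = 3;
  for (int i = 0; i < MEM_WORDS; i++)
    m.mem[i] = 100 + i;
  return m;
}

void toy_overlap_demo (void)
{
  struct toy_machine m = toy_initial ();

  /* Temporary not used in the address: the store writes back what is
     already there.  */
  assert (toy_store_is_redundant (&m, ECX, EAX, EDX));
  /* Temporary is also the base register: the load changes the address,
     so the store hits a different location.  */
  assert (!toy_store_is_redundant (&m, EAX, EAX, EDX));
}
```

The second assertion corresponds to the movl pair above, where the load
clobbers a register that the store's address depends on.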

2019-01-08  Jakub Jelinek  <ja...@redhat.com>

        PR rtl-optimization/79593
        * config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

--- gcc/config/i386/i386.md.jj  2019-01-07 23:54:54.494800693 +0100
+++ gcc/config/i386/i386.md     2019-01-08 12:34:18.916832780 +0100
@@ -18740,6 +18740,18 @@ (define_peephole2
                       const0_rtx);
 })
 
+;; Attempt to optimize away memory stores of values the memory already
+;; has.  See PR79593.
+(define_peephole2
+  [(set (match_operand 0 "register_operand")
+        (match_operand 1 "memory_operand"))
+   (set (match_operand 2 "memory_operand") (match_dup 0))]
+  "!MEM_VOLATILE_P (operands[1])
+   && !MEM_VOLATILE_P (operands[2])
+   && rtx_equal_p (operands[1], operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Attempt to always use XOR for zeroing registers (including FP modes).
 (define_peephole2
   [(set (match_operand 0 "general_reg_operand")


        Jakub
