On Tue, Jan 08, 2019 at 11:49:10AM +0100, Uros Bizjak wrote:
> FLD from memory in SF and DFmode is considered a conversion, and
> converts sNaN to NaN (and emits #IA exception). But sNaN handling is
> already busted in the compiler as RA is free to spill the register in
> non-XFmode. IMO, the peephole2 pattern is no worse than the current
> situation.
Ok.

> At least for x86, there are no SUBREGs after reload, otherwise other
> parts of the compiler would break.

The new patch would handle even a SUBREG there...

> > I don't see how, that would mean I'd have to write two peephole2s instead of
> > one.  It tries to deal with two different cases: one is where the temporary
> > reg is dead, in which case we can optimize away both the load and the store;
> > the second is where the temporary reg isn't dead, in which case we can
> > optimize away the store, but not the load.  With optimizing away both the
> > load and the store I was just trying to do a cheap DCE there.
>
> I didn't realize this is an optimization, a comment would be welcome here.

Ugh, except that it doesn't work.  peep2_reg_dead_p (1, operands[0]) is not
what I meant; that is always false, as the register must be live between the
first and second instruction.  I meant peep2_reg_dead_p (2, operands[0]), i.e.
the register being dead at the end of the second instruction.  Except we don't
really support define_split/define_peephole2 splitting into zero instructions:
DONE; in that case returns NULL, just like FAIL; does.  So, let's just wait for
DCE to finish it up.

Here is what I'll bootstrap/regtest then.  I also added reg_overlap_mentioned_p,
in case there is e.g.
	movl (%eax,%edx), %eax
	movl %eax, (%eax,%edx)
or similar, and, as I said earlier, an explicit match_operand so that I can
check MEM_VOLATILE_P on both MEMs.

2019-01-08  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/79593
	* config/i386/i386.md (reg = mem; mem = reg): New define_peephole2.

--- gcc/config/i386/i386.md.jj	2019-01-07 23:54:54.494800693 +0100
+++ gcc/config/i386/i386.md	2019-01-08 12:34:18.916832780 +0100
@@ -18740,6 +18740,18 @@ (define_peephole2
 			 const0_rtx);
 })
 
+;; Attempt to optimize away memory stores of values the memory already
+;; has.  See PR79593.
+(define_peephole2
+  [(set (match_operand 0 "register_operand")
+        (match_operand 1 "memory_operand"))
+   (set (match_operand 2 "memory_operand") (match_dup 0))]
+  "!MEM_VOLATILE_P (operands[1])
+   && !MEM_VOLATILE_P (operands[2])
+   && rtx_equal_p (operands[1], operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Attempt to always use XOR for zeroing registers (including FP modes).
 (define_peephole2
   [(set (match_operand 0 "general_reg_operand")

	Jakub
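To make the four conditions of the new peephole2 concrete outside of GCC, here is a minimal sketch in C of the same decision on a toy instruction model. It is not GCC code: mem_ref, load_store_pair and store_is_redundant are hypothetical names. A memory operand is reduced to base/index registers plus a displacement, rtx_equal_p becomes a field-by-field comparison, and reg_overlap_mentioned_p is simplified to exact register-number matches. The real function also catches partial overlaps such as %al vs %eax, which this sketch assumes away.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an x86 memory operand
   (base + index + displacement).  Register numbers are small
   non-negative integers; -1 means "no register".  This is NOT
   GCC's rtx representation. */
typedef struct {
    int base;          /* base register, or -1 */
    int index;         /* index register, or -1 */
    long disp;         /* constant displacement */
    bool is_volatile;  /* stands in for MEM_VOLATILE_P */
} mem_ref;

/* Insn 1: reg = *src (load).  Insn 2: *dst = reg (store). */
typedef struct {
    int reg;           /* the temporary register, >= 0 */
    mem_ref src;
    mem_ref dst;
} load_store_pair;

/* Structural comparison of the two addresses, standing in for
   rtx_equal_p on the two MEMs. */
static bool same_address(const mem_ref *a, const mem_ref *b)
{
    return a->base == b->base && a->index == b->index && a->disp == b->disp;
}

/* Simplified stand-in for reg_overlap_mentioned_p: only exact register
   number matches are detected, not partial overlaps. */
static bool reg_used_in_address(int reg, const mem_ref *m)
{
    return m->base == reg || m->index == reg;
}

/* True if the store in insn 2 can be deleted, keeping only the load. */
bool store_is_redundant(const load_store_pair *p)
{
    return !p->src.is_volatile
        && !p->dst.is_volatile
        && same_address(&p->src, &p->dst)
        && !reg_used_in_address(p->reg, &p->dst);
}
```

Numbering %eax as 0 and %edx as 2 (an arbitrary choice for the sketch), the movl (%eax,%edx), %eax / movl %eax, (%eax,%edx) pair is rejected. The two address expressions are textually equal, but the load overwrites %eax, so the store would actually write to a different location.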