On Tue, Jan 8, 2019 at 10:27 AM Jakub Jelinek <ja...@redhat.com> wrote: > > On Tue, Jan 08, 2019 at 08:29:03AM +0100, Uros Bizjak wrote: > > Is there a reason stack registers are excluded? Before stackreg pass, > > these registers are just like other hard registers. > > I was just afraid of those, after stack pass removing a store but not > load would result in the stack getting out of sync, and I wasn't sure if the > stack pass will tolerate a RA chosen stack reg not being touched anymore.
It does. There is a stack compensating code that handles these situations. > Also, doesn't loading a float/double/long double into a stack reg and > storing again canonicalize it in any way? Say NaNs? Or unnormal or > pseudo-denormal long doubles etc.? Tried that now with a sNaN and the > load/store didn't change it, but haven't tried with exceptions enabled to > see if it would raise one. FLD from memory in SF and DFmode is considered a conversion, and converts sNaN to NaN (and emits #IA exception). But sNaN handling is already busted in the compiler as RA is free to spill the register in non-XFmode. IMO, the peephole2 pattern is no worse than the current situation. > If that is fine, I'll leave that out. I think we are good to do so. > Looking around, I was using a wrong macro anyway and am surprised nothing > complained, it should have been STACK_REG_P rather than STACK_REGNO_P. > > > Other that that, there is no need for REG_P predicate; after reload we > > don't have subregs and register_operand will match only hard regs. > > Ok, I wasn't sure because "register_operand" allows (subreg (reg)) even if > reload_completed, just disallows subregs of mem after that. At least for x86, there are no SUBREGs after reload, otherwise other parts of the compiler would break. > > Also, please put peep2_reg_dead_p predicate in the pattern predicate. > > I don't see how, that would mean I'd have to write two peephole2s instead of > one. It tries to deal with two different cases, one is where the temporary > reg is dead, in that case we can optimize away both the load or store, the > second case is where the temporary reg isn't dead, in that case we can > optimize away the store, but not the load. With the optimizing away of both > load and store I was just trying to do a cheap DCE there. I didn't realize this is an optimization, a comment would be welcome here. Uros. > Looking around more, I actually think I need to replace > (match_dup 1) with (match_operand 2 "memory_operand"), add rtx_equal_p > (operands[1], operands[2]) and !MEM_VOLATILE_P (operands[2]), because > apparently rtx_equal_p doesn't check the MEM_VOLATILE_P bit. > > > > 2019-01-07 Jakub Jelinek <ja...@redhat.com> > > > > > > PR rtl-optimization/79593 > > > * config/i386/i386.md (reg = mem; mem = reg): New > > > define_peephole2. > > > > > > --- gcc/config/i386/i386.md.jj 2019-01-01 12:37:31.564738571 +0100 > > > +++ gcc/config/i386/i386.md 2019-01-07 17:11:21.056392168 +0100 > > > @@ -18740,6 +18740,21 @@ (define_peephole2 > > > const0_rtx); > > > }) > > > > > > +;; Attempt to optimize away memory stores of values the memory already > > > +;; has. See PR79593. > > > +(define_peephole2 > > > + [(set (match_operand 0 "register_operand") > > > + (match_operand 1 "memory_operand")) > > > + (set (match_dup 1) (match_dup 0))] > > > + "REG_P (operands[0]) > > > + && !STACK_REGNO_P (operands[0]) > > > + && !MEM_VOLATILE_P (operands[1])" > > > + [(set (match_dup 0) (match_dup 1))] > > > +{ > > > + if (peep2_reg_dead_p (1, operands[0])) > > > + DONE; > > > +}) > > > + > > > ;; Attempt to always use XOR for zeroing registers (including FP modes). > > > (define_peephole2 > > > [(set (match_operand 0 "general_reg_operand") > > Jakub