On Tue, Jan 8, 2019 at 10:27 AM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Tue, Jan 08, 2019 at 08:29:03AM +0100, Uros Bizjak wrote:
> > Is there a reason stack registers are excluded? Before stackreg pass,
> > these registers are just like other hard registers.
>
> I was just afraid of those, after stack pass removing a store but not
> load would result in the stack getting out of sync, and I wasn't sure if the
> stack pass will tolerate a RA chosen stack reg not being touched anymore.

It does. There is a stack compensating code that handles these situations.

> Also, doesn't loading a float/double/long double into a stack reg and
> storing again canonicalize it in any way?  Say NaNs?  Or unnormal or
> pseudo-denormal long doubles etc.?  Tried that now with a sNaN and the
> load/store didn't change it, but haven't tried with exceptions enabled to
> see if it would raise one.

FLD from memory in SF and DFmode is considered a conversion, and
converts sNaN to NaN (and emits #IA exception). But sNaN handling is
already busted in the compiler as RA is free to spill the register in
non-XFmode. IMO, the peephole2 pattern is no worse than the current
situation.

> If that is fine, I'll leave that out.

I think we are good to do so.

> Looking around, I was using a wrong macro anyway and am surprised nothing
> complained, it should have been STACK_REG_P rather than STACK_REGNO_P.
>
> > Other that that, there is no need for REG_P predicate; after reload we
> > don't have subregs and register_operand will match only hard regs.
>
> Ok, I wasn't sure because "register_operand" allows (subreg (reg)) even if
> reload_completed, just disallows subregs of mem after that.

At least for x86, there are no SUBREGs after reload, otherwise other
parts of the compiler would break.

> > Also, please put peep2_reg_dead_p predicate in the pattern predicate.
>
> I don't see how, that would mean I'd have to write two peephole2s instead of
> one.  It tries to deal with two different cases, one is where the temporary
> reg is dead, in that case we can optimize away both the load or store, the
> second case is where the temporary reg isn't dead, in that case we can
> optimize away the store, but not the load.  With the optimizing away of both
> load and store I was just trying to do a cheap DCE there.

I didn't realize this is an optimization, a comment would be welcome here.

Uros.

> Looking around more, I actually think I need to replace
> (match_dup 1) with (match_operand 2 "memory_operand"), add rtx_equal_p
> (operands[1], operands[2]) and !MEM_VOLATILE_P (operands[2]), because
> apparently rtx_equal_p doesn't check the MEM_VOLATILE_P bit.
>
> > > 2019-01-07  Jakub Jelinek  <ja...@redhat.com>
> > >
> > >         PR rtl-optimization/79593
> > >         * config/i386/i386.md (reg = mem; mem = reg): New 
> > > define_peephole2.
> > >
> > > --- gcc/config/i386/i386.md.jj  2019-01-01 12:37:31.564738571 +0100
> > > +++ gcc/config/i386/i386.md     2019-01-07 17:11:21.056392168 +0100
> > > @@ -18740,6 +18740,21 @@ (define_peephole2
> > >                        const0_rtx);
> > >  })
> > >
> > > +;; Attempt to optimize away memory stores of values the memory already
> > > +;; has.  See PR79593.
> > > +(define_peephole2
> > > +  [(set (match_operand 0 "register_operand")
> > > +        (match_operand 1 "memory_operand"))
> > > +   (set (match_dup 1) (match_dup 0))]
> > > +  "REG_P (operands[0])
> > > +   && !STACK_REGNO_P (operands[0])
> > > +   && !MEM_VOLATILE_P (operands[1])"
> > > +  [(set (match_dup 0) (match_dup 1))]
> > > +{
> > > +  if (peep2_reg_dead_p (1, operands[0]))
> > > +    DONE;
> > > +})
> > > +
> > >  ;; Attempt to always use XOR for zeroing registers (including FP modes).
> > >  (define_peephole2
> > >    [(set (match_operand 0 "general_reg_operand")
>
>         Jakub

Reply via email to