> -----Original Message----- > From: [email protected] [mailto:gcc-patches- > [email protected]] On Behalf Of Jeff Law > Sent: Tuesday, December 02, 2014 6:11 AM > To: Zhenqiang Chen > Cc: Steven Bosscher; [email protected]; Jakub Jelinek > Subject: Re: [PATCH] Fix PR 61225 > > On 08/04/14 02:24, Zhenqiang Chen wrote: > > >>> > >>> ChangeLog: > >>> 2014-05-22 Zhenqiang Chen <[email protected]> > >>> > >>> Part of PR rtl-optimization/61225 > >>> * config/i386/i386-protos.h (ix86_peephole2_rtx_equal_p): > >>> New proto. > >>> * config/i386/i386.c (ix86_peephole2_rtx_equal_p): New function. > >>> * regcprop.c (replace_oldest_value_reg): Add REG_EQUAL note > when > >>> propagating to SET. > >> > >> I can't help but wonder why the new 4 insn combination code isn't > >> presenting this as a nice big fat insn to the x86 backend which would > >> eliminate the need for the peep2. > >> > >> But, assuming there's a fundamental reason why that's not kicking in... > > > > Current combine pass can only handle > > > > I0 -> I1 -> I2 -> I3. > > I0, I1 -> I2, I2 -> I3. > > I0 -> I2; I1, I2 -> I3. > > I0 -> I1; I1, I2 -> I3. > > > > For the case, it is > > I1 -> I2 -> I3; I2 -> INSN > > > > I3 and INSN looks like not related. But INSN is a COMPARE to set CC > > and I3 can also set CC. I3 and INSN can be combined together as one > > instruction to set CC. > Presumably there's no dataflow between I3 and INSN because they both set > CC (doesn't that make them anti-dependent? > > Can you show me the RTL corresponding to I1, I2, I3 and INSN, I simply find it > easier to look at RTL rather than guess why we don't have the appropriate > linkage and thus not attempting the combinations we want. > > Pseudo code for the resulting I3 and INSN would help -- as I work through > this there's some inconsistencies in how I'm interpreting a few things and RTL > and pseudo-rtl for the desired output RTL would help a lot.
C code:
if (!--*p)
rtl code:
6: r91:SI=[r90:SI]
7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
8: [r90:SI]=r88:SI
9: flags:CCZ=cmp(r88:SI,0)
expected output:
8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}
in assemble, it is
decl (%eax)
> >
> > ChangeLog
> > 2014-08-04 Zhenqiang Chen <[email protected]>
> >
> > Part of PR rtl-optimization/61225
> > * combine.c (refer_same_reg_p): New function.
> > (combine_instructions): Handle I1 -> I2 -> I3; I2 -> insn.
> > (try_combine): Add one more parameter TO_COMBINED_INSN, which
> is
> > used to create a new insn parallel (TO_COMBINED_INSN, I3).
> >
> > testsuite/ChangeLog:
> > 2014-08-04 Zhenqiang Chen <[email protected]>
> >
> > * gcc.target/i386/pr61225.c: New test.
> >
> > diff --git a/gcc/combine.c b/gcc/combine.c index 53ac1d6..42098ab
> > 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -412,7 +412,7 @@ static int cant_combine_insn_p (rtx);
> > static int can_combine_p (rtx, rtx, rtx, rtx, rtx, rtx, rtx *, rtx *);
> > static int combinable_i3pat (rtx, rtx *, rtx, rtx, rtx, int, int, rtx
*);
> > static int contains_muldiv (rtx);
> > -static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx);
> > +static rtx try_combine (rtx, rtx, rtx, rtx, int *, rtx, rtx);
> > static void undo_all (void);
> > static void undo_commit (void);
> > static rtx *find_split_point (rtx *, rtx, bool); @@ -1099,6 +1099,46
> > @@ insn_a_feeds_b (rtx a, rtx b)
> > #endif
> > return false;
> > }
> > +
> > +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> > + It returns TRUE, if reg1 == reg2, and no other refer of reg1
> > + except A and B. */
> > +
> > +static bool
> > +refer_same_reg_p (rtx a, rtx b)
> > +{
> > + rtx seta = single_set (a);
> > + rtx setb = single_set (b);
> > +
> > + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> > + || !seta || !setb)
> > + return false;
> Go ahead and use
> || !setb
> || !setb
>
> It's a bit more vertical space, but I believe closer in line with our
coding
> standards.
Updated.
> > +
> > + if (GET_CODE (SET_SRC (seta)) != COMPARE
> > + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
> > + || !REG_P (XEXP (SET_SRC (seta), 0))
> > + || !const0_rtx
> > + || !REG_P (SET_SRC (setb))
> > + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
> > + return false;
> What's the !const0_rtx test here? Don't you want to test some object
> from SETA against const0_rtx? Also note that you may need to test
> against CONST0_RTX (mode)
It's my fault. It should be
XEXP (SET_SRC (seta), 1) != const0_rtx
The updated patch is attached.
Thanks!
-Zhenqiang
> > @@ -1431,6 +1468,50 @@ combine_instructions (rtx f, unsigned int nregs)
> > }
> > }
> >
> > + /* Try to combine a compare insn that sets CC
> > + with a preceding insn that can set CC, and maybe with its
> > + logical predecessor as well.
> > + We need this special code because data flow connections
> > + do not get entered in LOG_LINKS. */
> So you'd want to be more specific about what dataflow connections are
> not in the LOG_LINKS that we want.
>
> It feels to me like we're missing the anti-dependence links on CC and
> that there's a general aspect to combine missing here. But I want to
> hold off on final judgement until I know more.
>
> I also wonder if compare-elim ought to be helping here. Isn't that the
> point here, to eliminate the comparison and instead get it for free as
> part of the arithmetic? If so, is it the fact that we have memory
> references that prevents compare-elim from kicking in?
>
> jeff
>
pr61225.patch
Description: Binary data
