On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
> > > However there are 2 issues with this, one is the spurious subreg,
> >
> > Combine didn't make that up out of thin air; something already used
> > DImode here. It could simplify it to SImode in this case, that is
> > true, don't know why it doesn't; it isn't necessarily faster code to
> > do so, it can be slower, it might not match, etc.
>
> The relevant RTL instructions on AArch64 are:
[ You never gave a full test case, or I missed it, or cannot find it
anymore -- but I can reproduce this now:
void g(void);
void f(int *x) { if (*x & 2) g(); }
]
> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
> (and:SI (reg/v:SI 76 [ xD.2641 ])
> (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
> (nil))
> (insn 26 25 27 2 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 77 [ D.2705 ])
> (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
> (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
> (nil)))
>
> I don't see anything using DI...
Yeah, I spoke too soon, sorry. It looks like make_compound_operation came
up with it.
> > It's only a problem for AND-and-compare, no?
>
> Yes, so it looks like some other backends match the odd pattern and then have
> another
> pattern change it back into the canonical AND/TST form during the split phase
> (maybe
> the subreg confuses register allocation or block other optimizations).
A subreg of a pseudo is not anything special, don't worry about it,
register_operand and similar treat it just like any other register.
> This all seems
> a lot of unnecessary complexity for a few special immediates when there is a
> much
> simpler solution...
Feel free to post a patch! I would love to have this all simplified.
> > > But there are more efficient ways to emit single bit and masks tests that
> > > apply
> > > to most CPUs rather than doing something specific that works for just one
> > > target
> > > only. For example single bit test is a simple shift into carry flag or
> > > into the
> > > sign bit, and for mask tests, you shift out all the non-mask bits.
> >
> > Most of those are quite target-specific. Some others are already done,
> > and/or done by other passes.
>
> But what combine does here is even more target-specific.
Combine puts everything (well, most things) through
make_compound_operation, on all targets.
> > Combine converts the merged instructions to what it thinks is the
> > canonical or cheapest form, and uses that. It does not try multiple
> > options (the zero_ext* -> and+shift rewriting is not changing the
> > semantics of the pattern at all).
>
> But the change from AND to zero_extract is already changing semantics...
Oh? It is not supposed to!
> > > Or would it be better to let each target decide
> > > on how to canonicalize bit tests and only try that alternative?
> >
> > The question is how to write the pattern to be most convenient for all
> > targets.
>
> The obvious choice is to try the 2 original instructions merged.
... without any simplification. Yes, I've wanted combine to fall back
to that if the "simplified" version does not work out. Not so easy to
do though.
> > > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
> >
> > Combine does not try multiple options.
>
> I'm not following - combine tries zero_extract and shift+AND - that's 2
> options.
> If that is feasible then adding a 3rd option should be possible.
The shift+and is *exactly the same* as the zero_extract, just written
differently.
> We certainly need a lot more target hooks in general so GCC can do the right
> thing
> (rather than using costs inconsistently all over the place). But that's a
> different
> discussion...
This isn't about costs though. That is a big other can of worms, indeed!
Anyway. In that testcase I made, everything is simplified just fine on
aarch64, using *tbeqdi1; what am I missing?
Segher