> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Monday, September 26, 2022 1:43 PM
> To: Tamar Christina <[email protected]>
> Cc: [email protected]; nd <[email protected]>; [email protected];
> Richard Sandiford <[email protected]>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
>
> On Mon, 26 Sep 2022, Richard Biener wrote:
>
> > On Mon, 26 Sep 2022, Tamar Christina wrote:
> >
> > > > Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> > >
> > > But then I'd still need to change the expansion code. I suppose this could
> > > prevent the issue with changes to code on other targets.
> > >
> > > > > > We have undocumented addcc, negcc, etc. patterns, should we
> > > > > > have an andcc pattern for this indicating support for andcc + jump as
> > > > > > opposed to cmpcc + jump?
> > > > >
> > > > > This could work yeah. I didn't know these existed.
> > >
> > > > Ah, so they are conditional add, not add setting CC, so andcc
> > > > wouldn't be appropriate.
> > >
> > > > So I'm not sure how we'd handle such situation - maybe looking at
> > > > REG_DECL and recognizing a _Bool PARM_DECL is OK?
> > >
> > > I have a slight suspicion that Richard Sandiford would likely reject this
> > > though. The additional AND seemed less hacky as it's just communicating
> > > range.
> > >
> > > I still need to also figure out which representation of bool is being
> > > used, because only the 0-1 variant works. Is there a way to check that?
> >
> > So another option would be, in case you have (subreg:SI (reg:QI)), if
> > we expand
> >
> > if (b != 0)
> >
> > expand that to
> >
> > !((b & 255) == 0)
> >
> > basically invert the comparison and then leverage the paradoxical
> > subreg to specify a narrower immediate to AND with? Just hoping that
> > arm can do 255 as immediate and still efficiently handle this?

We can and already do, and don't need that representation to do so.
The problem is that handling 255 is already inefficient: it requires an
additional instruction to test the value, whereas we have a fused test single
bit and branch instruction.

> >
> > Wouldn't this transform be possible in combine with the appropriate
> > backend pattern and combine synthesizing the and for paradoxical
> > subregs?

Not unless we have enough range information in RTL to know that whatever value
has been fed into the cbranch has a range of 1 bit. A range of 8 bits we
already have, and it isn't useful.

The idea was to transform what we currently have:

        tst     w0, 255
        bne     .L4
        ret

i.e. test the bottom 8 bits, into

        tbnz    w0, #0, .L4
        ret

i.e. test only bit 0 and branch based on that bit. We cannot do this when all
we know is that the range is 8 bits.
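
To make the range argument concrete, here is a minimal C sketch. It is not
GCC code, and the function names are made up purely for illustration. It
models the two branch conditions and counts how often they disagree over a
given value range:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "tst w0, 255; bne": taken if any of the low 8 bits is set. */
bool taken_low8(uint32_t reg) { return (reg & 0xffu) != 0; }

/* Models "tbnz w0, #0": taken if bit 0 is set. */
bool taken_bit0(uint32_t reg) { return (reg & 1u) != 0; }

/* Hypothetical helper: number of values in [lo, hi] on which the two
   branch conditions disagree.  */
int count_mismatches(uint32_t lo, uint32_t hi)
{
    int n = 0;
    for (uint32_t v = lo; v <= hi; v++)
        if (taken_low8(v) != taken_bit0(v))
            n++;
    return n;
}
```

With a known range of a single bit, count_mismatches (0, 1) is 0, so the two
forms are interchangeable. With only an 8-bit range the forms differ on every
non-zero even value (2, 4, ...), which is why the 8-bit information alone is
not enough.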
>
> Looking at what we produce on aarch64 it seems 'bool' is using an SImode
> register but your characterization that the upper 24 bits have undefined
> content suggests that is a wrong representation?
> If the ABI doesn't say anything about the upper bits we should reflect that
> somehow?

It does. And no, "bool" is using QImode. The expansion of

  extern void h ();
  void g1(bool x)
  {
    if (__builtin_expect (x, 0))
      h ();
  }

shows that the argument x is passed in QImode, but like many RISC targets
(and even i386) we promote the argument during expansion:

  (insn 2 4 3 2 (set (reg/v:SI 92 [ x ])
          (zero_extend:SI (reg:QI 0 x0 [ x ]))) "/app/example.cpp":4:1 -1
       (nil))

But the value is passed as QImode.
We use this fact to know that the range is 8 bits in the cbranch instruction.
If no operation was done that requires a bigger range, then combine will push
the zero extend into the cbranch, and we have various patterns to handle
different forms of this.

For instance:

  void g1(bool *x)
  {
    if (__builtin_expect (*x, 0))
      h ();
  }

Because of the load of x we generate:

        ldrb    w0, [x0]
        cbnz    w0, .L7
        ret

because we know the top bits are defined to 0 in this case and can just test
the entire register.
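
As a rough illustration of why the load case gets away with a plain cbnz while
the argument case does not, here is a small C model. Again this is only a
sketch with made-up function names, and the "garbage upper bits" value is an
assumption standing in for a register whose top 24 bits are not known to be
zero:

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "ldrb w0, [x0]": a byte load that zero-extends into 32 bits. */
uint32_t load_byte_zext(const uint8_t *p) { return (uint32_t)*p; }

/* Models "cbnz w0": taken if the whole 32-bit register is non-zero. */
bool cbnz_taken(uint32_t reg) { return reg != 0; }

/* Models "tst w0, 255; bne": taken if the low 8 bits are non-zero. */
bool tst255_taken(uint32_t reg) { return (reg & 0xffu) != 0; }
```

After load_byte_zext the upper bits are zero, so cbnz_taken and tst255_taken
always agree. For a value such as 0xABCD0000 (low byte 0, upper bits set) they
disagree, so the whole-register test is only valid once the zero extension is
known to have happened.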

The reason for this promotion, for us and many other backends, is efficiency.
If we don't promote to something we have native instructions for, we would
have to promote and demote the value at *every* instruction in RTL.
This causes significant noise in the RTL, so we can't do anything different
here. I have plans to try to fix this, but not in GCC 13.

But even then it won't help with this case, because we explicitly need to know
that the range is a single bit, not 8 bits.

Regards,
Tamar
>
> Richard.