Hi Roger,
Do you want to say bmsk_s instead of msk_s here:
+/* { dg-final { scan-assembler "msk_s\\s+r0,r0,0" } } */
Anyhow, the patch looks good. Proceed with your commit.
Thank you,
Claudiu
On Mon, Oct 30, 2023 at 5:05 AM Jeff Law <[email protected]> wrote:
>
>
>
> On 10/28/23 10:47, Roger Sayle wrote:
> >
> > This patch optimizes PR middle-end/101955 for the ARC backend. On ARC
> > CPUs with a barrel shifter, using two shifts is (probably) optimal as:
> >
> > asl_s r0,r0,31
> > asr_s r0,r0,31
> >
> > but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
> >
> > and r2,r0,1
> > ror r2,r2
> > add.f 0,r2,r2
> > sbc r0,r0,r0
> >
> > With this patch, we now generate the smaller, faster and
> > non-flags-clobbering:
> >
> > bmsk_s r0,r0,0
> > neg_s r0,r0
> >
> > Tested with a cross-compiler to arc-linux hosted on x86_64,
> > with no new (compile-only) regressions from make -k check.
> > Ok for mainline if this passes Claudiu's nightly testing?
> >
> >
> > 2023-10-28 Roger Sayle <[email protected]>
> >
> > gcc/ChangeLog
> > PR middle-end/101955
> > * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
> > to convert sign extract of the least significant bit into an
> > AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
> >
> > gcc/testsuite/ChangeLog
> > PR middle-end/101955
> > * gcc.target/arc/pr101955.c: New test case.
> Good catch. Looking to do something very similar on the H8 based on
> your work here.
>
> On the H8 we can use bld to load a bit from an 8 bit register into the
> C flag. Then we use subtract with carry to get an 8 bit 0/-1 which we
> can then sign extend to 16 or 32 bits. That covers bit positions 0..15
> of an SImode input.
>
> For bits 16..31 we can move the high half into the low half, then use the
> bld sequence.
>
> For bit zero the and+neg is the same number of clocks and size as the
> bld-based sequence. But it'll simulate faster, so it's special cased.
>
>
> Jeff
>
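
For readers following the thread, the transformation discussed above can be sketched in C. This is only an illustration, not the test case from the patch: the function names are made up here, and the third function merely models the "load a bit into carry, then subtract with carry" idea Jeff describes for the H8, not actual H8 code. It assumes GCC's behaviour for arithmetic right shift of negative values and for unsigned-to-signed conversion, both of which are implementation-defined in ISO C.

```c
#include <stdint.h>

/* Shift-pair form (the asl/asr sequence): move bit 0 up to the sign
   position, then shift it back down arithmetically.  The left shift is
   done on an unsigned value to avoid signed overflow.  */
static int32_t sext_bit0_shifts(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* and+neg form (the bmsk/neg sequence): mask bit 0, then negate,
   giving 0 or -1 without touching the flags on the target.  */
static int32_t sext_bit0_and_neg(int32_t x)
{
    return -(x & 1);
}

/* Hypothetical model of the carry-based approach for an arbitrary
   bit n (0..31): extract the bit as a "carry", then compute 0 - carry.  */
static int32_t sext_bit_n(int32_t x, unsigned n)
{
    uint32_t carry = ((uint32_t)x >> n) & 1u;
    return -(int32_t)carry;
}
```

All three return 0 when the selected bit is clear and -1 when it is set, which is why the and+neg pair can stand in for the two shifts when no barrel shifter is available.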