I was hoping I could ask an ARM backend maintainer to look over the following patch.
I was examining the code generated for the following C snippet on a Raspberry Pi:
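
    /* lut[] is assumed to be a 256-entry table of per-byte bit counts
       (its declaration is not shown here).  */
    static inline int popcount_lut8(unsigned *buf, int n)
    {
      int cnt = 0;
      unsigned int i;
      do {
        i = *buf;
        cnt += lut[i & 255];
        cnt += lut[i >> 8 & 255];
        cnt += lut[i >> 16 & 255];
        cnt += lut[i >> 24];
        buf++;
      } while (--n);
      return cnt;
    }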

and was surprised to see the following instruction sequence generated by the
compiler:

    mov   r5, r2, lsr #8
    uxtb  r5, r5

This sequence can be performed by a single ARM instruction:

    uxtb  r5, r2, ror #8
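
The equivalence claimed above can be checked with a small C model. This is only a sketch: the helper names (ror32, uxtb_ror, lsr8_then_uxtb, models_agree) are invented for illustration and are not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Model of "uxtb rd, rm, ror #n": rotate, then zero-extend the low byte. */
static uint32_t uxtb_ror(uint32_t x, unsigned n)
{
    return ror32(x, n) & 0xffu;
}

/* Model of the two-instruction sequence "mov rd, rm, lsr #8; uxtb rd, rd". */
static uint32_t lsr8_then_uxtb(uint32_t x)
{
    uint32_t t = x >> 8;   /* mov  t, x, lsr #8 */
    return t & 0xffu;      /* uxtb t, t */
}

/* Returns 1 if both models agree on a handful of sample inputs. */
static int models_agree(void)
{
    static const uint32_t samples[] = {
        0u, 0x12345678u, 0xffffffffu, 0x80000000u, 0x0000ff00u
    };
    unsigned i;
    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (uxtb_ror(samples[i], 8) != lsr8_then_uxtb(samples[i]))
            return 0;
    return 1;
}
```

The bits that the rotate brings in at the top of the word are discarded by the byte extension, which is why the rotate and the logical shift give the same result here.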

The attached patch allows GCC's combine pass to take advantage of ARM's uxtb
with rotate functionality to implement the above zero_extract, and likewise to
use sxtb with rotate to implement sign_extract. ARM's uxtb and sxtb can only be
used with rotates of 0, 8, 16 and 24, and of these only 8 and 16 are useful
[ror #0 is a nop, and extends with ror #24 can be implemented using regular
shifts], so the approach here is to add the six missing but useful instructions
as six different define_insn patterns in arm.md, rather than try to be clever
with new predicates.
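
As a rough illustration of the cases the six patterns cover, the sketch below models the sign-extending and halfword variants in C and shows why ror #24 reduces to a plain shift for a byte extend. The helper names are hypothetical and the code only models the arithmetic; it says nothing about how the define_insn patterns in the patch are actually written.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* uxtb/uxth with rotate: zero-extend the low 8 or 16 bits after rotating. */
static uint32_t uxtb_ror(uint32_t x, unsigned n) { return ror32(x, n) & 0xffu; }
static uint32_t uxth_ror(uint32_t x, unsigned n) { return ror32(x, n) & 0xffffu; }

/* sxtb/sxth with rotate: sign-extend the low 8 or 16 bits after rotating.
   The xor/subtract idiom avoids implementation-defined narrowing casts. */
static int32_t sxtb_ror(uint32_t x, unsigned n)
{
    uint32_t v = ror32(x, n) & 0xffu;
    return (int32_t)(v ^ 0x80u) - 0x80;
}

static int32_t sxth_ror(uint32_t x, unsigned n)
{
    uint32_t v = ror32(x, n) & 0xffffu;
    return (int32_t)(v ^ 0x8000u) - 0x8000;
}

/* Returns 1 if each rotate form matches the corresponding shift-and-mask
   field extraction for the given input. */
static int extracts_match(uint32_t x)
{
    return uxtb_ror(x, 8)  == ((x >> 8) & 0xffu)      /* 8-bit field at bit 8   */
        && uxtb_ror(x, 16) == ((x >> 16) & 0xffu)     /* 8-bit field at bit 16  */
        && uxth_ror(x, 8)  == ((x >> 8) & 0xffffu)    /* 16-bit field at bit 8  */
        && uxtb_ror(x, 24) == (x >> 24);              /* ror #24 is just lsr #24 */
}
```

For a byte extend, ror #24 selects the top byte, so a logical shift (or an arithmetic shift, for the signed form) already does the job; only the 8 and 16 rotates need the extend-with-rotate instructions.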

Alas, later ARM hardware has advanced bit field instructions, and earlier ARM
cores didn't support extend-with-rotate, so this appears to only benefit
ARMv6-era CPUs.

The following patch has been minimally tested by building cc1 of a
cross-compiler and confirming the desired instructions appear in the assembly
output for the test case. Alas, my minimal Raspberry Pi hardware is unlikely to
be able to bootstrap GCC or run the testsuite, so I'm hoping an ARM expert can
check (and confirm) whether this change is safe and suitable. [Thanks in
advance and apologies for any inconvenience].

2018-01-14  Roger Sayle  <[email protected]>

    * config/arm/arm.md (*arm_zeroextractsi2_8_8, *arm_signextractsi2_8_8,
    *arm_zeroextractsi2_8_16, *arm_signextractsi2_8_16,
    *arm_zeroextractsi2_16_8, *arm_signextractsi2_16_8): New.

2018-01-14  Roger Sayle  <[email protected]>

    * gcc.target/arm/extend-ror.c: New test.

Cheers,
Roger
--
Roger Sayle, PhD.
NextMove Software Limited
Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY

Attachments: arm_zext.log, arm_zext.patch

/* { dg-do compile } */
/* { dg-options "-O -march=armv6" } */
/* { dg-prune-output "switch .* conflicts with" } */

unsigned int zeroextractsi2_8_8(unsigned int x)
{
  return (unsigned char)(x>>8);
}

unsigned int zeroextractsi2_8_16(unsigned int x)
{
  return (unsigned char)(x>>16);
}

unsigned int signextractsi2_8_8(unsigned int x)
{
  return (int)(signed char)(x>>8);
}

unsigned int signextractsi2_8_16(unsigned int x)
{
  return (int)(signed char)(x>>16);
}

unsigned int zeroextractsi2_16_8(unsigned int x)
{
  return (unsigned short)(x>>8);
}

unsigned int signextractsi2_16_8(unsigned int x)
{
  return (int)(short)(x>>8);
}

/* { dg-final { scan-assembler-times ", ror #8" 4 } } */
/* { dg-final { scan-assembler-times ", ror #16" 2 } } */
