Hi Roger,
On 15/01/18 14:58, Roger Sayle wrote:
I was hoping I could ask an ARM backend maintainer to look over the
following patch.
I was examining the code generated for the following C snippet on a
raspberry pi,
static inline int popcount_lut8(unsigned *buf, int n)
{
int cnt=0;
unsigned int i;
do {
i = *buf;
cnt += lut[i&255];
cnt += lut[i>>8&255];
cnt += lut[i>>16&255];
cnt += lut[i>>24];
buf++;
} while(--n);
return cnt;
}
and was surprised to see following instruction sequence generated by the
compiler:
mov r5, r2, lsr #8
uxtb r5, r5
This sequence can be performed by a single ARM instruction:
uxtb r5, r2, ror #8
I agree.
The attached patch allows GCC's combine pass to take advantage of the ARM's
uxtb with
rotate functionality to implement the above zero_extract, and likewise to
use the sxtb
with rotate to implement sign_extract. ARM's uxtb and sxtb can only be used
with rotates
of 0, 8, 16 and 24, and of these only the 8 and 16 are useful [ror #0 is a
nop, and extends
with ror #24 can be implemented using regular shifts], so the approach here
is to add the
six missing but useful instructions as 6 different define_insn in arm.md,
rather than try to
be clever with new predicates.
Alas, later ARM hardware has advanced bit field instructions, and earlier
ARM cores
didn't support extend-with-rotate, so this appears to only benefit armv6 era
CPUs.
Right, the deal with these instructions is that for ARM state they appear
from ARMv6 onwards and for Thumb they appear from ARMv6T2.
However, in ARMv6T2 we also get the more advanced UBFX instructions and we
might as well use those (and we do). So indeed the UXTB+ROR instructions
are only useful for ARMv6 ARM-state, which is what your patch correctly
specifies.
The following patch has been minimally tested by building cc1 of a
cross-compiler
and confirming the desired instructions appear in the assembly output for
the test
case. Alas, my minimal raspberry pi hardware is unlikely to be able to
bootstrap gcc
or run the testsuite, so I'm hoping a ARM expert can check (and confirm)
whether this
change is safe and suitable. [Thanks in advance and apologies for any
inconvenience].
I've bootstrapped and tested your patch on arm-none-linux-gnueabihf
so the patch looks good testing-wise.
I only have a few minor comments inline.
With those addressed the patch is ok.
That being said, GCC is now in stage 4: regression fixes only.
You'll have to wait until GCC 8 is released and stage 1 development
is reopened before committing this patch.
Thanks,
Kyrill
2018-01-14 Roger Sayle <ro...@nextmovesoftware.com>
* config/arm/arm.md (*arm_zeroextractsi2_8_8,
*arm_signextractsi2_8_8,
*arm_zeroextractsi2_8_16, *arm_signextractsi2_8_16,
*arm_zeroextractsi2_16_8, *arm_signextractsi2_16_8): New.
2018-01-14 Roger Sayle <ro...@nextmovesoftware.com>
* gcc.target/arm/extend-ror.c: New test.
Index: gcc/gcc/config/arm/arm.md
===================================================================
--- gcc/gcc/config/arm/arm.md (revision 256667)
+++ gcc/gcc/config/arm/arm.md (working copy)
@@ -11684,6 +11684,72 @@
""
)
+;;;; Implement zero_extract using uxtb/uxth instruction with
+;;;; the ror #N qualifier when applicable.
Please use only two ";;" as comments.
<in the testcase>
/* { dg-do compile } */
/* { dg-options "-O -march=armv6" } */
/* { dg-prune-output "switch .* conflicts with" } */
These directives can be tricky to get right for all the multilib
variations that folks test. I would recommend you use something like:
/* { dg-do compile } */
/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } {
"-march=armv6" } } */
/* { dg-require-effective-target arm_arm_ok } */
/* { dg-add-options arm_arch_v6 } */
/* { dg-additional-options "-O -marm" } */
That way it properly adds -march=armv6 -marm and skips the test gracefully if
the
user tries to force an incompatible set of options.