On 15/12/16 11:53, James Greenhalgh wrote:
On Thu, Dec 08, 2016 at 09:35:08AM +0000, Kyrill Tkachov wrote:
Hi all,

In this patch we split X-register UBFX instructions that extract up to the
edge of a W-register into a W-register LSR instruction. So for the example in
the testcase instead of:
UBFX    X0, X0, 24, 8

we'd generate:
LSR     w0, w0, 24

An LSR is a simpler instruction and there's a higher chance that it can be
combined with other instructions.

To do this the patch separates the sign_extract case from the zero_extract
case in the *<optab><mode> ANY_EXTRACT pattern and further splits the
SImode/DImode patterns from the resulting zrero_extract pattern.
The DImode zero_extract pattern then becomes a define_insn_and_split that
splits into a zero_extend of an lshiftrt when the bitposition and width of
the zero_extract add up to 32.

Bootstrapped and tested on aarch64-none-linux-gnu.

Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low
risk.  I'm happy for it to wait for the next release if necessary.
I'm OK with the idea, and I'm OK taking this in for Stage 3, but I'm not
convinced by the implementation.

2016-12-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

     * config/aarch64/aarch64.md (*<optab><mode>): Split into...
     (*extv<mode>): ...This...
     (*extzvsi): ...This...
     (*extzvdi:): ... And this.  Add splitting to lshiftrt when possible.
Why do we want to to it this way, rather than simply defining a single
"split" which works in the case you're trying to catch.

i.e. (untested)

(define_split
    [(set (match_operand:DI 0 "register_operand")
        (zero_extract:DI (match_operand:DI 1 "register_operand")
                         (match_operand 2
                           "aarch64_simd_shift_imm_offset_di")
                         (match_operand 3
                           "aarch64_simd_shift_imm_di")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
             1, GET_MODE_BITSIZE (DImode) - 1)
    && (INTVAL (operands[2]) + INTVAL (operands[3]))
        == GET_MODE_BITSIZE (SImode)"
   [(set (match_dup 0)
        (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
   {
     operands[4] = gen_lowpart (SImode, operands[1]);
   }
)

Thanks,
James

Yes, that works and is simpler.
Is this ok then?
Bootstrapped and tested on aarch64-none-linux-gnu.
Thanks,
Kyrill

2016-12-16  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * config/aarch64/aarch64.md: New define_split above insv<mode>.

2016-12-16  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    * gcc.target/aarch64/ubfx_lsr_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65ea04587442b0cab18b1e4652537524b82d5b86..5a40ee6abd5e123116aaaa478dced2207dd59478 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4340,6 +4340,26 @@ (define_insn "*<optab><mode>"
   [(set_attr "type" "bfx")]
 )
 
+;; When the bitposition and width add up to 32 we can use a W-reg LSR
+;; instruction taking advantage of the implicit zero-extension of the X-reg.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(zero_extract:DI (match_operand:DI 1 "register_operand")
+			 (match_operand 2
+			   "aarch64_simd_shift_imm_offset_di")
+			 (match_operand 3
+			   "aarch64_simd_shift_imm_di")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
+	     GET_MODE_BITSIZE (DImode) - 1)
+   && (INTVAL (operands[2]) + INTVAL (operands[3]))
+       == GET_MODE_BITSIZE (SImode)"
+  [(set (match_dup 0)
+	(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
+  {
+    operands[4] = gen_lowpart (SImode, operands[1]);
+  }
+)
+
 ;; Bitfield Insert (insv)
 (define_expand "insv<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..f6f72b074e1fc6bcb1976eee6c545e9781b4bed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that an X-reg UBFX can be simplified into a W-reg LSR.  */
+
+int
+f (unsigned long long x)
+{
+  x = (x >> 24) & 255;
+  return x + 1;
+}
+
+/* { dg-final { scan-assembler "lsr\tw" } } */
+/* { dg-final { scan-assembler-not "ubfx\tx" } } */

Reply via email to