On Fri, Mar 28, 2014 at 03:39:53PM +0000, James Greenhalgh wrote:
> On Fri, Mar 28, 2014 at 03:09:22PM +0000, pins...@gmail.com wrote:
> > > On Mar 28, 2014, at 7:48 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > > On Fri, Mar 28, 2014 at 11:11:58AM +0000, pins...@gmail.com wrote:
> > >>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > >>> There is no way to perform scalar addition in the vector register
> > >>> file, but with the RTX costs in place we start rewriting (x << 1)
> > >>> to (x + x) on almost all cores.  The code which makes this decision
> > >>> has no idea that we will end up doing this (it happens well before
> > >>> reload), and so we end up with very ugly code generation in the case
> > >>> where addition was selected, but we are operating in vector
> > >>> registers.
> > >>>
> > >>> This patch relies on the same gimmick we are already using to allow
> > >>> shifts on 32-bit scalars in the vector register file - use a vector
> > >>> 32x2 operation instead, knowing that we can safely ignore the top
> > >>> bits.
> > >>>
> > >>> This restores some normality to scalar_shift_1.c; however, the test
> > >>> that we generate a left shift by one is clearly bogus, so remove it.
> > >>>
> > >>> This patch is pretty ugly, but it does generate superficially
> > >>> better looking code for this testcase.
> > >>>
> > >>> Tested on aarch64-none-elf with no issues.
> > >>>
> > >>> OK for stage 1?
> > >>
> > >> It seems we should also discourage the NEON alternatives, as there
> > >> might be extra movement between the two register sets which we don't
> > >> want.
> > >
> > > I see your point, but we've tried to avoid doing that elsewhere in
> > > the AArch64 backend.  Our argument has been that, strictly speaking,
> > > it isn't the alternative that is expensive, it is the movement
> > > between the register sets.
> > > We do model that elsewhere, and the register allocator should
> > > already be trying to avoid unnecessary moves between register
> > > classes.
> >
> > What about on a specific core where that alternative is expensive;
> > that is, the vector instructions are worse than the scalar ones.  How
> > are we going to handle this case?
>
> Certainly not by discouraging the alternative for all cores.  We would
> need a more nuanced approach which could be tuned on a per-core basis.
> Otherwise we are bluntly and inaccurately pessimizing those cases where
> we can cheaply perform the operation in the vector register file (e.g.
> we are cleaning up loose ends after a vector loop, we have spilled to
> the vector register file, etc.).  The register preference mechanism
> feels like the wrong place to catch this, as it does not allow for that
> degree of per-core flexibility; an alternative is simply "disparaged
> slightly" (?, * in LRA) or "disparaged severely" (!).
>
> I would think that we don't want to start polluting the machine
> description trying to hack around this, as was done with the ARM
> backend's neon_for_64_bits/avoid_neon_for_64_bits.
>
> How have other targets solved this issue?
Did you have any further thoughts on this?  I've pushed the costs
patches, so we will start to see gcc.target/aarch64/scalar_shift_1.c
failing without this or an equivalent patch.

Otherwise, *ping*

Thanks,
James

---
gcc/

2014-05-16  James Greenhalgh  <james.greenha...@arm.com>

	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
	vector registers.

gcc/testsuite/

2014-05-16  James Greenhalgh  <james.greenha...@arm.com>

	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.

> > > If those mechanisms are broken, we should fix them - in that case,
> > > fixing this by discouraging valid alternatives would seem to be
> > > gaffer-taping over the real problem.
> > >
> > > Thanks,
> > > James
> > >
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >>>
> > >>> Thanks,
> > >>> James
> > >>>
> > >>> ---
> > >>> gcc/
> > >>>
> > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > >>>
> > >>> 	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> > >>> 	vector registers.
> > >>>
> > >>> gcc/testsuite/
> > >>>
> > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > >>>
> > >>> 	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > >>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"

 (define_insn "*addsi3_aarch64"
   [(set
-    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
+    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
     (plus:SI
-     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
-     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+     (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
+     (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
   ""
   "@
   add\\t%w0, %w1, %2
   add\\t%w0, %w1, %w2
+  add\\t%0.2s, %1.2s, %2.2s
   sub\\t%w0, %w1, #%n2"
-  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
+  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
 )

 ;; zero_extend version of above
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
index 7cb17f8..826bafc 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
@@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
-/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */

 Int32x1
 test_corners_sisd_si (Int32x1 b)
@@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
-/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */