On Fri, May 16, 2014 at 11:30:38AM +0100, James Greenhalgh wrote:
> On Fri, Mar 28, 2014 at 03:39:53PM +0000, James Greenhalgh wrote:
> > On Fri, Mar 28, 2014 at 03:09:22PM +0000, pins...@gmail.com wrote:
> > > > On Mar 28, 2014, at 7:48 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > > > On Fri, Mar 28, 2014 at 11:11:58AM +0000, pins...@gmail.com wrote:
> > > >>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > > >>> There is no way to perform scalar addition in the vector register file,
> > > >>> but with the RTX costs in place we start rewriting (x << 1) to (x + x)
> > > >>> on almost all cores. The code which makes this decision has no idea that we
> > > >>> will end up doing this (it happens well before reload) and so we end up with
> > > >>> very ugly code generation in the case where addition was selected, but
> > > >>> we are operating in vector registers.
> > > >>>
> > > >>> This patch relies on the same gimmick we are already using to allow
> > > >>> shifts on 32-bit scalars in the vector register file - use a vector 32x2
> > > >>> operation instead, knowing that we can safely ignore the top bits.
> > > >>>
> > > >>> This restores some normality to scalar_shift_1.c, however the test
> > > >>> that we generate a left shift by one is clearly bogus, so remove that.
> > > >>>
> > > >>> This patch is pretty ugly, but it does generate superficially better
> > > >>> looking code for this testcase.
> > > >>>
> > > >>> Tested on aarch64-none-elf with no issues.
> > > >>>
> > > >>> OK for stage 1?
> > > >>
> > > >> It seems we should also discourage the neon alternatives as there might be
> > > >> extra movement between the two register sets which we don't want.
> > > >
> > > > I see your point, but we've tried to avoid doing that elsewhere in the
> > > > AArch64 backend.
> > > > Our argument has been that strictly speaking, it isn't that
> > > > the alternative is expensive, it is the movement between the register sets. We
> > > > do model that elsewhere, and the register allocator should already be trying to
> > > > avoid unnecessary moves between register classes.
> > >
> > > What about on a specific core where that alternative is expensive; that is,
> > > the vector instructions are worse than the scalar ones. How are we going to
> > > handle this case?
> >
> > Certainly not by discouraging the alternative for all cores. We would need
> > a more nuanced approach which could be tuned on a per-core basis. Otherwise
> > we are bluntly and inaccurately pessimizing those cases where we can cheaply
> > perform the operation in the vector register file (e.g. we are cleaning up
> > loose ends after a vector loop, we have spilled to the vector register
> > file, etc.). The register preference mechanism feels the wrong place to
> > catch this as it does not allow for that degree of per-core flexibility;
> > an alternative is simply "disparaged slightly" (?, * in LRA) or
> > "disparaged severely" (!).
> >
> > I would think that we don't want to start polluting the machine description
> > trying to hack around this as was done with the ARM backend's
> > neon_for_64_bits/avoid_neon_for_64_bits.
> >
> > How have other targets solved this issue?
>
> Did you have any further thoughts on this? I've pushed the costs patches, so
> we will start to see gcc.target/aarch64/scalar_shift_1.c failing without
> this or an equivalent patch.
This has been sitting waiting for comment for a while now. If we do need a
mechanism to describe individual costs for alternatives, it will need to be
applied to all the existing uses in aarch64.md/aarch64-simd.md. I think
solving that problem (if we need to) is a separate patch, and shouldn't
prevent this one from going in.

*pingx2*.

Thanks,
James

> ---
> gcc/
>
> 2014-05-16  James Greenhalgh  <james.greenha...@arm.com>
>
> 	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> 	vector registers.
>
> gcc/testsuite/
>
> 2014-05-16  James Greenhalgh  <james.greenha...@arm.com>
>
> 	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
>
> > > > If those mechanisms are broken, we should fix them - in that case fixing
> > > > this by discouraging valid alternatives would seem to be gaffer-taping over the
> > > > real problem.
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > >>
> > > >> Thanks,
> > > >> Andrew
> > > >>
> > > >>>
> > > >>> Thanks,
> > > >>> James
> > > >>>
> > > >>> ---
> > > >>> gcc/
> > > >>>
> > > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > > >>>
> > > >>> 	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> > > >>> 	vector registers.
> > > >>>
> > > >>> gcc/testsuite/
> > > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > > >>>
> > > >>> 	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > > >>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>
> > > >>
> > >

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
>
>  (define_insn "*addsi3_aarch64"
>    [(set
> -    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
> +    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
>      (plus:SI
> -     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
> -     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
> +     (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
> +     (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
>    ""
>    "@
>    add\\t%w0, %w1, %2
>    add\\t%w0, %w1, %w2
> +  add\\t%0.2s, %1.2s, %2.2s
>    sub\\t%w0, %w1, #%n2"
> -  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
> +  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
> )
>
> ;; zero_extend version of above
> diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> index 7cb17f8..826bafc 100644
> --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> @@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
>    return b;
>  }
>  /* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
> -/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */
>
>  Int32x1
>  test_corners_sisd_si (Int32x1 b)
> @@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
>    return b;
>  }
>  /* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
> -/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */