On Fri, May 16, 2014 at 11:30:38AM +0100, James Greenhalgh wrote:
> On Fri, Mar 28, 2014 at 03:39:53PM +0000, James Greenhalgh wrote:
> > On Fri, Mar 28, 2014 at 03:09:22PM +0000, pins...@gmail.com wrote:
> > > > On Mar 28, 2014, at 7:48 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > > > On Fri, Mar 28, 2014 at 11:11:58AM +0000, pins...@gmail.com wrote:
> > > >>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > > >>> There is no way to perform scalar addition in the vector register file,
> > > >>> but with the RTX costs in place we start rewriting (x << 1) to (x + x)
> > > >>> on almost all cores. The code which makes this decision has no idea that we
> > > >>> will end up doing this (it happens well before reload) and so we end up with
> > > >>> very ugly code generation in the case where addition was selected, but
> > > >>> we are operating in vector registers.
> > > >>>
> > > >>> This patch relies on the same gimmick we are already using to allow
> > > >>> shifts on 32-bit scalars in the vector register file - use a vector 32x2
> > > >>> operation instead, knowing that we can safely ignore the top bits.
> > > >>>
> > > >>> This restores some normality to scalar_shift_1.c, however the test
> > > >>> that we generate a left shift by one is clearly bogus, so remove that.
> > > >>>
> > > >>> This patch is pretty ugly, but it does generate superficially better
> > > >>> looking code for this testcase.
> > > >>>
> > > >>> Tested on aarch64-none-elf with no issues.
> > > >>>
> > > >>> OK for stage 1?
> > > >>
> > > >> It seems we should also discourage the neon alternatives as there might be
> > > >> extra movement between the two register sets which we don't want.
> > > >
> > > > I see your point, but we've tried to avoid doing that elsewhere in the
> > > > AArch64 backend.
> > > > Our argument has been that strictly speaking, it isn't that
> > > > the alternative is expensive, it is the movement between the register sets. We
> > > > do model that elsewhere, and the register allocator should already be trying to
> > > > avoid unnecessary moves between register classes.
> > >
> > > What about on a specific core where that alternative is expensive; that is,
> > > the vector instructions are worse than the scalar ones. How are we going to
> > > handle this case?
> >
> > Certainly not by discouraging the alternative for all cores. We would need
> > a more nuanced approach which could be tuned on a per-core basis. Otherwise
> > we are bluntly and inaccurately pessimizing those cases where we can cheaply
> > perform the operation in the vector register file (e.g. we are cleaning up
> > loose ends after a vector loop, we have spilled to the vector register
> > file, etc.). The register preference mechanism feels the wrong place to
> > catch this as it does not allow for that degree of per-core flexibility;
> > an alternative is simply "disparaged slightly" (?, * in LRA) or
> > "disparaged severely" (!).
> >
> > I would think that we don't want to start polluting the machine description
> > trying to hack around this as was done with the ARM backend's
> > neon_for_64_bits/avoid_neon_for_64_bits.
> >
> > How have other targets solved this issue?
>
> Did you have any further thoughts on this? I've pushed the costs patches, so
> we will start to see gcc.target/aarch64/scalar_shift_1.c failing without
> this or an equivalent patch.
This has been sitting waiting for comment for a while now. If we do need a
mechanism to describe individual costs for alternatives, it will need to be
applied to all the existing uses in aarch64.md/aarch64-simd.md. I think
solving that problem (if we need to) is a separate patch, and shouldn't
prevent this one from going in.

*pingx2*.

Thanks,
James

> ---
> gcc/
>
> 2014-05-16  James Greenhalgh  <james.greenha...@arm.com>
>
> 	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> 	vector registers.
>
> gcc/testsuite/
>
> 2014-05-16  James Greenhalgh  <james.greenha...@arm.com>
>
> 	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
>
> > > > If those mechanisms are broken, we should fix them - in that case fixing
> > > > this by discouraging valid alternatives would seem to be gaffer-taping over the
> > > > real problem.
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > >>
> > > >> Thanks,
> > > >> Andrew
> > > >>
> > > >>>
> > > >>> Thanks,
> > > >>> James
> > > >>>
> > > >>> ---
> > > >>> gcc/
> > > >>>
> > > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > > >>>
> > > >>> 	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> > > >>> 	vector registers.
> > > >>>
> > > >>> gcc/testsuite/
> > > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > > >>>
> > > >>> 	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > > >>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>
> > > >>
> > >

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
>
>  (define_insn "*addsi3_aarch64"
>    [(set
> -    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
> +    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
>      (plus:SI
> -     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
> -     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
> +     (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
> +     (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
>    ""
>    "@
>    add\\t%w0, %w1, %2
>    add\\t%w0, %w1, %w2
> +  add\\t%0.2s, %1.2s, %2.2s
>    sub\\t%w0, %w1, #%n2"
> -  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
> +  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
> )
>
> ;; zero_extend version of above
> diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> index 7cb17f8..826bafc 100644
> --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> @@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
>    return b;
>  }
>  /* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
> -/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */
>
>  Int32x1
>  test_corners_sisd_si (Int32x1 b)
> @@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
>    return b;
>  }
>  /* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
> -/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */