On Fri, Mar 28, 2014 at 03:39:53PM +0000, James Greenhalgh wrote:
> On Fri, Mar 28, 2014 at 03:09:22PM +0000, pins...@gmail.com wrote:
> > > On Mar 28, 2014, at 7:48 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > > On Fri, Mar 28, 2014 at 11:11:58AM +0000, pins...@gmail.com wrote:
> > >>> On Mar 28, 2014, at 2:12 AM, James Greenhalgh <james.greenha...@arm.com> wrote:
> > >>> There is no way to perform scalar addition in the vector register
> > >>> file, but with the RTX costs in place we start rewriting (x << 1)
> > >>> to (x + x) on almost all cores.  The code which makes this decision
> > >>> has no idea that we will end up doing this (it happens well before
> > >>> reload), and so we end up with very ugly code generation in the case
> > >>> where addition was selected, but we are operating in vector
> > >>> registers.
> > >>>
> > >>> This patch relies on the same gimmick we are already using to allow
> > >>> shifts on 32-bit scalars in the vector register file - use a vector
> > >>> 32x2 operation instead, knowing that we can safely ignore the top
> > >>> bits.
> > >>>
> > >>> This restores some normality to scalar_shift_1.c; however, the test
> > >>> that we generate a left shift by one is clearly bogus, so remove it.
> > >>>
> > >>> This patch is pretty ugly, but it does generate superficially
> > >>> better looking code for this testcase.
> > >>>
> > >>> Tested on aarch64-none-elf with no issues.
> > >>>
> > >>> OK for stage 1?
> > >>
> > >> It seems we should also discourage the NEON alternatives, as there
> > >> might be extra movement between the two register sets which we don't
> > >> want.
> > >
> > > I see your point, but we've tried to avoid doing that elsewhere in
> > > the AArch64 backend.  Our argument has been that, strictly speaking,
> > > it isn't the alternative that is expensive, it is the movement
> > > between the register sets.
> > > We do model that elsewhere, and the register allocator should
> > > already be trying to avoid unnecessary moves between register
> > > classes.
> >
> > What about on a specific core where that alternative is expensive;
> > that is, the vector instructions are worse than the scalar ones.  How
> > are we going to handle this case?
>
> Certainly not by discouraging the alternative for all cores.  We would
> need a more nuanced approach which could be tuned on a per-core basis.
> Otherwise we are bluntly and inaccurately pessimizing those cases where
> we can cheaply perform the operation in the vector register file (e.g.
> we are cleaning up loose ends after a vector loop, we have spilled to
> the vector register file, etc.).  The register preference mechanism
> feels like the wrong place to catch this, as it does not allow for that
> degree of per-core flexibility; an alternative is simply "disparaged
> slightly" (?, * in LRA) or "disparaged severely" (!).
>
> I would think that we don't want to start polluting the machine
> description trying to hack around this, as was done with the ARM
> backend's neon_for_64_bits/avoid_neon_for_64_bits.
>
> How have other targets solved this issue?
Did you have any further thoughts on this?  I've pushed the costs
patches, so we will start to see gcc.target/aarch64/scalar_shift_1.c
failing without this or an equivalent patch.

Otherwise, *ping*

Thanks,
James

---
gcc/

2014-05-16  James Greenhalgh  <james.greenha...@arm.com>

	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
	vector registers.

gcc/testsuite/

2014-05-16  James Greenhalgh  <james.greenha...@arm.com>

	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.

> > > If those mechanisms are broken, we should fix them - in that case,
> > > fixing this by discouraging valid alternatives would seem to be
> > > gaffer-taping over the real problem.
> > >
> > > Thanks,
> > > James
> > >
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >>>
> > >>> Thanks,
> > >>> James
> > >>>
> > >>> ---
> > >>> gcc/
> > >>>
> > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > >>>
> > >>> 	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
> > >>> 	vector registers.
> > >>>
> > >>> gcc/testsuite/
> > >>>
> > >>> 2014-03-27  James Greenhalgh  <james.greenha...@arm.com>
> > >>>
> > >>> 	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
> > >>> <0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch>

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"

 (define_insn "*addsi3_aarch64"
   [(set
-    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
+    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
     (plus:SI
-     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
-     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+     (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
+     (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
   ""
   "@
   add\\t%w0, %w1, %2
   add\\t%w0, %w1, %w2
+  add\\t%0.2s, %1.2s, %2.2s
   sub\\t%w0, %w1, #%n2"
-  [(set_attr "type" "alu_imm,alu_reg,alu_imm")]
+  [(set_attr "type" "alu_imm,alu_reg,neon_add,alu_imm")]
 )

 ;; zero_extend version of above
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
index 7cb17f8..826bafc 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
@@ -193,7 +193,6 @@ test_corners_sisd_di (Int64x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\td\[0-9\]+,\ d\[0-9\]+,\ 63" } } */
-/* { dg-final { scan-assembler "shl\td\[0-9\]+,\ d\[0-9\]+,\ 1" } } */

 Int32x1
 test_corners_sisd_si (Int32x1 b)
@@ -207,7 +206,6 @@ test_corners_sisd_si (Int32x1 b)
   return b;
 }
 /* { dg-final { scan-assembler "sshr\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 31" } } */
-/* { dg-final { scan-assembler "shl\tv\[0-9\]+\.2s,\ v\[0-9\]+\.2s,\ 1" } } */